Forum Loader Optimization

ajskateboarder commented 1 year ago

It takes like 30 seconds to load one page of a topic

Maybe we should move from file-based archiving to something based in a DB (sqlite ~~or supabase depending on how my current pr goes~~)
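For illustration only, here is a minimal sketch of what a SQLite-backed response cache could look like. The table name `responses`, the helper names, and keying by URL are all assumptions, not anything from the Snazzle codebase.

```python
import json
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a SQLite database holding one response-cache table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_response(conn, url, data):
    """Insert or overwrite the cached JSON body for a URL."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO responses (url, body) VALUES (?, ?)",
            (url, json.dumps(data)),
        )


def load_response(conn, url):
    """Return the cached data for a URL, or None if nothing is stored."""
    row = conn.execute(
        "SELECT body FROM responses WHERE url = ?", (url,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Passing a file path instead of `":memory:"` would make the cache persist between runs.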

redstone-dev commented 1 year ago

I'd just like to say that a lot of that slowdown only came after I moved the pfp loading from the client to the server for some reason :P

I'll make a commit (basically) reverting that

NotFenixio commented 1 year ago

It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for a post to load, and keeping in mind that there are about 10 posts per topic...

We need to make it asynchronous.

(note: i haven't worked with anything asynchronous in my life, but it is theoretically "faster".)

redstone-dev commented 1 year ago

> We need to make it asynchronous.

I totally forgot asyncio existed. 🤦🏻 I don't know much about it so someone else can do it lmao

NotFenixio commented 1 year ago

> I don't know much about it so someone else can do it lmao

Who says a human has to do it? :trollface:

Jokes aside, I think making something asynchronous means we need to make the whole project asynchronous.

redstone-dev commented 1 year ago

That's actually not true. If I remember correctly, the way asyncio works, you can have some parts of your project async and the other synchronous.
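A small sketch of that idea, assuming nothing about Snazzle's real code: `fetch_post` is a made-up stand-in for an async network call, and the synchronous wrapper enters the async part with `asyncio.run`.

```python
import asyncio


async def fetch_post(post_id):
    """Hypothetical async fetch; the sleep stands in for network latency."""
    await asyncio.sleep(0.01)
    return {"id": post_id}


async def fetch_posts(post_ids):
    """Run all fetches concurrently; results keep the input order."""
    return await asyncio.gather(*(fetch_post(p) for p in post_ids))


def get_posts_sync(post_ids):
    """Plain synchronous function that the rest of the project can call."""
    # asyncio.run starts an event loop, runs the coroutine, then closes the loop.
    return asyncio.run(fetch_posts(post_ids))
```

One caveat: `asyncio.run` raises `RuntimeError` if it is called from a thread that already has a running event loop, so this only works from ordinary synchronous code.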

ajskateboarder commented 1 year ago

> It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for a post to load, and keeping in mind that there are about 10 posts per topic...
>
> We need to make it asynchronous.

I think the issue comes from having to write every single API response to a cache directory (just in case scratchdb goes down), even if the same response already exists. Asynchronous code doesn't make I/O work faster.
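As a rough sketch of skipping redundant writes: the decorator below only touches the disk when a response is new or has changed. `archive_once` is a hypothetical name; how the real `archive_result` decorator works isn't shown in this thread, and the in-memory digest table is an assumption.

```python
import hashlib
import json
from functools import wraps
from pathlib import Path


def archive_once(cache_dir):
    """Hypothetical decorator: archive a result only if it is new or changed."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    last_digest = {}  # cache key -> digest of the payload last written

    def decorator(func):
        @wraps(func)
        def wrapper(*args):
            result = func(*args)
            key = hashlib.sha256(repr((func.__name__, args)).encode()).hexdigest()
            payload = json.dumps(result, sort_keys=True)
            digest = hashlib.sha256(payload.encode()).hexdigest()
            if last_digest.get(key) != digest:
                # Only hit the disk when the response actually differs.
                (cache_dir / f"{key}.json").write_text(payload)
                last_digest[key] = digest
                wrapper.writes += 1
            return result

        wrapper.writes = 0  # counter kept only to make the effect visible
        return wrapper

    return decorator
```

The digest table lives in memory, so the first call for each key after a restart still writes once.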

I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though
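A minimal `functools.lru_cache` sketch, with `download_feed` as a placeholder for the real HTTP request (both function names are assumptions):

```python
from functools import lru_cache

calls = {"count": 0}  # counts simulated downloads, for demonstration only


def download_feed(topic_id):
    """Placeholder for the real RSS request; returns fake feed text."""
    calls["count"] += 1
    return f"<rss><!-- feed for topic {topic_id} --></rss>"


@lru_cache(maxsize=128)
def get_topic_feed(topic_id):
    """Repeat calls with the same topic_id are served from memory."""
    return download_feed(topic_id)
```

`lru_cache` has no expiry, so a cached feed would stay stale until `get_topic_feed.cache_clear()` is called or the entry is evicted.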

redstone-dev commented 1 year ago

> I think the issue comes from having to write every single API response to a cache directory

I'm going to test this, actually, by removing the `@archive_result()` decorator from the `get_topic_posts` function.

NotFenixio commented 1 year ago

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though

Yep, it's probably more reliable. ~Do I start making a parser, or should we wait until most of the site works?~ nvm, let's get the site working first

> Asynchronous code doesn't make I/O work faster.

The current code loads posts so that the server has to wait for the previous posts to load before loading the next one. We could use asynchronous code to load multiple posts at once.
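To make the difference concrete, here is a sketch that overlaps the waits with a thread pool. It uses threads rather than asyncio because a blocking request can't be awaited directly; `fetch_post` and its 0.1 s delay are stand-ins, not Snazzle's real loader.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_post(post_id):
    """Placeholder for a blocking HTTP request that takes about 0.1 s."""
    time.sleep(0.1)
    return {"id": post_id}


def load_posts_serial(post_ids):
    """One request at a time: total time is roughly 0.1 s per post."""
    return [fetch_post(post_id) for post_id in post_ids]


def load_posts_parallel(post_ids, max_workers=10):
    """Overlap the waits; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_post, post_ids))
```

Each individual request is no faster, which is the point made earlier about I/O, but the waits happen at the same time, so the page as a whole finishes sooner.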

redstone-dev commented 1 year ago

I found that if you run `flask run --with-threads`, Snazzle runs a lot faster. We still do need to add asynchronous post loading, though.

EDIT:

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though

I will try working on this

redstone-dev commented 1 year ago

We could also try using Cython, which will compile Python to C which is much faster

ajskateboarder commented 1 year ago

> We could also try using Cython, which will compile Python to C which is much faster

What exactly would we use Cython for?

redstone-dev commented 1 year ago

> > We could also try using Cython, which will compile Python to C which is much faster
>
> What exactly would we use Cython for?

After thinking about it, I think we'd need to convert all of Flask to use Cython, so it's probably better to optimize our existing code.

My initial thought was that our code would be converted to C and compiled so it would be faster. Correct me if I'm wrong, but I think this would be harder to develop for: to make our code run in C we'd have to do that explicitly with special syntax, and most people who would contribute to Snazzle probably don't know that syntax.

Also, somehow I at first confused the capabilities of Cython with those of PyPy.

Finally, we could also add mypy for type checking, which would make our code more type-safe.

ajskateboarder commented 1 year ago

> My initial thought was that our code would be converted to C and compiled so it would be faster.

Cython does not make code faster in all cases. It's typically used more for heavy math/statistics computing (such as numpy and pandas)

> Correct me if I'm wrong, but I think this would be harder to develop for: to make our code run in C we'd have to do that explicitly with special syntax, and most people who would contribute to Snazzle probably don't know that syntax.

That, and also that you would need to install a C compiler, which would be Visual Studio on Windows :sob:

redstone-dev commented 1 year ago

I'm going to use multiprocessing for this purpose.

ajskateboarder commented 1 year ago

> I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.

RSS only contains the most recent posts, so we can't show all posts from it unfortunately. I don't know what else to use if we want this to be reliable

redstone-dev commented 1 year ago

> > I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.
>
> RSS only contains the most recent posts, so we can't show all posts from it unfortunately. I don't know what else to use if we want this to be reliable

We could get data from ScratchDB and then use RSS to top it up with data that ScratchDB hasn't indexed yet. If there's a ScratchDB outage we'll display an alert to the user that all older posts won't be visible until ScratchDB comes back online.
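Roughly, that top-up could look like the sketch below. It assumes posts are dicts with a numeric `id`, that the two fetchers are passed in as callables, and that an outage surfaces as `ConnectionError`; all of those are guesses for illustration.

```python
def merge_posts(scratchdb_posts, rss_posts):
    """Combine indexed posts with RSS posts, de-duplicated by post id."""
    merged = {post["id"]: post for post in scratchdb_posts}
    for post in rss_posts:
        merged[post["id"]] = post  # RSS copy wins: assumed to be fresher
    return sorted(merged.values(), key=lambda post: post["id"])


def load_topic(fetch_scratchdb, fetch_rss):
    """Return (posts, warning); warning is set when ScratchDB is unreachable."""
    rss_posts = fetch_rss()
    try:
        db_posts = fetch_scratchdb()
        warning = None
    except ConnectionError:
        # Assumed failure mode: fall back to RSS only and tell the user.
        db_posts = []
        warning = "ScratchDB is offline; older posts are hidden until it is back."
    return merge_posts(db_posts, rss_posts), warning
```

The caller would show `warning` as the alert whenever it is not `None`.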

redstone-dev commented 11 months ago

This is basically a non-issue with the new Svelte port. However, before we discontinue the legacy codebase I think it would be worthwhile to refine it a bit.

dynamixbot commented 7 months ago

What if the pages are loaded at the same time, but the posts within each page are loaded one by one? That way, once one page has finished loading, every other page has too, and no more processing is needed. With the post count per page set to 20, we only need to load 20 posts per page, each in step with the same post on the other pages. So if a thread has 20 pages, it would first load the first post of every page, then the second, then the third, and so on. We can do this by loading posts by their ones digit: start with 1, which loads the 1 from every page, then 2, then 3, and so on until 0 (0 comes last because each page ends on a post with 0 in the ones digit). Or we can ditch this and just make loading parallel instead of serial (which is my approach).
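Here is one reading of that loading order as a sketch. It goes by position within the page rather than by ones digit, and the function names and 1-based numbering are assumptions.

```python
def interleaved_order(page_count, posts_per_page=20):
    """Yield (page, slot) pairs: slot 1 of every page, then slot 2, and so on."""
    for slot in range(1, posts_per_page + 1):
        for page in range(1, page_count + 1):
            yield page, slot


def post_number(page, slot, posts_per_page=20):
    """Position of a post within the topic, counting from 1."""
    return (page - 1) * posts_per_page + slot
```

With this order every page fills in at the same rate, so all pages finish at about the same time; it doesn't reduce the total number of requests.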

dynamixbot commented 5 months ago

bump

redstone-dev commented 3 months ago

With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems that we just need to make as few HTTP requests as possible to make Snazzle more performant.

dynamixbot commented 3 months ago

> With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems that we just need to make as few HTTP requests as possible to make Snazzle more performant.

It probably is faster (can't install Snazzle 😭). Also, btw, what do you think about my forum loader structure idea?

SnarpleDev / Snazzle

Forum Loader Optimization #45