ajskateboarder opened 1 year ago
I'd just like to say that a lot of that slowdown only came after I moved the pfp loading from the client to the server for some reason :P
I'll make a commit (basically) reverting that
It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for a post to load, and keeping in mind that there are about 10 posts per topic...
We need to make it asynchronous.
(Note: I haven't worked with anything asynchronous in my life, but it is theoretically "faster".)
We need to make it asynchronous.
I totally forgot `asyncio` existed. 🤦🏻
I don't know much about it so someone else can do it lmao
I don't know much about it so someone else can do it lmao
Who says a human has to do it? :trollface:
Jokes aside, I think making something asynchronous means we need to make the whole project asynchronous.
That's actually not true. If I remember correctly, the way asyncio works, you can have some parts of your project async and the rest synchronous.
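For example (just a rough sketch, not Snazzle's actual code): a normal synchronous function can drive an async one with `asyncio.run()`, so only the slow part has to change.

```python
import asyncio

async def fetch_post(post_id):
    # placeholder for an awaitable HTTP call (e.g. via httpx or aiohttp)
    await asyncio.sleep(0.1)
    return {"id": post_id}

def get_post(post_id):
    # ordinary synchronous code (like a Flask view) calling into async code
    return asyncio.run(fetch_post(post_id))

print(get_post(42))
```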
It's probably because the posts are loaded synchronously, so you have to wait about 1-3 seconds for a post to load, and keeping in mind that there are about 10 posts per topic...
We need to make it asynchronous.
I think the issue comes from having to write every single API response to a cache directory (just in case scratchdb goes down), even if the same response already exists. Asynchronous code doesn't make I/O work faster.
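If that's the culprit, the fix might just be skipping the write when the cached file already holds the same data. A hypothetical sketch of what `archive_result` could look like with that check (the real decorator and cache layout may differ):

```python
import json
from functools import wraps
from pathlib import Path

CACHE_DIR = Path("cache")  # assumed location of the archive directory

def archive_result(name):
    """Archive a function's JSON-serialisable result on disk, but only
    touch the file when the content actually changed."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            CACHE_DIR.mkdir(exist_ok=True)
            cache_file = CACHE_DIR / f"{name}-{'-'.join(map(str, args))}.json"
            payload = json.dumps(result)
            # skip the disk write if an identical response is already archived
            if not cache_file.exists() or cache_file.read_text() != payload:
                cache_file.write_text(payload)
            return result
        return wrapper
    return decorator
```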
I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though
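Something like this could work (the feed URL is my guess at the endpoint, not verified):

```python
from functools import lru_cache

import requests

@lru_cache(maxsize=128)
def get_topic_feed(topic_id: int) -> str:
    # assumed RSS endpoint for a forum topic; adjust to the real URL
    url = f"https://scratch.mit.edu/discuss/feeds/topic/{topic_id}/"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text  # repeated requests for the same topic hit the cache
```

One caveat: lru_cache never expires entries, so a cached topic wouldn't show new posts until the server restarts; a time-based cache might be a better fit.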
I think the issue comes from having to write every single API response to a cache directory
I'm going to test this, actually, by removing the `@archive_result()` decorator from the `get_topic_posts` function.
I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though
Yep, it's probably more reliable. ~Do I start making a parser, or should we wait until most of the site works?~ nvm, let's get the site working first
Asynchronous code doesn't make I/O work faster.
The current code loads posts one at a time, so the server has to wait for the previous post to finish before loading the next one. We could use asynchronous code to load multiple posts at once.
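Roughly like this (a sketch; `get_post` stands in for the existing blocking fetch, and the URL is a placeholder):

```python
import asyncio

import requests

def get_post(post_id: int) -> dict:
    # stand-in for the existing blocking ScratchDB request
    resp = requests.get(f"https://scratchdb.example/post/{post_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()

async def get_posts(post_ids: list[int]) -> list[dict]:
    # run every blocking fetch in its own thread and wait for all of them at once
    return await asyncio.gather(*(asyncio.to_thread(get_post, p) for p in post_ids))

# ~10 posts now take about as long as the single slowest request
posts = asyncio.run(get_posts(list(range(1, 11))))
```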
I found that if you run `flask run --with-threads`, Snazzle runs a lot faster. We still do need to add asynchronous post loading, though.
EDIT:
I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data. We should use lru_cache for it though
I will try working on this
We could also try using Cython, which will compile Python to C which is much faster
We could also try using Cython, which will compile Python to C which is much faster
What exactly would we use Cython for?
We could also try using Cython, which will compile Python to C which is much faster
What exactly would we use Cython for?
After thinking about it, I think we'd need to convert all of Flask to use Cython, so it's probably better to optimize our existing code.
My initial thought was that our code would be converted to C and compiled so it would be faster. Correct me if I'm wrong, but I think this would make the project harder to develop for: to actually get our code running as C, we have to annotate it explicitly with Cython's special syntax, and most people who would contribute to Snazzle probably don't know that syntax.
Also, somehow I at first confused the capabilities of Cython with those of PyPy.
Finally, we could also add mypy for type checking, which would make our code more type-safe.
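For instance, once functions are annotated, mypy can catch mismatches like this before they ever run (a hypothetical function, not from the codebase):

```python
def get_topic_title(topic: dict[str, str]) -> str:
    return topic["title"]

# mypy reports: Incompatible types in assignment (expression has type "str",
# variable has type "int")
title: int = get_topic_title({"title": "Show and Tell"})
```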
My initial thought was that our code would be converted to C and compiled so it would be faster.
Cython does not make code faster in all cases. It's typically used more for heavy math/statistics computing (such as numpy and pandas)
Correct me if I'm wrong, but I think this would make the project harder to develop for: to actually get our code running as C, we have to annotate it explicitly with Cython's special syntax, and most people who would contribute to Snazzle probably don't know that syntax.
That, and also that you would need to install a C compiler, which would be Visual Studio on Windows :sob:
I'm going to use `multiprocessing` for this purpose.
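A rough sketch of that (`get_post` stands in for the existing blocking fetch; the URL is a placeholder):

```python
from multiprocessing import Pool

import requests

def get_post(post_id: int) -> dict:
    # stand-in for the existing blocking fetch
    resp = requests.get(f"https://scratchdb.example/post/{post_id}", timeout=10)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        posts = pool.map(get_post, range(1, 11))  # fetch 10 posts in parallel
```

Since this is I/O-bound rather than CPU-bound, `multiprocessing.dummy.Pool` (a thread pool with the same API) would give the same speedup without the cost of spawning processes.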
I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.
RSS only contains the most recent posts, so we can't show all posts from it unfortunately. I don't know what else to use if we want this to be reliable
I think we could move away from ScratchDB to Scratch's RSS data (the button that looks like a signal). It's more reliable and has newer data.
RSS only contains the most recent posts, so we can't show all posts from it unfortunately. I don't know what else to use if we want this to be reliable
We could get data from ScratchDB and then use RSS to top it up with data that ScratchDB hasn't indexed yet. If there's a ScratchDB outage we'll display an alert to the user that all older posts won't be visible until ScratchDB comes back online.
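In code that could look roughly like this (every helper here is hypothetical; the point is just the merge-and-flag shape):

```python
import requests

def fetch_rss_posts(topic_id: int) -> list[dict]:
    # hypothetical: parse the topic's RSS feed into post dicts
    return [{"id": 101, "content": "newest post"}]

def fetch_scratchdb_posts(topic_id: int) -> list[dict]:
    # hypothetical: the full post history from ScratchDB
    return [{"id": 100, "content": "older post"}, {"id": 101, "content": "newest post"}]

def get_all_posts(topic_id: int) -> tuple[list[dict], bool]:
    """Combine older posts from ScratchDB with the newest ones from RSS.

    Returns (posts, scratchdb_down) so the page can show an alert when
    only the recent RSS posts are available.
    """
    recent = fetch_rss_posts(topic_id)
    try:
        older = fetch_scratchdb_posts(topic_id)
    except requests.RequestException:
        return recent, True  # ScratchDB outage: recent posts only

    seen = {post["id"] for post in older}
    merged = older + [p for p in recent if p["id"] not in seen]
    return merged, False
```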
This is basically a non-issue with the new Svelte port. However, before we discontinue the legacy codebase I think it would be worthwhile to refine it a bit.
What if the pages are loaded at the same time, but the posts within each page are loaded one by one? That way, once one page has finished loading, every other page has finished too, so no extra processing is needed. With the post count set to 20 per page, we only need to load about 20 posts at the same time as the others. So if a thread has 20 pages, it would first load the first post of every page, then the second, the third, and so on. We could do this by loading posts by their ones digit: start with 1, which loads post 1 from every page, then 2, then 3, and so on until 0 (0 comes last because each page ends with 0 in the ones digit). Or we could just ditch this and try to make loading parallel instead of serial (which is my approach).
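The "parallel instead of serial" option could be as simple as a thread pool over the page numbers (a sketch; `load_page` is a stand-in for the existing per-page loader):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def load_page(page: int) -> list[dict]:
    # stand-in for the existing per-page loader (a blocking HTTP call)
    time.sleep(1)
    return [{"page": page, "post": n} for n in range(1, 21)]

def load_topic(page_count: int) -> list[list[dict]]:
    # fetch every page at the same time instead of one after another
    with ThreadPoolExecutor(max_workers=page_count) as pool:
        return list(pool.map(load_page, range(1, page_count + 1)))

pages = load_topic(20)  # ~1 second total instead of ~20
```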
bump
With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems that we just need to make as few HTTP requests as possible to make Snazzle more performant.
With the release of Snazzle Production Server, bjoern should speed up page loading, but the main bottleneck (when ScratchDB still worked) was getting post data from it. It seems that we just need to make as few HTTP requests as possible to make Snazzle more performant.
It probably is faster (can't install Snazzle 😭). Also, btw, what do you think about my forum loader structure idea?
It takes like 30 seconds to load one page of a topic
Maybe we should move from file-based archiving to something DB-based (`sqlite` or Supabase, depending on how my current PR goes).
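With sqlite the archive could be as small as this (table name and schema are made up for the sketch):

```python
import json
import sqlite3

conn = sqlite3.connect("archive.db")
conn.execute("CREATE TABLE IF NOT EXISTS archive (key TEXT PRIMARY KEY, response TEXT)")

def archive(key: str, response: dict) -> None:
    # INSERT OR REPLACE makes re-archiving the same key an idempotent upsert
    conn.execute(
        "INSERT OR REPLACE INTO archive (key, response) VALUES (?, ?)",
        (key, json.dumps(response)),
    )
    conn.commit()

def load_archived(key: str) -> dict | None:
    row = conn.execute("SELECT response FROM archive WHERE key = ?", (key,)).fetchone()
    return json.loads(row[0]) if row else None
```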