perf: Asynchronously dispatch requests in groups

ReVanced / restore-missing-youtube-watch-history

⌛ Script to import missing YouTube watch history

https://revanced.app

GNU General Public License v3.0

perf: Asynchronously dispatch requests in groups #10

Open alexandreteles opened 6 months ago

alexandreteles commented 6 months ago

This small rewrite uses async to dispatch requests in groups of five, with a small random delay of sleep: float = random.uniform(1, 3) seconds on each dispatch. This should run faster than dispatching requests synchronously, while introducing some entropy so as not to scare YouTube too much.
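As a rough illustration of the grouped dispatch described above, here is a minimal sketch. It assumes the delay applies once per group; dispatch_in_groups and fake_mark_watched are hypothetical names, not the functions in this PR.

```python
import asyncio
import random
from typing import Awaitable, Callable, Iterable


async def dispatch_in_groups(
    items: Iterable[str],
    worker: Callable[[str], Awaitable[bool]],
    group_size: int = 5,
    delay_range: tuple[float, float] = (1.0, 3.0),
) -> list[bool]:
    """Run `worker` over `items` in groups, pausing randomly before each group."""
    pending = list(items)
    results: list[bool] = []
    for start in range(0, len(pending), group_size):
        group = pending[start : start + group_size]
        # Random pause, mirroring random.uniform(1, 3) from the description.
        await asyncio.sleep(random.uniform(*delay_range))
        # Requests inside a group run concurrently; groups run one after another.
        results.extend(await asyncio.gather(*(worker(item) for item in group)))
    return results


async def fake_mark_watched(url: str) -> bool:
    """Hypothetical stand-in for the real request; always reports success."""
    await asyncio.sleep(0)
    return True


if __name__ == "__main__":
    urls = [f"https://www.youtube.com/watch?v=video{i}" for i in range(12)]
    # Short delays are used here only so the demo finishes quickly.
    outcome = asyncio.run(
        dispatch_in_groups(urls, fake_mark_watched, delay_range=(0.0, 0.01))
    )
    print(sum(outcome), "of", len(urls), "marked as watched")
```

With this shape, a group only starts after the previous group has fully finished, so one slow request holds up the rest of its group.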

I cannot test it myself, so I would be glad if you could check it out @oSumAtrIX.

Thank you!

EDIT: It also introduces a retry option that attempts the mark_watched operation up to three times before giving up on that specific video. I did not introduce a global failure count, but adding one should be trivial if the current code works.
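The retry behaviour could look roughly like the sketch below. It assumes the operation returns True on success, and that any exception counts as a failed attempt; with_retries and make_flaky_operation are hypothetical helpers, not the PR's code.

```python
import asyncio
from typing import Awaitable, Callable


async def with_retries(
    operation: Callable[[], Awaitable[bool]],
    attempts: int = 3,
) -> bool:
    """Return True as soon as `operation` succeeds, False after `attempts` failures."""
    for _ in range(attempts):
        try:
            if await operation():
                return True
        except Exception:
            # Assumption: any exception is treated as one failed attempt.
            # Real code may want to catch only network-related errors.
            pass
    return False


def make_flaky_operation(failures: int) -> Callable[[], Awaitable[bool]]:
    """Hypothetical operation that raises `failures` times, then succeeds."""
    remaining = failures

    async def operation() -> bool:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise ConnectionError("simulated failure")
        return True

    return operation


if __name__ == "__main__":
    # Two failures followed by a success fits within three attempts.
    print(asyncio.run(with_retries(make_flaky_operation(2))))
    # Three failures exhaust the three attempts, so this video is skipped.
    print(asyncio.run(with_retries(make_flaky_operation(3))))
```

A global failure count, as mentioned above, could be kept by counting the False results returned by the helper.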

indrastorms commented 6 months ago

File "/data/data/com.termux/files/home/restore-missing-youtube-watch-history/main.py", line 106, in main
    kept: list[dict[str, Any]] = await filter_video_events(data, RESUME_TIMESTAMP)
                                 ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object async_generator can't be used in 'await' expression
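For context, this TypeError is what Python raises when an async generator object is awaited directly; async generators are consumed with async for or an async comprehension instead. A minimal sketch follows, assuming filter_video_events is an async generator; its body and the "time" and "titleUrl" keys are made up for illustration.

```python
import asyncio
from typing import Any, AsyncIterator


async def filter_video_events(
    events: list[dict[str, Any]], resume_timestamp: str
) -> AsyncIterator[dict[str, Any]]:
    """Hypothetical async generator: yield events newer than the timestamp."""
    for event in events:
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if event["time"] > resume_timestamp:
            yield event


async def main() -> list[dict[str, Any]]:
    data = [
        {"titleUrl": "https://www.youtube.com/watch?v=old", "time": "2023-01-01T00:00:00Z"},
        {"titleUrl": "https://www.youtube.com/watch?v=new", "time": "2023-06-01T00:00:00Z"},
    ]
    # Writing `await filter_video_events(...)` here raises the TypeError
    # shown above; an async comprehension collects the yielded items instead.
    kept = [event async for event in filter_video_events(data, "2023-03-01T00:00:00Z")]
    return kept


if __name__ == "__main__":
    print(asyncio.run(main()))
```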

alexandreteles commented 6 months ago

Fixed the issue. That is what I get for writing code without testing. Anyway, against my better judgment, I have tested the script using my own account. The new execution logic should also pull the next video to process as soon as a slot frees up in the semaphore, instead of waiting for the whole batch to finish. Every video still gets a random asyncio.sleep() to introduce some entropy. The default concurrency is still five simultaneous requests, but that can be controlled with --concurrency.
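The semaphore-based scheduling described above can be sketched as follows. This is an illustration under assumptions, not the script itself: process_all and the mark_watched stub are hypothetical, and the delay is assumed to happen while the slot is held.

```python
import asyncio
import random
from typing import Iterable


async def mark_watched(url: str) -> bool:
    """Hypothetical stand-in for the real request; always reports success."""
    await asyncio.sleep(0)
    return True


async def process_all(
    urls: Iterable[str],
    concurrency: int = 5,
    delay_range: tuple[float, float] = (1.0, 3.0),
) -> int:
    """Process every URL with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def process(url: str) -> bool:
        # Each task waits for a free slot, so a new video starts as soon as
        # any running one finishes, not when a whole batch is done.
        async with semaphore:
            await asyncio.sleep(random.uniform(*delay_range))
            return await mark_watched(url)

    results = await asyncio.gather(*(process(url) for url in urls))
    # True counts as 1, so the sum is the number of successful videos.
    return sum(results)


if __name__ == "__main__":
    urls = [f"https://www.youtube.com/watch?v=video{i}" for i in range(12)]
    # Short delays are used here only so the demo finishes quickly.
    done = asyncio.run(process_all(urls, concurrency=5, delay_range=(0.0, 0.01)))
    print(done, "of", len(urls), "marked as watched")
```

Compared with fixed groups, one slow request here only occupies a single slot while the other slots keep turning over.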

I've also added a check that compares each video URL against a log file, so the same video is not processed more than once.
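One way such a log-file check might work is sketched below. It assumes a plain-text log with one URL per line; the file layout and the helper names (load_processed, record_processed, pending_urls) are hypothetical, not taken from the PR.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def load_processed(log_path: Path) -> set[str]:
    """Read already-processed URLs, one per line; a missing file means none."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_processed(log_path: Path, url: str) -> None:
    """Append a URL to the log once its video has been handled."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(url + "\n")


def pending_urls(urls: Iterable[str], processed: set[str]) -> list[str]:
    """Keep input order, dropping logged URLs and duplicates within the input."""
    seen = set(processed)
    result: list[str] = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        log_file = Path(folder) / "processed.log"
        record_processed(log_file, "https://www.youtube.com/watch?v=a")
        todo = pending_urls(
            ["https://www.youtube.com/watch?v=a", "https://www.youtube.com/watch?v=b"],
            load_processed(log_file),
        )
        print(todo)
```

Loading the log into a set makes each lookup constant-time, which matters for long watch histories.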

Would you be kind enough to test it again?

indrastorms commented 6 months ago

It's working fine. Thanks to your async contribution, it's super fast now.

alexandreteles commented 6 months ago

@oSumAtrIX Can you PR a fix to the readme that includes these changes? I will be a bit busy today so I'm not sure I'll be able to write it.

oSumAtrIX commented 6 months ago

@alexandreteles What changes to the readme are necessary?

alexandreteles commented 6 months ago

Some of the command-line arguments are gone, and there is a new one, --concurrency, that sets how many connections the app makes at the same time. That's about it.
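For readers updating the readme, a flag like this is typically declared with argparse as in the sketch below. Only the --concurrency name and its default of five come from this thread; the help text and description are assumptions, and the script's other arguments are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser showing only the --concurrency option."""
    parser = argparse.ArgumentParser(
        description="Import missing YouTube watch history (illustrative sketch)."
    )
    parser.add_argument(
        "--concurrency",
        type=int,
        default=5,  # The thread states five simultaneous requests as the default.
        help="maximum number of requests in flight at the same time",
    )
    return parser


if __name__ == "__main__":
    # An explicit argument list stands in for the real command line.
    args = build_parser().parse_args(["--concurrency", "10"])
    print(args.concurrency)
```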

Mr-HaleYa commented 6 months ago

tqdm needs to be installed for the script to run. Should it be in the requirements file?

alexandreteles commented 6 months ago

That's in a different PR 😅