bitdruid / python-wayback-machine-downloader

Query and download archive.org as simple as possible.
MIT License

Some improvements I made for my own use that I'd like to share #10

Closed Ghost-chu closed 5 months ago

Ghost-chu commented 5 months ago

This PR should definitely not be merged straight away (missing tests, coding style, and some random debugging code), but I hope it will help improve the project.

The main improvements are as follows:

bitdruid commented 5 months ago

please review my implementations as i wanted to do less deep changes in the code-logic

Ghost-chu commented 5 months ago

please review my implementations as i wanted to do less deep changes in the code-logic

Why not process the skipset before starting the workers to start processing snapshots? Processing skipset in download will generate a lot of method calls and a lot of v.writes. Python's text output performance is not good enough.

Considering that in my patch, processing it earlier resulted in a time savings of over 30 minutes, it seems well worth it.
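A minimal sketch of the idea, not the code from the patch: filter the skipset out in a single pass before any worker starts, and report the skipped count once instead of writing a line per skipped item. All names here (`filter_snapshots`, `download`, `run`) are hypothetical placeholders, and the thread pool only stands in for whatever worker mechanism the project actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def filter_snapshots(snapshots, skipset):
    """Drop already-handled snapshots in one pass, before any worker starts."""
    skip = set(skipset)  # set membership checks are O(1) on average
    remaining = [s for s in snapshots if s not in skip]
    return remaining, len(snapshots) - len(remaining)


def download(snapshot):
    # Placeholder for the real download step (assumption); it just echoes its input.
    return snapshot


def run(snapshots, skipset, workers=4):
    remaining, skipped = filter_snapshots(snapshots, skipset)
    # One message for the whole skipset instead of one write per skipped item.
    print(f"skipping {skipped} snapshots")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The workers only ever see snapshots that still need downloading.
        return list(pool.map(download, remaining)), skipped
```

With this shape the workers never spend a call on a snapshot that is going to be skipped anyway.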

bitdruid commented 5 months ago

please review my implementations as i wanted to do less deep changes in the code-logic

Why not process the skipset before starting the workers to start processing snapshots? Processing skipset in download will generate a lot of method calls and a lot of v.writes. Python's text output performance is not good enough.

Considering that in my patch, processing it earlier resulted in a time savings of over 30 minutes, it seems well worth it.

forgot about that one sorry. you are right

Ghost-chu commented 5 months ago

please review my implementations as i wanted to do less deep changes in the code-logic

Why not process the skipset before starting the workers to start processing snapshots? Processing skipset in download will generate a lot of method calls and a lot of v.writes. Python's text output performance is not good enough. Considering that in my patch, processing it earlier resulted in a time savings of over 30 minutes, it seems well worth it.

forgot about that one sorry. you are right

Hi, a friendly reminder: don't forget to fast-forward the progress bar, or it will look weird. Verbosity's write function only allowed me to increase it by 1, so I changed it to fast-forward all at once, which saves a lot of function calls.

Based on my testing, it performs terribly if the progress bar is updated in a loop.
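To illustrate the fast-forward, here is a small sketch using a stand-in progress counter. The `Progress` class is hypothetical and is not the project's Verbosity class; it only counts how many times the bar would be redrawn. Progress libraries such as tqdm also accept an increment in `update(n)`, though this sketch does not assume which one the project uses.

```python
class Progress:
    """Stand-in progress counter (hypothetical, for illustration only)."""

    def __init__(self, total):
        self.total = total
        self.done = 0
        self.redraws = 0

    def update(self, n=1):
        # Advance by n steps, never past the total.
        self.done = min(self.total, self.done + n)
        # Each call would trigger one redraw of the bar.
        self.redraws += 1


bar = Progress(total=1000)
skipped = 400

# Fast-forward once for all skipped items rather than calling update() 400 times.
bar.update(skipped)
```

The bar ends up at the same position either way; the single call just avoids 399 extra updates and redraws.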