WikiTeam / wikiteam

Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2023, WikiTeam has preserved more than 350,000 wikis.
https://github.com/WikiTeam
GNU General Public License v3.0
705 stars 147 forks

Speed up file scanning in `images/` dir #453

Status: Closed (yzqzss closed 1 year ago)

yzqzss commented 1 year ago

Use a set instead of a list to speed up scanning when images/ contains a large number of files (>10000).


Benchmark (one million files in images/ dir):

Set: one million files/s
List: 40 files/s
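
To illustrate the idea, here is a minimal sketch rather than the actual patch. The names `missing_files`, `expected_names` and `demo` are hypothetical and do not come from the WikiTeam code. It assumes the scan boils down to "for each expected file name, check whether it is already in images/". A membership test on a list walks the list element by element, while a set lookup is a hash lookup that takes roughly constant time on average, which is where the gap on large directories would come from.

```python
import os
import tempfile


def missing_files(expected_names, images_dir):
    """Return the names from expected_names that are not present in images_dir."""
    # Build the set once; each later `in` check is a hash lookup
    # instead of a linear scan over a list of file names.
    existing = set(os.listdir(images_dir))
    return [name for name in expected_names if name not in existing]


def demo():
    """Create a throwaway images dir with two files and look for three."""
    with tempfile.TemporaryDirectory() as images_dir:
        for name in ("a.png", "b.png"):
            open(os.path.join(images_dir, name), "w").close()
        return missing_files(["a.png", "b.png", "c.png"], images_dir)
```

Calling `demo()` returns only the name that is absent from the temporary directory. Swapping `set(os.listdir(...))` for a plain list gives the same result but makes every lookup scale with the number of files.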

nemobis commented 1 year ago

Makes sense. Maybe sorting used to make sense when we just looked up the last downloaded image, but as we're checking them all for existence in the next step, the sorted order is never used. Thanks!