DistriNet / tranco-python-package

Python package to access the Tranco list
MIT License

changed the internal list to an ordered dictionary for faster rank lookup #8

Closed jconwell closed 2 years ago

jconwell commented 2 years ago

I swapped out the list object in TrancoList for an OrderedDict. With this change, enumerating every domain in the list and returning its rank takes less than a second. I wanted to keep the API the same as before, so there is now a TrancoList.list property to return the top n domains as a list.
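For readers skimming the thread, the idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the class name `RankedDomains`, its `rank` and `top` methods, and the `-1` return value for unlisted domains are all assumptions made for the example.

```python
from collections import OrderedDict
from itertools import islice


class RankedDomains:
    """Hypothetical stand-in for TrancoList; not the package's actual code."""

    def __init__(self, domains):
        # Map each domain to its 1-based rank, keeping the original list order.
        self._ranks = OrderedDict(
            (domain, rank) for rank, domain in enumerate(domains, start=1)
        )

    def rank(self, domain):
        # Hash lookup instead of a linear list.index() scan.
        # Returning -1 for unlisted domains is an assumption of this sketch.
        return self._ranks.get(domain, -1)

    def top(self, n=None):
        # Iterating the mapping yields domains in rank order;
        # n=None returns the whole list.
        return list(islice(self._ranks, n))


ranked = RankedDomains(["a.example", "b.example", "c.example"])
print(ranked.rank("b.example"))
print(ranked.top(2))
```

The trade-off is extra memory for the mapping in exchange for avoiding a scan of up to a million entries on every rank lookup.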

I added some tests just to make sure the new things I added work, but had to change Tranco.list() and add Tranco._get_list() so the tests had access to the raw domain list.

jconwell commented 2 years ago

FYI, I noticed that as of Python 3.7 the built-in dict preserves insertion order and is faster than the collections.OrderedDict that I used. If you want this library to support only Python 3.7 or greater, I can make that change.

That being said, the OrderedDict can iterate through the million domains in random order and pull out their ranks pretty fast.
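If anyone wants to compare the two mapping types locally, a rough sketch along these lines should work. The helper names `build_ranks` and `time_random_lookups` and the synthetic `site{i}.example` domains are made up for illustration; this does not use the real Tranco data, and timings will vary by machine and Python version.

```python
import random
import time
from collections import OrderedDict


def build_ranks(domains, mapping_type=dict):
    # Build a domain -> 1-based rank mapping with the given mapping type.
    return mapping_type((d, i) for i, d in enumerate(domains, start=1))


def time_random_lookups(ranks, domains, seed=0):
    # Look up every domain once, in shuffled order, and return elapsed seconds.
    shuffled = list(domains)
    random.Random(seed).shuffle(shuffled)
    start = time.perf_counter()
    for domain in shuffled:
        ranks[domain]
    return time.perf_counter() - start


# Synthetic placeholder domains, not the actual Tranco list.
domains = [f"site{i}.example" for i in range(100_000)]

plain = build_ranks(domains, dict)
ordered = build_ranks(domains, OrderedDict)

# On Python 3.7+ both mappings iterate in insertion (rank) order.
assert list(plain) == list(ordered) == domains

print(f"dict:        {time_random_lookups(plain, domains):.4f} s")
print(f"OrderedDict: {time_random_lookups(ordered, domains):.4f} s")
```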