Open DominikStiller opened 5 days ago
Hi Dominik, thanks for pointing this out! I'm sure the performance of the codebase is not optimal and many improvements could be made. I developed most of the infrastructure from scratch in a relatively short time period, and runtime speed was usually not the priority as long as it wasn't significantly slowing down the major tasks. Now that this project is open to the community, you are most welcome to implement those improvements and submit PRs. Once there are sufficient upgrades to the codebase in terms of features and performance, we may submit another manuscript to report the updates.
Thanks for clarifying, I'll submit a PR for this issue!
Operations on a `ProxyDatabase` that loop over all records slow down as they progress, i.e. the per-iteration time is not constant (as measured by tqdm's it/s), which is especially noticeable for large databases. Take for example slicing: https://github.com/fzhu2e/cfr/blob/a99c13bfeae4643b64c63be339894b48abd7f5ef/cfr/proxy.py#L1730-L1746
The operation `new += spobj` calls `.refresh()` every time, which itself iterates over all proxies. Therefore the loop is O(n^2), while it could easily be O(n) if the records were added to a dictionary first and then converted to a `ProxyDatabase` at the end. Is there a reason this is not being done?
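To make the complexity argument concrete, here is a minimal, self-contained sketch using toy classes (`ToyDatabase` and `ToyRecord` are hypothetical stand-ins, not the actual cfr API). It assumes, as described above, that `+=` triggers a full `refresh()` over all stored records, and counts the total per-record work done by `refresh()` under both strategies:

```python
class ToyRecord:
    """Stand-in for a proxy record, identified by a pid."""
    def __init__(self, pid):
        self.pid = pid


class ToyDatabase:
    """Stand-in for a ProxyDatabase whose `+=` triggers a full refresh()."""
    def __init__(self, records=None):
        self.records = dict(records or {})
        self.refresh_work = 0  # counts per-record work performed in refresh()

    def refresh(self):
        # Iterates over all stored records, like the full refresh in cfr
        for _ in self.records:
            self.refresh_work += 1

    def __iadd__(self, record):
        self.records[record.pid] = record
        self.refresh()  # O(n) work on every addition -> O(n^2) overall
        return self


def build_quadratic(records):
    """Current pattern: accumulate via `+=`, refreshing after each record."""
    db = ToyDatabase()
    for r in records:
        db += r
    return db


def build_linear(records):
    """Proposed pattern: fill a dict first, convert once, refresh once."""
    d = {r.pid: r for r in records}
    db = ToyDatabase(d)
    db.refresh()
    return db


records = [ToyRecord(i) for i in range(100)]
print(build_quadratic(records).refresh_work)  # 1 + 2 + ... + 100 = 5050
print(build_linear(records).refresh_work)     # 100
```

For 100 records the `+=` pattern does 5050 units of refresh work versus 100 for the dict-first pattern; the gap grows quadratically with database size.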