AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License
427 stars, 23 forks

[Feature Request] Support gzipped files? #47

Closed: diegoceccarelli closed this issue 1 year ago

diegoceccarelli commented 1 year ago

Is your feature request related to a problem? Please describe.
TREC files can be several megabytes. For example, the run_*.trec files used in the examples are all larger than 20 MB, but once compressed they shrink to less than 10 MB. Supporting compressed files would make downloads faster and would also speed up loading the files into memory.

Describe the solution you'd like
Support for *.trec.gz files.
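The request boils down to decompressing transparently before parsing. A minimal standard-library sketch of that idea follows; `read_trec_run` is a hypothetical helper, not ranx's implementation, and it assumes the usual six-column TREC run layout (`q_id Q0 doc_id rank score tag`).

```python
import gzip
from collections import defaultdict


def read_trec_run(path):
    """Parse a TREC run file, handling gzip compression transparently.

    Hypothetical helper for illustration. Expects lines of the form:
        q_id Q0 doc_id rank score tag
    Returns a nested dict {q_id: {doc_id: score}}.
    """
    # Pick the opener from the extension: gzip.open for *.gz, plain open otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    run = defaultdict(dict)
    # Text mode ("rt") makes gzip.open yield decoded lines, just like open().
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            q_id, _q0, doc_id, _rank, score, _tag = parts[:6]
            run[q_id][doc_id] = float(score)
    return dict(run)
```

Choosing the opener by file extension keeps the parsing loop identical for plain and gzipped files, so supporting *.trec.gz costs only one extra branch.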

Describe alternatives you've considered
It would be cool to also evaluate alternative formats for storing TREC files, like [Parquet](https://arrow.apache.org/docs/python/parquet.html). This library focuses on computing metrics fast, but if you spend ages loading and parsing the TREC file, that speed is not very useful. Parquet is much faster to load into memory and supports compression natively.

Additional context
💨

AmenRa commented 1 year ago

Both suggestions sound good to me.

You can already save imported runs as highly compressed LZ4 files with run.save("save/path/run.lz4"). Under the hood, they are JSON files. I ran several tests a few months ago, and they should be smaller and faster to decompress than gzipped TREC files.
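To illustrate the idea of a saved run being plain JSON wrapped in a compression layer, here is a rough standard-library analogue. It swaps in gzip for LZ4 (LZ4 needs the third-party `lz4` package), so the files it writes are not ranx .lz4 files; `save_run` and `load_run` are hypothetical names, not part of ranx. With ranx itself, the call is the `run.save("save/path/run.lz4")` quoted above.

```python
import gzip
import json


def save_run(run, path):
    """Write a nested {q_id: {doc_id: score}} dict as gzip-compressed JSON.

    Hypothetical stand-in: gzip replaces LZ4 only because it ships with
    the standard library. Output is NOT compatible with ranx .lz4 files.
    """
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(run, f)


def load_run(path):
    """Read back a run written by save_run."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Because the whole run is one JSON document, loading it is a single decompress-and-parse step instead of splitting and converting every line of a TREC file.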

I'll let you know when the requested features are available. 🍻

AmenRa commented 1 year ago

Added support for gzipped TREC files in v0.3.15.

AmenRa commented 1 year ago

Added support for Parquet files in v0.3.16.