facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License
3.94k stars, 531 forks

Support reading from compressed file #206

Closed ledw closed 5 years ago

ledw commented 5 years ago

This enables StarSpace to load input from compressed files in ".gz" format, using the third-party zlib and gzip libraries. It speeds up reading large input files, but it introduces additional library dependencies (zlib, gzip). Currently it assumes that the input file has been partitioned into smaller parts and that the number of parts equals the number of threads used, so that the parts can be read in parallel.

To use the compressed file format, first compile StarSpace with makefile_compress: make -f makefile_compress

Then, in your train config, specify -trainFile input -compressFile gzip -numGzFile 10. StarSpace then tries to read from the gzip files input00.gz, input01.gz, ..., input09.gz.
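
For anyone preparing input in this layout, here is a minimal Python sketch of one way to produce the parts. It is not part of this PR or of StarSpace. The helper name split_into_gz_parts is hypothetical, the round-robin assignment of lines to parts is just one reasonable choice, and the two-digit zero padding is assumed from the input00.gz ... input09.gz example above.

```python
import gzip
from pathlib import Path


def split_into_gz_parts(src, prefix, num_parts):
    """Distribute the lines of `src` round-robin into `num_parts` gzip files.

    Output files are named <prefix>00.gz, <prefix>01.gz, ... The two-digit
    zero padding is an assumption based on the example file names in the PR
    description, not something checked against the StarSpace source.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")

    paths = [Path(f"{prefix}{i:02d}.gz") for i in range(num_parts)]
    # Binary mode keeps each line byte-for-byte identical to the source file.
    outs = [gzip.open(p, "wb") for p in paths]
    try:
        with open(src, "rb") as f:
            for line_no, line in enumerate(f):
                # Line k goes to part k mod num_parts, so part sizes stay balanced.
                outs[line_no % num_parts].write(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

Calling split_into_gz_parts("input.txt", "input", 10) would write input00.gz through input09.gz, which is the set of files the -trainFile input -numGzFile 10 example expects. Round-robin splitting changes the order in which examples appear across parts. If the original order matters, contiguous chunks would be the alternative.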