This enables StarSpace to load input from compressed files in ".gz" format, using the third-party zlib and gzip libraries.
It speeds up reading large input files, but it introduces additional library dependencies (zlib, gzip).
Currently it assumes that the input file has been partitioned into smaller parts and that the number of parts equals the number of threads used, so that the parts can be read in parallel.
To use the compressed file format, first compile StarSpace using the makefile_compress:
make -f makefile_compress
then in your train config, specify
-trainFile input -compressFile gzip -numGzFile 10
and it will try to read the gzip files input00.gz, input01.gz, ..., input09.gz.
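
The partitioning step is left to the user. A minimal sketch of one way to do it in Python is shown below. The helper name split_into_gz_parts is hypothetical and not part of StarSpace. The sketch assumes one example per line, assumes that distributing lines round-robin across the parts is acceptable, and assumes the two-digit zero-padded suffix seen in the example above (input00.gz, input01.gz, ...).

```python
import gzip
from pathlib import Path


def split_into_gz_parts(src, prefix, num_parts):
    """Split the lines of `src` round-robin into `num_parts` gzip files.

    Hypothetical helper, not part of StarSpace. Output files are named
    <prefix>00.gz, <prefix>01.gz, ... (naming inferred from the example above).
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    paths = [Path(f"{prefix}{i:02d}.gz") for i in range(num_parts)]
    # Open every part in text mode so lines can be written directly.
    outs = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        with open(src, "r", encoding="utf-8") as f:
            for n, line in enumerate(f):
                # Assumption: one example per line, so whole lines stay together.
                outs[n % num_parts].write(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

For the example above, calling split_into_gz_parts("input.txt", "input", 10) would write input00.gz through input09.gz into the current directory, matching -numGzFile 10. The source file name input.txt is only an illustration.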