aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Binary learning with --disk learning #287

Open 4ound opened 5 years ago

4ound commented 5 years ago

Hey! We have a big training dataset (a few TB), so we use your library with the --disk option. We spend a significant amount of time parsing the dataset. In practice, if we compare in-memory training with --no-bin and without it, the second variant, which generates and reuses a binary file, is faster. Why doesn't on-disk mode generate or use a binary file?

aksnzhy commented 5 years ago

@4ound The binary file could exceed your RAM capacity, so we only generate a binary file for in-memory training.
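
To make the idea in the question concrete, here is a minimal Python sketch of parsing a text file once into a binary cache and then streaming that cache back in fixed-size blocks, so that only one block is held in RAM at a time. This is not xLearn's implementation or its binary format: the record layout and the names `text_to_binary`, `iter_blocks`, and `demo` are assumptions made up for illustration, and the input is assumed to be libsvm-style `label index:value ...` lines.

```python
import os
import struct
import tempfile

# Hypothetical record layout (NOT xLearn's actual binary format):
#   float32 label, uint32 feature count, then (uint32 index, float32 value) pairs.
HEADER = struct.Struct("<fI")
PAIR = struct.Struct("<If")


def text_to_binary(txt_path, bin_path):
    """Parse a libsvm-style text file once and write packed binary records."""
    with open(txt_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            feats = [p.split(":") for p in parts[1:]]
            dst.write(HEADER.pack(float(parts[0]), len(feats)))
            for idx, val in feats:
                dst.write(PAIR.pack(int(idx), float(val)))


def iter_blocks(bin_path, block_size):
    """Yield lists of at most block_size samples read from the binary cache."""
    block = []
    with open(bin_path, "rb") as src:
        while True:
            head = src.read(HEADER.size)
            if not head:
                break  # end of file
            label, n = HEADER.unpack(head)
            raw = src.read(PAIR.size * n)
            feats = [PAIR.unpack_from(raw, i * PAIR.size) for i in range(n)]
            block.append((label, feats))
            if len(block) == block_size:
                yield block
                block = []
    if block:
        yield block  # final, possibly shorter block


def demo():
    """Convert a tiny three-sample file and return the size of each block."""
    with tempfile.TemporaryDirectory() as tmp:
        txt = os.path.join(tmp, "train.txt")
        binf = os.path.join(tmp, "train.bin")
        with open(txt, "w") as f:
            f.write("1 0:0.5 3:1.0\n0 1:2.0\n1 2:0.25 4:4.0\n")
        text_to_binary(txt, binf)
        return [len(b) for b in iter_blocks(binf, block_size=2)]
```

With three samples and `block_size=2`, `demo()` yields one block of two samples and one block of one. The text is parsed only once; later passes read the packed records directly, which is the saving the question is after.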