aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0

Warm start in xlearn #175

Open mohamed-ali opened 5 years ago

mohamed-ali commented 5 years ago

Is it possible to run an xlearn model starting from the results of a previous training run, i.e. initialize the model with the parameters from the last training result?

aksnzhy commented 5 years ago

@mohamed-ali I think it would not be very hard to add this feature to xLearn, and I can do it in the next few days.

mohamed-ali commented 5 years ago

@aksnzhy I think warm start would enable incremental learning, which might be better than the on-disk training approach. It seems that with a 50GB dataset, for instance, training the model in chunks of 10GB or so (with warm start) would be faster than training on the whole 50GB at once.

For example, I tried training a model on a machine that has 32GB of memory with ondisk option, with two datasets:

It might be a problem in my code, which I added below for reference, but it seems the ondisk approach is very slow and that using warm start might be faster.

import xlearn as xl

params = {
    "task": "binary",
    "k": 4,
    "lr": 0.01,
    "lambda": 0.0002,
    "metric": "acc",
    "epoch": 40,
    "opt": "adagrad",
    "block_size": 5000
}

TRAIN_DATA = "train.ffm"
VAL_DATA = "val.ffm"

ffm_model = xl.create_ffm()
ffm_model.setOnDisk()

ffm_model.setTrain(TRAIN_DATA)

ffm_model.setValidate(VAL_DATA)
ffm_model.fit(params, "model.out")

aksnzhy commented 5 years ago

@mohamed-ali Hi, I have added the online learning feature to xLearn. You can give it a try:

xLearn supports online learning: it can train on new data starting from a pre-trained model. Use the setPreModel API to specify the file path of the pre-trained model. For example:

   import xlearn as xl

   ffm_model = xl.create_ffm()
   ffm_model.setTrain("./small_train.txt")
   ffm_model.setValidate("./small_test.txt")  
   ffm_model.setPreModel("./pre_model")
   param = {'task':'binary', 'lr':0.2, 'lambda':0.002} 

   ffm_model.fit(param, "./model.out") 

Note that xLearn can only load a binary model as the pre-trained model, not a TXT model.
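For the chunked training idea discussed earlier in this thread, a rough sketch of how setPreModel might be chained across chunks is below. The helper names (split_by_lines, train_in_chunks), the chunk and model file naming, and the create_model argument are assumptions made for illustration and are not part of xLearn. Only create_ffm, setTrain, setValidate, setPreModel and fit come from the examples above. Whether this is actually faster or as accurate as training on the full dataset has not been measured here.

```python
import os


def split_by_lines(path, lines_per_chunk, out_dir):
    """Split a text dataset (one sample per line) into chunk files.

    Hypothetical helper: file naming and layout are illustrative only.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(path, "r") as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                # Start a new chunk file every lines_per_chunk lines.
                if out is not None:
                    out.close()
                chunk_path = os.path.join(out_dir, f"chunk_{len(chunk_paths)}.txt")
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def _create_ffm():
    # Imported lazily so the helpers can be loaded without xLearn installed.
    import xlearn as xl
    return xl.create_ffm()


def train_in_chunks(chunk_paths, validate_path, param, out_dir,
                    create_model=_create_ffm):
    """Train on each chunk in turn, warm-starting from the previous model.

    Returns the path of the last binary model written, or None if
    chunk_paths is empty.
    """
    os.makedirs(out_dir, exist_ok=True)
    pre_model = None
    for i, chunk in enumerate(chunk_paths):
        model = create_model()
        model.setTrain(chunk)
        model.setValidate(validate_path)
        if pre_model is not None:
            # Must be the binary model written by fit(), not a TXT model.
            model.setPreModel(pre_model)
        model_path = os.path.join(out_dir, f"model_{i}.out")
        model.fit(param, model_path)
        pre_model = model_path
    return pre_model
```

It could then be called along the lines of `train_in_chunks(split_by_lines("train.ffm", 1000000, "./chunks"), "val.ffm", params, "./models")`, where the chunk size and paths are placeholders to adapt to the available memory.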

aksnzhy commented 5 years ago

@mohamed-ali Please clone the latest code (version 0.4.0) and rebuild it.

Thank you!