Warm start in xlearn - Githubissues

mohamed-ali commented 5 years ago

Is it possible to run an xlearn model starting from the results of a previous training run, i.e., initialize the model with the parameters from the last training results?

aksnzhy commented 5 years ago

@mohamed-ali I think it is not very hard to add this feature to xLearn, and I can do it in the next few days.

mohamed-ali commented 5 years ago

@aksnzhy I think having warm start would enable incremental learning, which might be better than the ondisk training idea. In fact, it seems that if one has a dataset of, say, 50GB and trains the model in chunks of 10GB or so (with warm start), it would be faster than training on the whole 50GB dataset at once.

For example, I tried training a model on a machine that has 32GB of memory with ondisk option, with two datasets:

a dataset with 6GB => training finished in 25 minutes
a dataset with 50GB => training had not even reached the first iteration after 15 hours.

It might be a problem in my code, which I added below for reference, but it seems the ondisk approach is very slow and that using warm start might be faster.

import xlearn as xl

params = {
    "task": "binary",
    "k": 4,
    "lr": 0.01,
    "lambda": 0.0002,
    "metric": "acc",
    "epoch": 40,
    "opt": "adagrad",
    "block_size": 5000
}

TRAIN_DATA = "train.ffm"
VAL_DATA = "val.ffm"

ffm_model = xl.create_ffm()
ffm_model.setOnDisk()

ffm_model.setTrain(TRAIN_DATA)

ffm_model.setValidate(VAL_DATA)
ffm_model.fit(params, "model.out")

aksnzhy commented 5 years ago

@mohamed-ali Hi, I have added the online learning feature to xLearn. You can give it a try:

xLearn can support online learning, which trains on new data starting from a pre-trained model. Users can use the setPreModel API to specify the file path of the pre-trained model. For example: ::

   import xlearn as xl

   ffm_model = xl.create_ffm()
   ffm_model.setTrain("./small_train.txt")
   ffm_model.setValidate("./small_test.txt")  
   ffm_model.setPreModel("./pre_model")
   param = {'task':'binary', 'lr':0.2, 'lambda':0.002} 

   ffm_model.fit(param, "./model.out")

Note that xLearn can only use the binary model as the pre-trained model, not the TXT model.
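To connect this with the chunked training idea raised earlier in the thread, here is a minimal sketch of a loop that trains on one chunk at a time and passes each chunk's binary model to the next run through setPreModel. It uses only the calls shown above (setTrain, setPreModel, fit). The helper name `train_in_chunks`, the chunk file names, and the `model_<i>.out` naming scheme are assumptions made for illustration and are not part of xLearn.

```python
import os


def train_in_chunks(create_model, chunk_paths, param, out_dir="."):
    """Train on each chunk in turn, warm-starting from the previous chunk's model.

    create_model: a zero-argument factory, e.g. xl.create_ffm (assumed usage).
    chunk_paths:  training files, one per chunk, in the order they should be seen.
    Returns the list of model files written, one per chunk.
    """
    prev_model = None
    model_paths = []
    for i, chunk in enumerate(chunk_paths):
        model = create_model()
        model.setTrain(chunk)
        if prev_model is not None:
            # Start from the binary model produced by the previous chunk.
            model.setPreModel(prev_model)
        out_path = os.path.join(out_dir, f"model_{i}.out")  # hypothetical naming
        model.fit(param, out_path)
        prev_model = out_path
        model_paths.append(out_path)
    return model_paths


# Possible usage (file names are placeholders):
#   import xlearn as xl
#   param = {'task': 'binary', 'lr': 0.2, 'lambda': 0.002}
#   train_in_chunks(xl.create_ffm, ["chunk_0.ffm", "chunk_1.ffm"], param)
```

The last entry of the returned list is the model that has seen every chunk. Whether this ends up faster or as accurate as a single on-disk run over the full dataset has not been measured here.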

aksnzhy commented 5 years ago

@mohamed-ali Please clone the latest code (version 0.4.0) and rebuild it.

Thank you!

aksnzhy / xlearn

Warm start in xlearn #175