mohamed-ali opened 5 years ago
@mohamed-ali I think it would not be very hard to add this feature to xLearn, and I can work on it in the coming days.
@aksnzhy I think having warm-start would enable incremental learning, which might work better than the ondisk
training approach. In fact, it seems that if one has a dataset of, say, 50GB and trains the model in chunks of 10GB or so (with warm-start), it would be faster than training on the whole 50GB dataset at once.
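The chunked approach assumes the big training file can first be split into pieces. A minimal sketch of that step (plain Python; the function name and file layout are my own, not part of xLearn) could look like:

```python
import os

def split_libffm_file(path, lines_per_chunk, out_dir="chunks"):
    """Split a libffm-format text file into smaller chunk files.

    Each chunk can then be trained on sequentially with warm-start,
    instead of loading the whole file into memory at once.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    with open(path) as src:
        for i, line in enumerate(src):
            if i % lines_per_chunk == 0:
                # start a new chunk file every `lines_per_chunk` lines
                if out:
                    out.close()
                chunk_path = os.path.join(out_dir, f"chunk_{len(chunk_paths)}.ffm")
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w")
            out.write(line)
    if out:
        out.close()
    return chunk_paths
```

Since libffm files are one example per line, splitting on line boundaries keeps every chunk a valid training file.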
For example, I tried training a model with the ondisk option on a machine that has 32GB of memory, with two datasets.
It might be a problem in my code, which I added below for reference, but the ondisk approach seems very slow, and using warm start might be faster.
```python
import xlearn as xl

param = {
    "task": "binary",
    "k": 4,
    "lr": 0.01,
    "lambda": 0.0002,
    "metric": "acc",
    "epoch": 40,
    "opt": "adagrad",
    "block_size": 5000
}

TRAIN_DATA = "train.ffm"
VAL_DATA = "val.ffm"

ffm_model = xl.create_ffm()
ffm_model.setOnDisk()  # train from disk instead of loading everything into memory
ffm_model.setTrain(TRAIN_DATA)
ffm_model.setValidate(VAL_DATA)
ffm_model.fit(param, "model.out")
```
@mohamed-ali Hi, I have added the online learning feature to xLearn. You can give it a try:
xLearn can support online learning, which can train on new data on top of a pre-trained model. Users can use the setPreModel
API to specify the file path of the pre-trained model. For example:
```python
import xlearn as xl

ffm_model = xl.create_ffm()
ffm_model.setTrain("./small_train.txt")
ffm_model.setValidate("./small_test.txt")
ffm_model.setPreModel("./pre_model")  # path of the pre-trained (binary) model

param = {'task': 'binary', 'lr': 0.2, 'lambda': 0.002}
ffm_model.fit(param, "./model.out")
```
Note that xLearn can only use the binary model here, not the TXT model.
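With setPreModel, the chunked warm-start training discussed earlier could be sketched roughly as below. This is an untested sketch: the chunk file names are hypothetical, and it assumes each round warm-starts from the binary model saved by the previous round.

```python
import xlearn as xl

chunks = ["chunk_0.ffm", "chunk_1.ffm", "chunk_2.ffm"]  # hypothetical chunk files
param = {"task": "binary", "lr": 0.2, "lambda": 0.002}
model_path = "./model.out"

for i, chunk in enumerate(chunks):
    ffm_model = xl.create_ffm()
    ffm_model.setTrain(chunk)
    ffm_model.setValidate("./val.ffm")
    if i > 0:
        # warm-start from the binary model written by the previous round
        ffm_model.setPreModel(model_path)
    ffm_model.fit(param, model_path)
```

Only one chunk needs to be in memory at a time, which is the supposed advantage over ondisk training on the full file.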
@mohamed-ali Please clone the latest 0.4.0 version code and re-build it.
Thank you!
Is it possible to run an xLearn model with the results of a previous training iteration, i.e. initialize the model with the parameters from the last training run?