Closed SamPersoon closed 5 years ago
@SamPersoon This is a really good question. I'll split my answer into three parts. In the first section I describe how the Python wrapper differs from the Starspace CLI. In the second section I describe how to use the Python wrapper. Finally, I add some tips specific to your problem.
I have described it here: https://github.com/facebookresearch/StarSpace/issues/223#issuecomment-466758789
The Starspace CLI is a set of utility functions built on top of the Starspace class. It hides the multiple functions that have to be invoked on the Starspace class to achieve what the CLI provides. These abstractions call a sequence of functions in the required order (order matters).
The Python wrapper, however, wraps the Starspace class itself instead of those high-level utility functions (which helps in writing more flexible Python apps). So whoever uses this wrapper should be careful to respect the Starspace class interfaces.
The steps below will help in writing Python apps with Starspace.
Step 1: Set the required arguments.
Step 2: Call one of the init methods to run the initial operations based on your arguments. The init methods are init(), initFromSavedModel(model), and initFromTsv(model_tsv).
Step 3: Train your model, or make predictions from a pre-trained model.
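The three steps can be sketched as a small helper. This is only an illustration: run_training and its parameters are hypothetical names, and the wrapper module is passed in as sw (assumed to expose args() and starSpace(args) the way they are used in this thread) so that the call order is easy to see.

```python
def run_training(sw, train_file, validation_file=None):
    """Follow the three steps in order: set arguments, init, train.

    `sw` is assumed to be the imported wrapper module
    (for example `import starwrap as sw`); this helper is hypothetical.
    """
    # Step 1: set the required arguments.
    args = sw.args()
    args.trainFile = train_file
    if validation_file is not None:
        args.validationFile = validation_file

    # Step 2: run one of the init methods before anything else.
    model = sw.starSpace(args)
    model.init()

    # Step 3: train the model.
    model.train()
    return model
```

With the real wrapper this would be called as run_training(starwrap, trainfile, validationfile).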
...for evaluating my models. I would like to retrieve predictions but I get...segmentation fault (core dumped)
The Starspace class doesn't allow you to evaluate() unless the model is loaded from disk. Even if you set args.testFile, the file is not loaded. See, init() loads only trainFile and validationFile: https://github.com/facebookresearch/StarSpace/blob/de71fb61bbc7871a98bde69828e34c794fa8b800/src/starspace.cpp#L71
testFile, along with trainFile and validationFile, is loaded inside initDataHandler: https://github.com/facebookresearch/StarSpace/blob/de71fb61bbc7871a98bde69828e34c794fa8b800/src/starspace.cpp#L41
initDataHandler is called only from the functions initFromSavedModel and initFromTsv.
This means you can train your model by providing args.trainFile and validate it during training by providing args.validationFile. To test your model (get evaluation metrics), you should first set args.testFile, save the trained model to disk, load it, and then call evaluate(). Honestly, I don't know why in-memory evaluation is not allowed 🤔, maybe @ledw could explain that.
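The save, reload, evaluate sequence could be wrapped in one function. A minimal sketch, assuming the model object exposes saveModel, initFromSavedModel and evaluate as used in this thread, and that args.testFile was set before the model was constructed; evaluate_via_disk is a hypothetical helper name.

```python
def evaluate_via_disk(model, model_path):
    """Save a trained model, reload it from disk, then evaluate it.

    Hypothetical helper. Assumes args.testFile was already set, since the
    test file is only loaded when the model is initialised from disk.
    """
    model.saveModel(model_path)           # write the trained model to disk
    model.initFromSavedModel(model_path)  # reloading is what loads testFile
    return model.evaluate()               # run the evaluation on the test file
```

This keeps the round trip through disk in one place, so it is easy to remove if in-memory evaluation becomes possible.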
I also noticed, calling train before init, gives segmentation faults as well.
As mentioned, order matters.. 😊
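One way to turn an ordering mistake into a readable Python error instead of a segmentation fault is a thin guard around the model object. This is a sketch under assumptions, not part of the wrapper: OrderGuard is a hypothetical class, and it only checks that one of the init methods named above ran before train() or evaluate(). It does not address the test file loading issue.

```python
class OrderGuard:
    """Wrap a model and raise a Python error on out-of-order calls."""

    # Init method names as listed in this thread.
    _INIT_METHODS = ("init", "initFromSavedModel", "initFromTsv")

    def __init__(self, model):
        self._model = model
        self._initialized = False

    def __getattr__(self, name):
        # Only reached for names not defined on the guard itself.
        attr = getattr(self._model, name)
        if name in self._INIT_METHODS:
            def init_call(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._initialized = True  # remember that init has run
                return result
            return init_call
        if name in ("train", "evaluate") and not self._initialized:
            raise RuntimeError(f"call an init method before {name}()")
        return attr
```

Usage would look like model = OrderGuard(starwrap.starSpace(args)), after which model.train() before model.init() raises RuntimeError.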
I do not know what goes wrong with this code
I have modified the code:
import starwrap

args = starwrap.args()
args.trainMode = 0
args.verbose = 20
args.trainFile = trainfile
args.testFile = testfile
args.validationFile = validationfile # modified, show validation error during training
# args.predictionFile = '../tmp/preds.preds'
model = starwrap.starSpace(args)
model.init()
model.train()
model.saveModel(modelfilename) # modified, save to disk
model.initFromSavedModel(modelfilename) # modified, load from disk (this also loads testFile)
model.evaluate() # show Evaluation Metrics
Using StarSpace without the wrapper works, but models tend to be bigger than 1 GB, so I prefer the wrapper, so I do not have to write and read the models from and to disk.
You can do a trick (it worked for me in a basic setting, but I can't promise it is a good solution; maybe @ledw could confirm). Modify src/starspace.cpp as below:
Add initDataHandler(); to the end of void StarSpace::init() { (just before the closing } ).
Rebuild (./build.sh).
args = starwrap.args()
args.trainMode = 0
args.verbose = 20
args.trainFile = trainfile
args.testFile = testfile
args.validationFile = validationfile # modified, show validation error during training
# args.predictionFile = '../tmp/preds.preds'
model = starwrap.starSpace(args)
model.init()
model.train()
# model.saveModel(modelfilename) # no longer needed, save to disk
# model.initFromSavedModel(modelfilename) # no longer needed, load from disk
model.evaluate() # show Evaluation Metrics
thanks.
@freakeinstein Thank you for your wonderful package and for clearing up this confusion. You made my day!
The python wrapper works well, except for evaluating my models. I would like to retrieve predictions but I get this output:
[1] 11116 segmentation fault (core dumped)
Using StarSpace without the wrapper works, but models tend to be bigger than 1 GB, so I prefer the wrapper, so I do not have to write and read the models from and to disk.
I also noticed, calling train before init, gives segmentation faults as well. Any solutions to this or more logging maybe, because I do not know what goes wrong with this code: