Open strubell opened 6 years ago
The validation is done by run.sh, which is run every 300 seconds by default. Make sure run.sh works properly.
Oops, yes, that makes sense. Is there any specific form that run.sh needs to take in order for the best model to be saved (like writing the score somewhere)? If you could provide an example, that would be ideal.
Normally, there would be a file named log which records the F1 score of each validation. For example:
model.ckpt-2246: 24.330000
model.ckpt-4470: 49.570000
model.ckpt-6699: 59.640000
model.ckpt-8919: 65.980000
If this file is empty, it may indicate that run.sh has failed to execute. For example, if you run out of GPU memory, then run.sh may fail.
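If it helps with debugging, here is a rough sketch for checking the log file by hand. It is not part of the repository; it only assumes that the log has the "checkpoint: score" format shown above, and the function names (parse_validation_log, best_checkpoint) are made up for illustration. An empty result would suggest that run.sh never produced a score.

```python
import re

# Assumed line format, taken from the example above: "model.ckpt-2246: 24.330000"
LINE_RE = re.compile(r"^(?P<ckpt>\S+):\s*(?P<score>[-+]?\d+(?:\.\d+)?)\s*$")


def parse_validation_log(text):
    """Return a list of (checkpoint_name, f1_score) pairs found in the log text."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # silently skip blank or malformed lines
            entries.append((match.group("ckpt"), float(match.group("score"))))
    return entries


def best_checkpoint(entries):
    """Return the (checkpoint_name, f1_score) pair with the highest score, or None."""
    if not entries:
        return None  # empty log: validation probably never completed
    return max(entries, key=lambda entry: entry[1])


# Sample content copied from the log excerpt in this thread.
SAMPLE_LOG = """\
model.ckpt-2246: 24.330000
model.ckpt-4470: 49.570000
model.ckpt-6699: 59.640000
model.ckpt-8919: 65.980000
"""

if __name__ == "__main__":
    best = best_checkpoint(parse_validation_log(SAMPLE_LOG))
    if best is None:
        print("log is empty; check whether run.sh runs correctly")
    else:
        print("best checkpoint: %s (F1 %.2f)" % best)
```

To check a real run, read the contents of your own log file and pass them to parse_validation_log in place of SAMPLE_LOG.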
The readme says to point to the train/best directory for decoding a trained model, but the code doesn't seem to be saving any models to that directory. It is saving models to the train directory, which I can successfully evaluate. How can I configure training to save the best model during training? Here is the command I am running to train: