lukashaenjes opened this issue 3 years ago.
I have never done this myself, but I think you can just set `saveEveryEpoch = TRUE`.
The next time you want to train, load the saved model and extract its embeddings:
```r
library(ruimtehol)
# Load the checkpoint written during training and extract the embedding matrix
x <- starspace_load_model("wordspace.bin.tsv", method = "tsv-data.table")
embeddings <- as.matrix(x)
```
Then pass the embeddings on to `embed_wordspace(..., embeddings = embeddings)`, or directly to `starspace(..., embeddings = embeddings)`.
Transfer learning is shown in section 5 of the package vignette: https://cran.r-project.org/web/packages/ruimtehol/vignettes/ground-control-to-ruimtehol.pdf
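Below is a minimal sketch of the whole resume step. It assumes the following:
- The checkpoint written with `saveEveryEpoch = TRUE` is `wordspace.bin.tsv`.
- You know how many epochs had finished before the crash. With `saveTempModel = FALSE`, the file name does not record this.
- Training restarts its epoch count at 1, so only the remaining epochs are requested.
- `dim` has to match the number of columns of the stored embeddings.

The wrapper `resume_wordspace()` and its arguments are hypothetical. Only `starspace_load_model()`, `as.matrix()` and the `embeddings` argument come from the answer above.

```r
# Hypothetical wrapper: continue training from a saveEveryEpoch checkpoint.
# Assumes `epochs_done` epochs finished before the interruption.
resume_wordspace <- function(x, checkpoint = "wordspace.bin.tsv",
                             epochs_total = 10, epochs_done, ...) {
  stopifnot(epochs_done < epochs_total)
  # Read the embeddings stored in the tab-separated checkpoint file
  previous <- ruimtehol::starspace_load_model(checkpoint, method = "tsv-data.table")
  embeddings <- as.matrix(previous)
  # Start from those embeddings and train only the epochs still outstanding;
  # dim is set to the width of the stored embeddings (assumed requirement)
  ruimtehol::embed_wordspace(x, model = "wordspace.bin",
                             dim = ncol(embeddings),
                             epoch = epochs_total - epochs_done,
                             embeddings = embeddings, ...)
}
```

For example, `model <- resume_wordspace(texts, epochs_done = 6)` would run the remaining 4 epochs. Here `texts` stands for the same character vector used in the original run. The learning-rate schedule most likely starts over in the new call, so the result is not guaranteed to be identical to an uninterrupted run.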
Thanks a lot for your fast response! I'll give this a try.
Hi, first of all, many thanks for this outstanding package.
I have a question concerning model checkpointing. I have a fairly large corpus (~70M words) and run a model that calculates word embeddings (with `embed_wordspace`) for 10 epochs. I run this on a remote server, and it can take up to 2 days for all 10 epochs to finish. As a fault-tolerance measure, I figured it might be a good idea to checkpoint the model after every epoch, so that if something crashes I can load the last saved epoch and continue training from there. For this, I set `saveEveryEpoch = TRUE`. Since I only want to keep the last successful epoch, I keep `saveTempModel = FALSE`.
My question now is: how can I continue training from this checkpoint after something went wrong? I tried passing `initModel = "wordspace.bin"` in the existing `embed_wordspace` call, which gives:
But then it continues to run the model with the parameters specified in the overall call to `embed_wordspace`, starting at epoch 1 and seemingly ignoring the passed model. Also, when reading in the intermediate `wordspace.bin.tsv`, I'm left with the default parameters, not the ones I passed to the function. For instance, `x$args$param$epoch` gives `5` (the default), while I originally passed `epoch = 10`. Could this be the cause of the problem?
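The checkpointing setup described in this question might look roughly like the sketch below. It is not the original call, and the wrapper name `train_wordspace_checkpointed()` is hypothetical. It assumes that `embed_wordspace()` forwards extra StarSpace arguments such as `saveEveryEpoch` and `saveTempModel` to `starspace()` through its `...`.

```r
# Hypothetical wrapper around the training run described in the question.
train_wordspace_checkpointed <- function(x, epochs = 10) {
  ruimtehol::embed_wordspace(
    x,
    model = "wordspace.bin",  # checkpoint location, overwritten after each epoch
    epoch = epochs,
    saveEveryEpoch = TRUE,    # write the model to disk after every epoch
    saveTempModel = FALSE     # do not keep a separately named file per epoch
  )
}
```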
Am I approaching this correctly? What would be an alternative way to achieve my desired goal? I'm thinking of something similar to the ModelCheckpoint functionality in TensorFlow.
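One possible alternative, closer in spirit to `ModelCheckpoint`, is to drive the epochs from R:
- train one epoch per call,
- save the embeddings after each successful epoch,
- on restart, pick up from the highest-numbered checkpoint.

This is a sketch, not a ruimtehol feature. `train_with_checkpoints()`, `last_completed_epoch()` and the `epoch_NN.rds` file naming are all hypothetical. It also assumes that passing `embeddings` re-initialises the model as described in the answer above, and that the vocabulary stays the same between calls. Settings such as the learning-rate decay likely restart with every call, so the result will probably not match a single 10-epoch run exactly.

```r
# Hypothetical helper: highest epoch number among checkpoint files named
# epoch_01.rds, epoch_02.rds, ... in `dir`; 0 if there is none yet.
last_completed_epoch <- function(dir) {
  files <- list.files(dir, pattern = "^epoch_[0-9]+\\.rds$")
  if (length(files) == 0) return(0L)
  max(as.integer(gsub("\\D", "", files)))
}

# Hypothetical driver: one embed_wordspace() call per epoch, saving the
# embedding matrix after each epoch so an interrupted run can resume.
train_with_checkpoints <- function(x, total_epochs = 10, dir = "checkpoints", ...) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  done <- last_completed_epoch(dir)
  embeddings <- NULL
  if (done > 0) {
    # Resume from the embeddings saved after the last finished epoch
    embeddings <- readRDS(file.path(dir, sprintf("epoch_%02d.rds", done)))
  }
  epoch <- done + 1L
  while (epoch <= total_epochs) {
    args <- list(x = x, model = file.path(dir, "wordspace.bin"), epoch = 1, ...)
    if (!is.null(embeddings)) {
      # Initialise from the previous epoch; dim must match the stored matrix
      args$embeddings <- embeddings
      args$dim <- ncol(embeddings)
    }
    model <- do.call(ruimtehol::embed_wordspace, args)
    embeddings <- as.matrix(model)
    # Write to a temporary file first and rename it, so a crash during
    # saving does not leave a truncated checkpoint behind
    target <- file.path(dir, sprintf("epoch_%02d.rds", epoch))
    tmp <- paste0(target, ".tmp")
    saveRDS(embeddings, tmp)
    file.rename(tmp, target)
    epoch <- epoch + 1L
  }
  embeddings
}
```

Usage would be something like `emb <- train_with_checkpoints(texts, total_epochs = 10, dim = 100)`, where `texts` is the training corpus. After a crash, running the same line again continues from the last epoch that was saved completely.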
Many thanks in advance!