Hi,

First of all, thank you for this great project - my colleagues and I love using StarSpace.

I recently pretrained a FastText model and then converted it to the tsv format (no header line, whitespace separation between words and vectors). I wrote a script to append randomly initialized label vectors at the end of the tsv.
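For context, the conversion script does roughly the following (a simplified sketch: the .vec input name, the label names and the random initialization range are placeholders, not the exact values I use):

import numpy as np

DIM = 500   # dimension of the pretrained FastText vectors
SEP = ' '   # whitespace separation, as described above

# FastText's .vec output already has "word v_1 ... v_DIM" rows; converting to
# the tsv format mainly means dropping the header line.
with open('fast_text_medical_texts.vec') as src, \
        open('fast_text_medical_texts_labels.tsv', 'w') as dst:
    next(src)   # skip the .vec header line (vocab size, dimension)
    for line in src:
        dst.write(line.rstrip('\n') + '\n')

    # Append one randomly initialized row per label at the end of the tsv.
    labels = ['__label__example_1', '__label__example_2']   # placeholders for the real 2923 labels
    for label in labels:
        vec = np.random.uniform(-0.01, 0.01, DIM)
        dst.write(label + SEP + SEP.join('%.6f' % x for x in vec) + '\n')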
The model is loaded and the vocabulary and label sizes seem to be correct. I use the following to load the model:

sp = sw.starSpace(arg)
sp.init()
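For completeness, arg is set up roughly like this (a sketch rather than the exact script: the parameter names follow the starwrap examples, the train/test paths are placeholders, and my understanding is that init() loads the tsv via initFromTsv because initModel ends in .tsv):

import starwrap as sw

arg = sw.args()
arg.trainFile = '../data/medical_train.txt'   # placeholder path
arg.testFile = '../data/medical_test.txt'     # placeholder path
arg.initModel = '../models/fast_text_medical_texts_labels.tsv'   # the converted tsv from above
arg.dim = 500   # matches the pretrained FastText vectors

sp = sw.starSpace(arg)
sp.init()    # this is where the tsv gets loaded (via initFromTsv, as far as I can tell)
sp.train()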
However, I always end up with a segmentation fault:
Start to load a trained embedding model in tsv format.
Loading dict from model file : ../models/fast_text_medical_texts_labels.tsv
Number of words in dictionary: 347312
Number of labels in dictionary: 2923
Initialized model weights. Model size :
matrix : 2350235 500
Loading model from file ../models/fast_text_medical_texts_labels.tsv
Model loaded.
Training epoch 0: 0.001 3.33333e-06
Segmentation fault
What also really confuses me is the matrix size: the first dimension (2350235) is way bigger than words + labels (347312 + 2923 = 350235; the difference is exactly 2000000). Am I missing something here?
Another weird observation: when I specify a test file, StarSpace loads the test instances, but it does not load the training instances from the training file I specified. When I train without initFromTsv, everything works as expected.
Thanks!
Best, Sven