bnosac / ruimtehol

R package to Embed All the Things! using StarSpace
Mozilla Public License 2.0

unable to train wikipedia_shuf_train5M.txt #26

Closed ashishlal closed 4 years ago

ashishlal commented 4 years ago

Here's my code and the error. Please help me resolve this issue. I have tried trainMode = 2, 3 and 5, with the same result.

library(ruimtehol)
set.seed(123456789)

setwd("D:/Software/StarSpace/scripts")

model <- starspace(file = "../data/wikipedia_shuf_train5M.txt", fileFormat = "labelDoc",
                   dim = 512, trainMode = 3, epoch = 20)

Start to initialize starspace model.
Build dict from input file : ../data/wikipedia_shuf_train5M.txt
Read 2099M words
Number of words in dictionary:  10410937
Number of labels in dictionary: 0
Loading data from file : ../data/wikipedia_shuf_train5M.txt
Total number of examples loaded : 0
ERROR: File '../data/wikipedia_shuf_train5M.txt' does not contain any valid example.
Please check: is the file empty? Do the examples contain proper feature and label according to the trainMode? If your examples are unlabeled, try to set trainMode=5.
Error in (function (model = "textspace.bin", save = FALSE, trainFile = "",  : 
  Incorrect Starspace usage

jwijffels commented 4 years ago

For trainMode 3, the StarSpace documentation says:

trainMode = 3:

Each example contains a collection of labels. At training time, two labels from the collection are randomly picked as the LHS and RHS.
Use case: learn pairwise similarity from collections of similar objects, e.g. sentence similarity.

The error message asks "Do the examples contain proper feature and label according to the trainMode?", so the question is whether your data contains correct examples for trainMode 3. See the source code of embed_sentencespace to see what such examples could look like. If you want more details on the trainModes, the best place to look is the StarSpace docs at https://github.com/facebookresearch/StarSpace
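
As a rough way to check the data against that description, the base R sketch below counts how many tab-separated segments each line holds. It assumes that with fileFormat = "labelDoc" each line is a collection of sentences separated by tabs, so a line needs at least two segments before an LHS and an RHS can be picked in trainMode 3; please verify that assumption against the StarSpace docs. The helper count_segments and the toy lines are hypothetical and are not part of ruimtehol or StarSpace.

```r
# Hypothetical helper (not part of ruimtehol): count the non-empty,
# tab-separated segments on each line of a character vector.
count_segments <- function(lines) {
  parts <- strsplit(lines, "\t", fixed = TRUE)
  vapply(parts, function(p) sum(nzchar(trimws(p))), integer(1))
}

# Toy lines, made up for illustration only.
toy <- c(
  "the cat sat on the mat\tcats like to sit on mats\ta cat is a small animal",
  "paris is the capital of france\tfrance is a country in europe",
  "this line has no tab so it holds a single sentence"
)

segments <- count_segments(toy)

# Assumption: trainMode 3 needs at least two segments per line,
# one for the LHS and one for the RHS.
usable <- segments >= 2

print(data.frame(segments = segments, usable = usable))

# To inspect a sample of the real file instead of the toy lines:
# sample_lines <- readLines("../data/wikipedia_shuf_train5M.txt", n = 1000)
# table(count_segments(sample_lines) >= 2)
```

If most lines of the real file come out as unusable here, the file probably does not match the layout that trainMode 3 expects, which would be consistent with the "Total number of examples loaded : 0" line in the log.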

jwijffels commented 4 years ago

Closing. Feel free to reopen if needed.