Closed bob-rietveld closed 5 years ago
I also saw this appear when testing out the models and the predict functionality (using another trainMode), and I asked myself the same question. I'm still reading the paper thoroughly to understand why this happens.
Can you share your flow / code?
Hi,
Below is my (very simple) code for evaluating a sentiment model. I can send you the data files if you like.
I upgraded to the latest version, but I see that the evaluation is under construction (in the model_eval object). Do I need special parameters to evaluate the model, or should I just be patient? ;-)
StarSpace now provides hit@1 etc. evaluation metrics, but it would be nice if I were able to compute my own evaluation metrics (perhaps using something like yardstick https://tidymodels.github.io/yardstick/index.html). Do you think that is possible?
Thanks again for the nice work on this package.
# train StarSpace model, see https://github.com/facebookresearch/StarSpace for options
model <- starspace(file = "data/sentiment_train.txt",
                   trainMode = 0,
                   epoch = 15)

# save model
starspace_save_model(model, file = "model/aspect_embeddings.tsv")

# inspect model
embeddings <- data.table::fread("model/aspect_embeddings.tsv")

# evaluate model
model_eval <- starspace(file = "data/sentiment_test.txt",
                        model = "textspace.bin",
                        trainMode = 0)
For evaluation, there is still some work to do, but you can use ruimtehol:::textspace(model$model, testFile = "path/to/testfile", OTHER ARGS) to get the StarSpace evaluation metrics while passing all the detailed StarSpace arguments.
If you are just using it for classification, there are many other R packages which can calculate evaluation metrics of classification models.
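As a minimal sketch of that suggestion: once you have the true labels and the top predicted labels as two vectors, base R is enough to get a confusion matrix and accuracy. The vectors below are made-up illustration data, not output from the model above.

```r
# Sketch: classification metrics in base R, assuming you have collected
# the true labels and the top-1 predicted labels as character vectors.
truth      <- c("pos", "pos", "neg", "neg", "pos", "neg")
prediction <- c("pos", "neg", "neg", "neg", "pos", "pos")

# confusion matrix: rows are the true labels, columns the predictions
confusion <- table(truth, prediction)
print(confusion)

# overall accuracy: fraction of labels that match
accuracy <- mean(truth == prediction)
print(accuracy)
```

Packages such as yardstick or caret compute the same quantities (plus precision, recall, F1) from the same two vectors, so nothing ruimtehol-specific is needed here.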
The only thing that I'm planning to add to this R package is a shorthand for that type of ruimtehol:::textspace(model$model, testFile = "path/to/testfile", OTHER ARGS) call.
By the way, if you want the embeddings, you can just do as.matrix(model).
Hi @good-marketing. Probably the reason why you got duplicates is that you built a model with trainMode 0 (e.g. classification) and did not set K. K indicates how many predictions you want; the default is 5. If you have only 2 classes in your sentiment analysis, that does not make sense, so set starspace(..., K = 2) if it is a binary sentiment classification. Currently, you can only set K when you train the model, not when you predict.
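In the meantime, the duplicated rows described above can also be dropped after the fact. A small sketch, using a made-up data frame that mimics the described predict output (the column names label and prob are assumptions, not the package's actual output format):

```r
# Sketch: dropping duplicated label/probability rows from a predictions
# data frame such as the one described in the question above.
predictions <- data.frame(label = c("pos", "neg", "pos", "neg", "pos"),
                          prob  = c(0.8, 0.2, 0.8, 0.2, 0.8),
                          stringsAsFactors = FALSE)

# unique() removes rows that are exact duplicates across all columns
deduplicated <- unique(predictions)
print(deduplicated)
```

This is only a workaround; setting K to the actual number of classes at training time avoids generating the duplicates in the first place.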
Closing, please use the k argument in the predict functionality.
Thanks for this package. It has really helped to integrate StarSpace in my workflow. My question: when I run predict on a model (trained with trainMode = 1) I get a nice data frame with possible labels for my dataset. The data frame, however, contains duplicate results (e.g. the same labels with the same probabilities). Is this intended/due to StarSpace, or an implementation feature/bug? Best, Bob