facebookresearch / StarSpace

Learning embeddings for classification, retrieval and ranking.
MIT License

how to apply query_predict for pagespace #178

Closed jwijffels closed 6 years ago

jwijffels commented 6 years ago

I'm trying to build some examples using the R wrapper that I've created. I was building an example for pagespace (trainMode 1); the training data looks as follows:

__label__VERVOERSLIJN __label__VERVOERBELEID __label__VERVOERPERSPOOR __label__NMBS __label__DIENSTREGELINGVANHETVERVOER
__label__FRAUDE __label__CONTROLEORGAAN __label__VERVOERBELEID __label__PLAATSBEWIJS __label__VERVOERPERSPOOR __label__NMBS
__label__STRAFRECHT __label__SEKSUEELGEWELD __label__GEGEVENSBANK __label__STRAFVERVOLGING __label__SEKSUEELMISDRIJF __label__PSYCHOLOGISCHEINTIMIDATIE
__label__STRAFRECHT __label__SEKSUEELGEWELD __label__GEGEVENSBANK __label__STRAFVERVOLGING __label__SEKSUEELMISDRIJF __label__PSYCHOLOGISCHEINTIMIDATIE
__label__FISCALITEIT __label__VLUCHTELING __label__GEZINSLAST __label__ASIELRECHT __label__BELASTINGAANGIFTE __label__BELASTINGTERUGGAVE

Based on the model, I would like to make a prediction (e.g. using query_predict). Unfortunately this fails, because query_predict basically does the following:

string input;
vector<Base> query_vec;
sp.parseDoc(input, query_vec, " ");

The result is that query_vec is always of size 0, because the training data contains only labels: parseDoc calls the parse function from parser.cpp (bool DataParser::parse(const std::vector<std::string>& tokens, vector<Base>& rslts)), which only identifies words, not labels.

Hence my question: is it possible to get predictions in such a setting (trainMode 1, only labels), and if so, how?

ledw commented 6 years ago

@jwijffels Hi, thanks for reporting. We'll update the query_predict to be able to handle trainMode 1.

jwijffels commented 6 years ago

Great. Looking forward to it.

jwijffels commented 6 years ago

Thanks for the proposed fix. I wonder how this would influence the results from the paper, if there is any influence at all?

ledw commented 6 years ago

@jwijffels it should not affect the results from the paper, as the query_predict functionality is not used in the paper.

jwijffels commented 6 years ago

That's good to hear! Thank you for the fix. I've tried it out and it now also gives predictions for trainMode 1. Many thanks.