thisisandreeeee closed this issue 5 years ago
@thisisandreeeee The basedoc provides the other candidate articles to compare against. Your interpretation is correct: it ranks the documents in the basedoc together with the correct document (the test.txt example you gave is correct). The expected file format for basedoc is one line per document.
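To make the ranking step concrete, here is a minimal sketch in Python. It is not StarSpace's code: the bag-of-words `embed` function and cosine scoring are stand-ins for the learned embeddings, and the names `rank_of_target`, `embed`, and `cosine` are hypothetical. It only illustrates the idea described above, where the last document of a test line is ranked together with every line of the basedoc.

```python
from collections import Counter
import math


def embed(text):
    # Stand-in for a learned embedding (assumption): plain word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_of_target(test_line, basedoc_lines):
    # One test line holds tab-separated documents; the last one is the target.
    docs = test_line.rstrip("\n").split("\t")
    context, target = docs[:-1], docs[-1]
    query = embed(" ".join(context))
    target_score = cosine(query, embed(target))
    # Rank = 1 + number of basedoc documents scoring strictly higher.
    better = sum(
        1 for line in basedoc_lines if cosine(query, embed(line)) > target_score
    )
    return better + 1


basedoc = [
    "stock market news today",
    "cats and lolcat pictures",
    "recipe for tomato soup",
]
line = "i love cats\tfunny lolcat links\thow to be a petsitter"
print(rank_of_target(line, basedoc))
```

With this toy scorer the target shares no words with the context, so one basedoc line outranks it. A trained model would score documents very differently, but the candidate set (basedoc plus the correct document) is the same.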
Thanks for the clarification! Wouldn't this also mean that one should always interpret hits@k in conjunction with the size of basedoc, since hits@20 would mean very different things with a basedoc of 50 documents versus 10,000?
@thisisandreeeee Yes, that is exactly how hits@k should be interpreted.
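A quick way to see why the basedoc size matters is to compare chance-level hits@k. The sketch below assumes a random ranker with no ties and a candidate set made of the basedoc plus the one correct document; the function names `chance_hits_at_k` and `hits_at_k` are hypothetical helpers, not part of StarSpace.

```python
def chance_hits_at_k(k, basedoc_size):
    # The correct document competes with every basedoc document,
    # so there are basedoc_size + 1 candidates in total.
    candidates = basedoc_size + 1
    return min(k, candidates) / candidates


def hits_at_k(ranks, k):
    # Fraction of test examples whose correct document ranked in the top k.
    return sum(1 for r in ranks if r <= k) / len(ranks)


for size in (50, 10_000):
    print(size, round(chance_hits_at_k(20, size), 4))
```

Under these assumptions a random ranker reaches hits@20 of about 0.39 against 50 documents but only about 0.002 against 10,000, so the same reported score is far more informative with the larger basedoc.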
Related to #75
I couldn't find much documentation describing the role of the basedoc file when running ./starspace test for docspace recommendations (-trainMode 1 -fileFormat labelDoc), and I'd like to confirm whether my assumptions below are correct.
Given a test.txt of the following format:
i love cats<tab>funny lolcat links<tab>how to be a petsitter
StarSpace uses the first (n-1) documents to predict the nth document (use "i love cats<tab>funny lolcat links" to predict "how to be a petsitter"). My question is, how does it actually predict the nth document? Is the mechanism similar to that in ./query_predict, where it tries to rank the documents in basedoc?
I'm skeptical this is the case, because according to the arXiv paper, StarSpace "predict[s] the n’th article by ranking it against 10,000 other unrelated articles". Am I correct to say that the basedoc contains these 10,000 unrelated articles?
I would really like to understand how basedoc is used for docspace testing, and what the expected file format is, e.g. is each line one document or one user?