fwd4 / dssm


Data used for this repo #1

Closed karthikmswamy closed 7 years ago

karthikmswamy commented 7 years ago

Hi @liaha, thanks for sharing the code for your implementation. It is good to compare performance. Can you share the data used for training your model? Can it be generated using another script? I did see that you've added an ignore to the data folder in your .gitignore file. Did you use the MS Marco dataset as done in the paper? Thanks.

fwd4 commented 7 years ago

@karthikmswamy Thank you for your interest. First of all, we are not the official authors of the DSSM model, so a performance comparison with us is not meaningful. Also, the data we are using is for internal use only, sorry about that. But you can certainly try the MS MARCO dataset if applicable.

Aniket-Pradhan commented 5 years ago

Hi!

Sorry for reviving an old thread. I was looking through your implementation, and I couldn't find the word-hashing step. Am I correct to assume that you have generated the bag-of-words already and stored it as one of your input files?
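For reference, the word-hashing step described in the original DSSM paper breaks each word into letter trigrams (after adding `#` boundary markers) and represents text as a sparse bag of trigrams. A minimal sketch of that idea, assuming an illustrative trigram vocabulary (the names and helpers below are hypothetical, not from this repo):

```python
from collections import Counter

def letter_trigrams(word):
    """Split a word into letter trigrams after adding '#' boundary marks."""
    marked = f"#{word}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]

def word_hash(text, trigram_index):
    """Map text to a sparse bag-of-trigrams vector (trigram id -> count)."""
    counts = Counter()
    for word in text.lower().split():
        for tri in letter_trigrams(word):
            if tri in trigram_index:
                counts[trigram_index[tri]] += 1
    return dict(counts)

# Build a tiny trigram vocabulary from a toy corpus (illustrative only)
corpus = ["good query", "good document"]
vocab = sorted({t for doc in corpus
                  for w in doc.split()
                  for t in letter_trigrams(w)})
trigram_index = {t: i for i, t in enumerate(vocab)}

sparse_vec = word_hash("good", trigram_index)
```

If the input files already contain such sparse vectors, the hashing would indeed have been done in a preprocessing step rather than in the training code.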

Also, to train the model, which datasets would I require, and in what format? (This is sort of a noob question, but I am just getting started with DSSM.) While browsing through your implementation, I can see that you import only two files, which suggests that only two data items are required, namely the document file and the query file. Am I correct?