NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.83k stars 899 forks

Input Format for DRMM #740

Closed voladorlu closed 5 years ago

voladorlu commented 5 years ago

Hi, can anybody help me with the input format for DRMM? According to the tutorial, the cross-entropy loss function needs sampled negative instances. What if the input data already includes negative samples (i.e. negatives labeled 0 and positives labeled 1)? Will the negative sampling method sample negatives for every instance in the training data, including the negative ones labeled 0?

crystina-z commented 5 years ago

If I understand your question correctly:

  1. Negative sampling can only happen if negative samples are included in the input; otherwise there is nothing to sample from (see here).
  2. Yes, it does negative sampling for each positive instance, num_dup times (see here).

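The two points above can be illustrated with a minimal sketch in plain Python. This is not MatchZoo's own code: `sample_pairs`, the `(query_id, doc_id, label)` row layout, and `seed` are hypothetical names chosen for illustration, and `num_neg` (negatives drawn per repetition) is an assumed companion to `num_dup`. The sketch also assumes that negatives are drawn from rows of the same query with a strictly lower label.

```python
import random
from collections import defaultdict


def sample_pairs(rows, num_dup=1, num_neg=1, seed=0):
    """Build (query, positive, [negatives]) groups from labeled rows.

    rows: iterable of (query_id, doc_id, label) tuples (hypothetical layout).
    """
    rng = random.Random(seed)

    # Group documents by query so negatives come from the same query.
    by_query = defaultdict(list)
    for query_id, doc_id, label in rows:
        by_query[query_id].append((doc_id, label))

    groups = []
    for query_id, docs in by_query.items():
        for pos_doc, pos_label in docs:
            # Candidate negatives: same query, strictly lower label.
            negatives = [d for d, lab in docs if lab < pos_label]
            if not negatives:
                # Rows labeled 0 end up here, as do positives whose
                # query has no negatives: nothing to sample from.
                continue
            # Repeat the sampling num_dup times for each positive.
            for _ in range(num_dup):
                sampled = [rng.choice(negatives) for _ in range(num_neg)]
                groups.append((query_id, pos_doc, sampled))
    return groups


rows = [
    ("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0),
    ("q2", "d4", 1),  # q2 has no negative rows in the input
]
pairs = sample_pairs(rows, num_dup=2, num_neg=1)
for group in pairs:
    print(group)
```

In this sketch the rows labeled 0 never act as the positive side of a pair; they only serve as the pool that negatives are drawn from. The query `q2` produces no training pairs at all because its input contains no negatives.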
voladorlu commented 5 years ago

@crystina-z Thank you so much for your clarification. It helps me a lot. :-)

voladorlu commented 5 years ago

One more question. I notice that the default batch_size for DRMM is somewhat small (suggested as 20). Will a relatively larger batch size (e.g. 128 or 256) have a negative influence on model performance?

uduse commented 5 years ago

I hope things are working well for you now. I’ll go ahead and close this issue, but I’m happy to continue further discussion whenever needed.