sampling in-batch negatives

ContextualAI / gritlm

Generative Representational Instruction Tuning

https://arxiv.org/abs/2402.09906

MIT License

Closed. raghavlite closed this issue 2 months ago

raghavlite commented 2 months ago

In the paper, you mention that you sample in-batch examples from the same dataset. However, in training/run.py, all datasets are concatenated.

Is there another place in the code where you specify that in-batch negatives are sampled from the same dataset?

Muennighoff commented 2 months ago

Note that the length of each dataset is saved right above the concatenation.
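
To illustrate why saved lengths are enough, here is a minimal sketch of how per-dataset lengths could be used to form batches whose indices all come from one source dataset, even after concatenation. This is not the repository's actual sampler: the function name `same_dataset_batches`, its parameters, and the choice to drop incomplete batches are all assumptions made for illustration.

```python
import random
from itertools import accumulate


def same_dataset_batches(dataset_lens, batch_size, seed=0):
    """Return batches of indices into a concatenated dataset.

    Every batch takes all of its indices from a single source dataset.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Start offset of each source dataset inside the concatenation.
    offsets = [0] + list(accumulate(dataset_lens))[:-1]
    batches = []
    for start, length in zip(offsets, dataset_lens):
        indices = list(range(start, start + length))
        rng.shuffle(indices)
        # Keep only full batches so no batch borrows from another dataset
        # (an assumption; leftovers are simply dropped here).
        for i in range(0, length - batch_size + 1, batch_size):
            batches.append(indices[i:i + batch_size])
    # Shuffle the batch order so training still interleaves datasets.
    rng.shuffle(batches)
    return batches


# Three datasets of sizes 5, 8 and 3 concatenated into 16 examples.
batches = same_dataset_batches(dataset_lens=[5, 8, 3], batch_size=2)
```

With the lengths known, indices 0-4 belong to the first dataset, 5-12 to the second, and 13-15 to the third, so each batch's in-batch negatives share a source dataset.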

raghavlite commented 2 months ago

Thanks!