bigcode-project / transformers

Apache License 2.0
26 stars 8 forks

Just to see the diff #3

Open Muennighoff opened 1 year ago

loubnabnl commented 1 year ago

Summary of the changes I added:

  • After selecting a small batch size (32), the same batch is present on all workers, so there is no need to gather values. One thing we still need to add: split the batch (32) into 2 or 3 equal chunks and then do gradient accumulation, because 32 won't fit on one worker.

Muennighoff commented 1 year ago

Amazing work - do you want me to add the last point you mentioned?
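
For reference, the chunk-then-accumulate idea from the summary could look roughly like the sketch below. It is not the code from this PR: every name in it (`split_into_chunks`, `mean_squared_error_grad`, `accumulated_grad`) is hypothetical, and a toy one-parameter regression stands in for the real model so the example runs without any framework. It only illustrates that averaging per-chunk gradients over equal chunks reproduces the full-batch gradient.

```python
import math

# Hypothetical sketch: all names below are illustrative and not taken from the PR.


def split_into_chunks(batch, num_chunks):
    """Split a batch into num_chunks equal chunks; the size must divide evenly."""
    if len(batch) % num_chunks != 0:
        raise ValueError("batch size must be divisible by num_chunks")
    size = len(batch) // num_chunks
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def mean_squared_error_grad(w, chunk):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one chunk."""
    return sum(2.0 * (w * x - y) * x for x, y in chunk) / len(chunk)


def accumulated_grad(w, batch, num_chunks):
    """Process one chunk at a time and accumulate the scaled gradients."""
    grad = 0.0
    for chunk in split_into_chunks(batch, num_chunks):
        # Scaling each chunk by 1 / num_chunks makes the sum equal the
        # mean gradient over the whole batch (chunks have equal size).
        grad += mean_squared_error_grad(w, chunk) / num_chunks
    return grad


# Toy batch of 32 (x, y) pairs, standing in for the batch size of 32 above.
batch = [(float(i), 3.0 * i + 1.0) for i in range(32)]
full_grad = mean_squared_error_grad(0.5, batch)
accum_grad = accumulated_grad(0.5, batch, num_chunks=2)
print(math.isclose(full_grad, accum_grad, rel_tol=1e-9))
```

Note that 32 divides evenly into 2 or 4 chunks but not 3, so this sketch raises a `ValueError` for `num_chunks=3`; unequal chunks would need each gradient weighted by its chunk size instead.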

loubnabnl commented 1 year ago

You can add it if you have time, otherwise I will add it later 🤗

Muennighoff commented 1 year ago

> You can add it if you have time, otherwise I will add it later 🤗

Done, but not tested. May have a bug 👻