Lightning-Universe / lightning-transformers

Flexible components pairing 🤗 Transformers with :zap: PyTorch Lightning
https://lightning-transformers.readthedocs.io
Apache License 2.0

Shuffling support #298

Open juliusfrost opened 1 year ago

juliusfrost commented 1 year ago

🚀 Feature

Add an option for data shuffling in core/data.py. Data shuffling is crucial for removing dataset structure bias.

Motivation

I noticed my model was not performing well on a custom dataset, with spikes in performance across the epoch. I then realized this was because the data was ordered by class and no shuffling was performed by default. I looked into the code but couldn't find any option to enable shuffling in core/data.py. To get this functionality, I had to override three methods: train_dataloader, val_dataloader, and test_dataloader.
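To illustrate why class-ordered data causes trouble, here is a minimal sketch in plain Python. The toy label list and the `batches` helper are hypothetical and not taken from the library. It only shows that, without shuffling, consecutive batches can contain a single class each.

```python
import random

# Hypothetical toy dataset: six samples of class 0 followed by six of class 1.
labels = [0] * 6 + [1] * 6


def batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Without shuffling, every batch contains exactly one class.
ordered = batches(labels, 3)
print(ordered)  # [[0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 1, 1]]

# With shuffling, the same samples are spread across batches in random order.
rng = random.Random(0)  # fixed seed only to make the demo repeatable
shuffled_labels = labels[:]
rng.shuffle(shuffled_labels)
shuffled = batches(shuffled_labels, 3)
print(shuffled)
```

With the ordered batches, the model sees only class 0 for the first half of the epoch and only class 1 for the second, which is consistent with the performance spikes described above.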

Pitch

Add a boolean shuffle argument to the constructor that enables shuffling.
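A rough sketch of what such an option could look like, assuming plain `torch.utils.data.DataLoader` objects. The `ShufflingDataModule` class and its constructor arguments are hypothetical stand-ins, not the actual class in core/data.py. In this sketch the flag affects only the training loader, which is one possible design choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


class ShufflingDataModule:
    """Hypothetical stand-in for a data module; not the library's own class."""

    def __init__(self, train_ds, val_ds, test_ds, batch_size=4, shuffle=False):
        self.train_ds = train_ds
        self.val_ds = val_ds
        self.test_ds = test_ds
        self.batch_size = batch_size
        self.shuffle = shuffle  # the proposed boolean option

    def train_dataloader(self):
        # Shuffling is applied to the training data when the flag is set.
        return DataLoader(self.train_ds, batch_size=self.batch_size, shuffle=self.shuffle)

    def val_dataloader(self):
        # Assumption: evaluation data stays in its original order.
        return DataLoader(self.val_ds, batch_size=self.batch_size, shuffle=False)

    def test_dataloader(self):
        return DataLoader(self.test_ds, batch_size=self.batch_size, shuffle=False)


# Toy dataset ordered by class, as in the motivation above.
features = torch.arange(8, dtype=torch.float32).unsqueeze(1)
labels = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
dataset = TensorDataset(features, labels)

dm = ShufflingDataModule(dataset, dataset, dataset, batch_size=4, shuffle=True)
train_loader = dm.train_dataloader()
val_loader = dm.val_dataloader()

print(type(train_loader.sampler).__name__)  # RandomSampler
print(type(val_loader.sampler).__name__)    # SequentialSampler
```

Applying the same flag to the validation and test loaders would be a small change if shuffled evaluation is also wanted.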

Alternatives

Additional context