datadreamer-dev / DataDreamer

DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models.   🤖💤
https://datadreamer.dev
MIT License

Is there a specific reason not to let the trainer run in memory? #16

Closed FraFabbri closed 6 months ago

FraFabbri commented 6 months ago

https://github.com/datadreamer-dev/DataDreamer/blob/ad3dd9ce73e8ad66f3c1b124686ea970e347343a/src/trainers/trainer.py#L96

AjayP13 commented 6 months ago

Only because we can then assume there is a directory available to store checkpoints and the final model weights. The training code relies on a directory being available to make things easier.

Do you have a use case for in-memory training? If you want to share, you can email me at ajayp@seas.upenn.edu and I can try to support it.
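(For context, a rough sketch of how the session directory is used, based on the documented quickstart pattern; the trainer name below is a placeholder and exact keyword arguments may differ between versions.)

```python
from datadreamer import DataDreamer
from datadreamer.trainers import TrainHFFineTune

# Every step and trainer inside this session persists its outputs,
# checkpoints, and final weights under ./output, which is why a real
# directory is currently assumed.
with DataDreamer("./output"):
    trainer = TrainHFFineTune(
        "My Fine-Tuning Trainer",        # placeholder trainer name
        model_name="google/t5-v1_1-base",
    )
    # trainer.train(...) would read and write checkpoints under ./output;
    # its inputs are elided here since this only illustrates the directory use.
```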

FraFabbri commented 6 months ago

makes sense, thanks for the explanation :)

My intuition was that if we could let the trainer run in memory, it would help with fast experimentation, e.g. when we don't necessarily want to store the model weights but are just running a notebook.

AjayP13 commented 6 months ago

If you train with LoRA, which we support across all of the trainers, it's pretty lightweight; you can have it save only a few MBs, so that might help save disk space and make sure it's not too slow when running.
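(Roughly, enabling LoRA looks something like the sketch below; `peft_config` follows the documented examples, and the trainer name is a placeholder.)

```python
from datadreamer import DataDreamer
from datadreamer.trainers import TrainHFFineTune
from peft import LoraConfig

with DataDreamer("./output"):
    # Passing a PEFT LoraConfig makes the trainer train and save only the
    # small LoRA adapter weights (typically a few MB) instead of a full
    # copy of the base model.
    trainer = TrainHFFineTune(
        "My LoRA Trainer",               # placeholder trainer name
        model_name="google/t5-v1_1-base",
        peft_config=LoraConfig(),
    )
```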

AjayP13 commented 6 months ago

Closing this for now, but if you run into any trouble with this, let me know. Another thing worth noting here is that you can use a hack to actually run DataDreamer training in memory by utilizing /dev/shm/my_output_folder as the output directory. /dev/shm/ is a file system that stores data in RAM (a RAM disk) rather than on disk.
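(As a rough illustration of the /dev/shm hack, assuming the same folder name as above:)

```python
from datadreamer import DataDreamer

# /dev/shm/ is a tmpfs backed by RAM on most Linux systems, so checkpoints
# and weights written by the session never touch the disk (and are lost on
# reboot or if RAM fills up).
with DataDreamer("/dev/shm/my_output_folder"):
    ...  # define steps and trainers as usual
```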