cognitivecomputations / grokadamw

Apache License 2.0
119 stars 7 forks

Very High System-RAM Usage #6

Open linux-leo opened 1 month ago

linux-leo commented 1 month ago

Using Liger kernels and NEFTune, the system consumes 3 gigabytes of RAM with AdamW, whereas with grokadamw it uses up the entire 12 gigabytes of RAM in a Google Colab environment and crashes.

linux-leo commented 1 month ago

Setup: full-parameter fine-tuning of a 350M model on a T4, batch size 8, context size 512.

linux-leo commented 1 month ago

It works with a 135M model, but the memory usage is still too high for grokadamw to serve as a general drop-in alternative to AdamW.
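
To narrow down where the host RAM goes, it may help to log the process's resident memory every few steps under both optimizers and compare the curves. Below is a minimal sketch using only the Python standard library, assuming a Linux host (as on Colab). The helper names `current_rss_mib` and `format_rss` are hypothetical and not part of grokadamw or any other library.

```python
import resource
import sys


def current_rss_mib() -> float:
    """Return this process's resident set size (host RAM) in MiB."""
    try:
        # Linux (including Colab): VmRSS is the current resident size, in kB.
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024
    except OSError:
        pass
    # Fallback: peak (not current) RSS; reported in KiB on Linux, bytes on macOS.
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak / (1024 * 1024 if sys.platform == "darwin" else 1024)


def format_rss(step: int) -> str:
    """Build a one-line log message with the current host RAM usage."""
    return f"step {step}: host RSS {current_rss_mib():.0f} MiB"
```

Calling `print(format_rss(step))` every N optimizer steps, once with AdamW and once with grokadamw, should show whether usage jumps when the optimizer is created or grows steadily during training.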