[Open] dorsa-zeinali opened 5 days ago
Hi, what context size and devset size do you think are reasonable for the e2e fine-tuning step given that I have 1 GPU with 48GB? Thank you so much!

I would use the context length of the original model. IIRC the devset is stored in CPU memory, so that shouldn't be an issue for GPU memory. You may want to look at my comment in the other thread; you should be able to easily rewrite the e2e fine-tuning script to be more memory efficient. Also, if you're quantizing to 4 bits, e2e doesn't really do anything, so I wouldn't bother with it there.
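For illustration, here is a minimal sketch of the kind of memory-efficient loop I mean: the devset stays in pinned CPU memory and only one batch is ever resident on the GPU. The model name, the random devset, and the plain LM loss are placeholders for this sketch, not the repo's actual e2e script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # placeholder; use the model you're quantizing
DEVICE = "cuda"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16
).to(DEVICE)
model.gradient_checkpointing_enable()  # trade compute for activation memory
model.train()
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Use the context length of the original model.
ctx_len = model.config.max_position_embeddings

# Devset lives in pinned CPU memory, so GPU memory only holds the model,
# activations, and a single batch at a time. Random tokens stand in for
# whatever devset the real script builds.
devset = torch.randint(
    0, tokenizer.vocab_size, (256, ctx_len), dtype=torch.long
).pin_memory()

opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

for i in range(devset.size(0)):
    # Stream one sequence to the GPU per step (batch size 1 to fit 48GB).
    batch = devset[i : i + 1].to(DEVICE, non_blocking=True)
    out = model(input_ids=batch, labels=batch)
    out.loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)
    del batch, out  # release the batch before the next transfer
```

On 48GB you would likely also need to shrink the optimizer state (e.g., train only a subset of parameters or use an 8-bit optimizer); the point of the sketch is just that the devset never needs to live on the GPU.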