msamwelmollel closed this 1 month ago
Added to the wishlist.
Actually, do you have access to TPU v5? If not, it's a moot point.
v3/v4 (32 GB HBM per chip) will OOM with full-parameter tuning for Gemma 7B. 2B doesn't work either, since the sharding is different.
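For a rough sense of why memory is the constraint here, below is a back-of-envelope sketch, not a measurement. It assumes an Adam-style optimizer (weights, gradients, and two moment buffers, i.e. four copies of every parameter), 4 bytes per value, a nominal 7e9 parameter count taken from the model name, and perfectly even sharding across chips. The helper `train_state_gib` is hypothetical. Activations and compiler temporaries are ignored, so real usage will be higher.

```python
def train_state_gib(num_params, bytes_per_value=4, copies=4, num_chips=1):
    """Lower-bound GiB per chip for the training state alone.

    copies=4 assumes weights + gradients + two Adam moment buffers.
    Even sharding across num_chips is an idealization.
    """
    total_bytes = num_params * bytes_per_value * copies
    return total_bytes / num_chips / 2**30


NOMINAL_PARAMS = 7e9      # assumption: nominal size from the "7B" name
HBM_PER_CHIP_GIB = 32     # figure quoted in the comment above

for chips in (1, 4, 8):
    need = train_state_gib(NOMINAL_PARAMS, num_chips=chips)
    print(f"{chips} chip(s): ~{need:.1f} GiB per chip for training state "
          f"(HBM per chip: {HBM_PER_CHIP_GIB} GiB)")
```

This only bounds the training state from below. Whether a given setup actually fits also depends on batch size, sequence length, and how the sharding is laid out.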
Description of the feature request:
I would like the cookbook to add an example of continued pre-training on TPU.
What problem are you trying to solve with this feature?
No response
Any other information you'd like to share?
No response