erfanzar / EasyDeL

Accelerate your training with this open-source library. Optimize performance with streamlined training and serving options with JAX. 🚀
https://easydel.readthedocs.io/en/latest/
Apache License 2.0

What hardware spec are you using to train a LLaMA model with 7B params? #95

Closed jchauhan closed 5 months ago

jchauhan commented 5 months ago

I frequently get out-of-memory errors. I am using a TPU v2-8. TPU v3-8 is generally not available; however, as far as I know both have similar specs: 8 cores and 8 GB of memory.

Total hbm usage >= 9.50G:
    reserved        530.00M 
    program           8.98G 
    arguments            0B 

Output size 0B; shares 0B with arguments.

Program hbm requirement 8.98G:
    global           241.0K
    scoped            9.56M
    HLO temp          8.97G (100.0% utilization: Unpadded (8.94G) Padded (8.94G), 0.4% fragmentation (36.60M))

  Largest program allocations in hbm:

erfanzar commented 5 months ago

Hi, TPU v2 and v3 don't have the same amount of memory. You can run the code on Kaggle TPUs too.
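
To make the memory gap concrete, here is a rough back-of-the-envelope sketch. It is not taken from EasyDeL. The per-core HBM sizes (8 GiB on v2, 16 GiB on v3) come from the public Cloud TPU specs rather than from this thread. The helper names, the assumption of even sharding across 8 cores, and the Adam-style optimizer with float32 state are all illustrative. Activations and XLA temporaries (the "HLO temp" line in the log above) are ignored, so real usage will be higher.

```python
GIB = 1024 ** 3

# Assumed per-core HBM in GiB (public Cloud TPU specs, not from the log above).
HBM_PER_CORE_GIB = {"v2-8": 8, "v3-8": 16}
NUM_CORES = 8


def per_core_gib(n_params: float, bytes_per_param: int, cores: int = NUM_CORES) -> float:
    """Memory per core in GiB, assuming state is sharded evenly across all cores."""
    return n_params * bytes_per_param / cores / GIB


def fits(n_params: float, bytes_per_param: int, tpu: str) -> bool:
    """True if the sharded state alone fits in one core's HBM (activations ignored)."""
    return per_core_gib(n_params, bytes_per_param) < HBM_PER_CORE_GIB[tpu]


if __name__ == "__main__":
    n_params = 7e9  # 7B-parameter model

    # Hypothetical training setups: bytes of persistent state per parameter.
    setups = {
        "float32 weights only": 4,
        # weights + gradients + two Adam moments, all float32 (an assumption)
        "float32 weights + grads + Adam state": 16,
    }
    for label, bytes_per_param in setups.items():
        need = per_core_gib(n_params, bytes_per_param)
        for tpu in HBM_PER_CORE_GIB:
            verdict = "fits" if fits(n_params, bytes_per_param, tpu) else "does not fit"
            print(f"{label}: {need:.2f} GiB/core, {verdict} on {tpu}")
```

Under these assumptions, full float32 training state for a 7B model exceeds a v2 core's HBM before any activations are counted, while a v3 core has some headroom. Lower-precision weights, a smaller batch size, or a lighter optimizer would change the estimate.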