AetherCortex / Llama-X

Open Academic Research on Improving LLaMA to SOTA LLM
Apache License 2.0

Why use offload_param in CPU? #9

Open xesdiny opened 1 year ago

xesdiny commented 1 year ago

I think the model shards can fit entirely in GPU memory at the 6.7B-parameter scale, so why offload parameters to the CPU?

        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },
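For context, `offload_param` lives inside DeepSpeed's `zero_optimization` section (ZeRO stage 3). A minimal sketch of the surrounding config is below; the batch-size value is an illustrative assumption, not the repo's actual setting:

```json
{
    "train_micro_batch_size_per_gpu": 32,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        }
    }
}
```

Setting `"device": "none"` (or omitting the block) keeps parameters on the GPU.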
AetherCortex commented 1 year ago

We want to train with a larger batch size.
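The trade-off can be made concrete with a rough memory estimate (a sketch assuming fp16 weights; the figures are illustrative, not measured):

```python
# Back-of-envelope GPU memory estimate for a 6.7B-parameter model.
# Offloading parameters to CPU frees the weight memory on the GPU,
# leaving more room for activations, which grow with batch size.
params = 6.7e9        # parameter count
bytes_fp16 = 2        # bytes per fp16 value

weight_mem_gb = params * bytes_fp16 / 1e9
print(f"fp16 weights alone: {weight_mem_gb:.1f} GB")
# → fp16 weights alone: 13.4 GB
```

So even before optimizer states are counted, keeping the weights on-GPU costs roughly 13 GB per replica; offloading them trades PCIe transfer time for a larger usable batch.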