Why use offload_param on the CPU?

AetherCortex / Llama-X

Open Academic Research on Improving LLaMA to SOTA LLM

Apache License 2.0

Open xesdiny opened 1 year ago

xesdiny commented 1 year ago

I think a 6.7B-parameter model can be loaded in shards without offloading, so why offload the parameters to the CPU?

        "offload_param": {
            "device": "cpu",
            "pin_memory": true
        },

AetherCortex commented 1 year ago

We want to train with a larger batch size.
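
For a rough sense of the trade-off, here is a back-of-the-envelope sketch of how much GPU memory the parameter shards would occupy if they stayed on the GPU. The GPU count (8) and the 2-byte fp16/bf16 weights are assumptions for illustration and are not stated in this thread; gradients, optimizer states, and activations are not counted.

```python
# Rough estimate only. NUM_GPUS and BYTES_PER_PARAM are assumptions,
# not values taken from the thread or the repository config.
PARAMS = 6.7e9          # model size mentioned in the question
BYTES_PER_PARAM = 2     # assumes fp16/bf16 weights
NUM_GPUS = 8            # hypothetical number of GPUs


def param_gib_per_gpu(params: float, bytes_per_param: int, num_gpus: int) -> float:
    """GiB of parameter memory per GPU when weights are partitioned evenly."""
    return params * bytes_per_param / num_gpus / 2**30


resident = param_gib_per_gpu(PARAMS, BYTES_PER_PARAM, NUM_GPUS)
print(f"Parameter shard per GPU without offload: {resident:.2f} GiB")
```

Under these assumptions, keeping the shards in pinned CPU memory frees roughly that much memory on each GPU, which can then go toward activations for a larger batch. The cost is extra CPU-to-GPU transfer during training.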