Closed Tangjj1996 closed 1 year ago
The CPU isn't all that critical; two NVIDIA RTX 3090s with 24 GB of CUDA memory each should be able to handle loading and inference at fp16 precision.
The download time depends mostly on your network speed, and the time to load the model from local storage depends on your disk speed. Given that the model is approximately 60 GB, you can estimate the time for your own setup.
Tutorial reference: https://huggingface.co/docs/transformers/autoclass_tutorial
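Following the AutoClass tutorial linked above, a minimal sketch of loading a large checkpoint in fp16 across two GPUs might look like this. Note the model id below is a placeholder (the original thread doesn't name one), and `device_map="auto"` assumes the `accelerate` package is installed alongside `transformers`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id — replace with the actual model repository name.
model_id = "org/model-name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 halves memory use vs. fp32
    device_map="auto",          # shards layers across the available GPUs
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"`, the weights are split layer-by-layer across both 24 GB cards automatically, so neither GPU needs to hold the full ~60 GB fp32 checkpoint (about 30 GB at fp16).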
Thank you very much for your contribution. It seems like impressive work!
I'm a beginner learning about large models. Could you please let me know how long the installation of this model usually takes and the required CPU and GPU resources? Additionally, if you have any relevant learning tutorials or resources, that would be greatly appreciated.
Thanks!