bigcode-project / Megatron-LM

Ongoing research training transformer models at scale

Conversion of Huggingface bigcode/santacoder to Nvidia Triton Inference server #17

Open michaelfeil opened 1 year ago

michaelfeil commented 1 year ago

Thanks for publishing the model to Huggingface. I'd like to use it with the Nvidia Triton Inference Server in products like https://github.com/fauxpilot/fauxpilot.

Do you have a preferred way to convert the model for the Triton Inference Server (e.g. via https://github.com/triton-inference-server/fastertransformer_backend), starting for example from the Huggingface checkpoint:

from transformers import AutoModelForCausalLM

# Load the santacoder checkpoint from the Huggingface Hub
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    revision="no-fim",  # name of a branch or a commit hash
    trust_remote_code=True,
)
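
For context, the kind of conversion step I have in mind would look roughly like the sketch below: load the Huggingface checkpoint and dump each weight tensor to a per-tensor binary file, similar in spirit to what the FasterTransformer GPT converters produce. The output directory layout, file naming, and dtype choice here are my own assumptions, not the actual fastertransformer_backend convention.

import os

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    revision="no-fim",
    trust_remote_code=True,
)

# Hypothetical output layout; FasterTransformer-style converters typically
# write a "<model>/<tensor_parallel_size>-gpu" directory of raw .bin files.
out_dir = "santacoder-ft/1-gpu"
os.makedirs(out_dir, exist_ok=True)

for name, param in model.state_dict().items():
    # Dump each tensor as a raw fp16 binary file. The exact tensor names,
    # dtypes, splits, and transposes the backend expects would still need
    # to be matched against a real converter script.
    param.to(torch.float16).cpu().numpy().tofile(
        os.path.join(out_dir, f"{name}.bin")
    )

Is there an existing conversion script for this architecture (multi-query attention), or would the above need to be adapted by hand?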