michaelfeil opened 1 year ago
Thanks for publishing the model to Hugging Face. To use it with the Triton Inference Server in products like https://github.com/fauxpilot/fauxpilot:
Do you have a preferred way to convert the model for the NVIDIA Triton Inference Server (e.g. via https://github.com/triton-inference-server/fastertransformer_backend), starting for example from the Hugging Face checkpoint?
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    revision="no-fim",  # name of branch or commit hash
    trust_remote_code=True,
)
```
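In case it helps frame the question, here is a minimal sketch of the first step I had in mind: pulling the `no-fim` revision into a plain local directory so that a FasterTransformer-style converter script could then be pointed at the raw checkpoint files. The local-directory approach and the follow-up converter step are my assumptions, not a confirmed workflow for this model.

```python
from huggingface_hub import snapshot_download

# Download the no-fim revision of the checkpoint into a local
# directory (returned path is managed by the huggingface_hub cache).
local_dir = snapshot_download(
    repo_id="bigcode/santacoder",
    revision="no-fim",
)
print(f"Checkpoint files available under: {local_dir}")

# Assumption: a converter from fastertransformer_backend (or FasterTransformer's
# examples) could then be run against this directory, provided it supports
# this model's architecture.
```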