michaelfeil opened 1 year ago
Thanks for publishing the model to Hugging Face. To use it with the Triton Inference Server in products like https://github.com/fauxpilot/fauxpilot:
Do you have a preferred way to convert the model for the NVIDIA Triton Inference Server (e.g. via https://github.com/triton-inference-server/fastertransformer_backend), starting for example from the Hugging Face checkpoint?
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/santacoder",
    revision="no-fim",  # name of branch or commit hash
    trust_remote_code=True,
)
```
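In case it helps frame the question, here is a minimal sketch of the first step I had in mind: pulling the `no-fim` revision into a plain local directory so that a FasterTransformer-style converter script could then be pointed at the raw checkpoint files. The local-directory approach and the follow-up converter step are my assumptions, not a confirmed workflow for this model.

```python
from huggingface_hub import snapshot_download

# Download the no-fim revision of the checkpoint into a local
# directory (returned path is managed by the huggingface_hub cache).
local_dir = snapshot_download(
    repo_id="bigcode/santacoder",
    revision="no-fim",
)
print(f"Checkpoint files available under: {local_dir}")

# Assumption: a converter from fastertransformer_backend (or FasterTransformer's
# examples) could then be run against this directory, provided it supports
# this model's architecture.
```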