AI-Hypercomputer / JetStream

JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
Apache License 2.0

Support using models from HuggingFace directly #140

Open samos123 opened 1 month ago

samos123 commented 1 month ago

I should be able to serve a model by simply providing the HuggingFace model ID. Requiring users to convert checkpoints is too troublesome.

vipannalla commented 2 weeks ago

Thanks for the feedback. This feature is not currently supported, but we have added it to our roadmap to simplify checkpoint handling. Note that some models (such as Llama variants) require an explicit license acknowledgement on Meta's site before you can use them.

samos123 commented 2 weeks ago

That can be handled by respecting the HF_TOKEN environment variable to automatically download auth-gated models. That's how vLLM and other OSS projects do it.