NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Is NeMo NVIDIA's recommended way to run LLMs? #6566

Closed lizelive closed 1 year ago

lizelive commented 1 year ago

Is NeMo the best way to run an LLM on your hardware for conversation?

My second experience was that https://huggingface.co/nvidia/GPT-2B-001 did not work on a 4090 (#6564). I wanted to play around with FP8, and NeMo looked like the fastest way to do that, but it just wasn't working, and when I did get it to run it was 4x slower than regular HF Transformers.

So the simplest fix is to have a really damn simple demo showing how to run models from Hugging Face.

I'm using a 4090 and an A100.

Like, I know it's possible, but the demo you have for using your own model does not work. It should be a big friendly-button notebook that I can run end to end in the official Docker container to get those models running.
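For comparison, the "really simple demo" being asked for already exists for plain Hugging Face Transformers. A minimal sketch of HF-side generation (the GPT-2B-001 checkpoint itself is a NeMo-format checkpoint and will not load this way; the small `gpt2` model is used here purely as a stand-in, and `transformers`/`torch` are assumed to be installed):

```python
# Minimal sketch: generate text from a causal LM hosted on Hugging Face.
# NOTE: "gpt2" is a stand-in model for illustration -- nvidia/GPT-2B-001
# is distributed as a NeMo checkpoint and cannot be loaded via
# AutoModelForCausalLM directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # swap in any model stored in HF Transformers format
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

prompt = "Hello, my name is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)

text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

The equivalent NeMo path requires converting or downloading a `.nemo` checkpoint and running inside the NeMo container, which is exactly the gap this issue is pointing at.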

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 7 days since being marked as stale.