Closed lizelive closed 1 year ago
Is NeMo the best way to run LLMs on your hardware for conversation?
My second experience was that on a 4090, https://huggingface.co/nvidia/GPT-2B-001 did not work (#6564). I wanted to play around with FP8, and NeMo looked like the fastest way to do that, but it just was not working, and when I did get it to run it was 4x slower than regular HF Transformers.
So the simplest fix would be a really simple demo showing how to run models from Hugging Face, using a 4090 and an A100.
I know it's possible, but the demo you have for using your own model does not work. It should be a big-friendly-button notebook that I can run end to end in the official Docker container and get those models running.
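For reference, the "regular HF Transformers" baseline I'm comparing against is something like the sketch below. Note the checkpoint name is a stand-in assumption: `nvidia/GPT-2B-001` ships in NeMo format and cannot be loaded with `AutoModelForCausalLM` directly, so `gpt2` is used here just to show the shape of the demo I'd expect.

```python
# Minimal HF Transformers baseline (sketch; gpt2 is a stand-in checkpoint,
# since nvidia/GPT-2B-001 is distributed in NeMo format, not HF format).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # swap in any HF-format causal-LM checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

# Tokenize a prompt and generate greedily.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

That is the whole end-to-end path with Transformers; a NeMo demo of comparable length, runnable in the official container, is what I'm asking for.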