NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Chat method for TextGeneration API #7901

Closed okuchaiev closed 9 months ago

okuchaiev commented 11 months ago

I propose adding a .chat(dict) method to class TextGeneration (from which MegatronGPTModel inherits). In addition, MegatronGPTModel should have a member method or property for setting and getting the model's template. That template should be stored in and retrieved from the model._cfg object, and saved to the .yaml file during model serialization.

Context: Base models do not require this method. Aligned models, however, need the correct template for how user and assistant turns, as well as the system prompt, are presented to the model. Today this means that the user of an aligned .nemo checkpoint needs to chase down the model's developer or documentation to find out which template to set.
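To illustrate why the template matters, here is a minimal sketch of how a list of turns could be flattened into the single prompt string an aligned model expects. The TEMPLATE dictionary, its marker strings, and the render_prompt helper are all hypothetical, chosen only for illustration; they are not NeMo's actual format or API.

```python
# Hypothetical template: the marker strings are illustrative only,
# not the format used by any real NeMo checkpoint.
TEMPLATE = {
    "system": "<system>\n{content}\n",
    "user": "<user>\n{content}\n",
    "assistant": "<assistant>\n{content}\n",
    # Appended at the end so the model continues as the assistant.
    "generation_prefix": "<assistant>\n",
}

ROLES = ("system", "user", "assistant")


def render_prompt(messages, template):
    """Format each turn with its role's pattern, then cue the assistant."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ROLES:
            raise ValueError(f"Unsupported role: {role!r}")
        parts.append(template[role].format(content=message["content"]))
    parts.append(template["generation_prefix"])
    return "".join(parts)


prompt = render_prompt(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ],
    TEMPLATE,
)
print(prompt)
```

A model trained with different markers would produce degraded output if prompted with these, which is why the template needs to travel with the checkpoint.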

Example usage:

messages = [
    {"role": "system", "content": "You are a helpful AI assistant well trained in astrophysics. You produce helpful and insightful responses."},
    {"role": "user", "content": "Does Earth revolve around the Sun?"},
    {"role": "assistant", "content": "Yes, but to be more precise, Earth-Sun system revolves around a common center of mass which is few miles off Sun's center."},
    {"role": "user", "content": "How many miles is it away from Sun's center?"}
]

model.eval()
responses = model.chat(messages)
print(responses)

If a user calls the .chat method on a base model, it should raise an exception saying that the chat template is not set.
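The proposed behavior could look roughly like the sketch below. Everything in it is an assumption made for illustration: TextGenerationSketch is a stand-in class rather than NeMo's TextGeneration, a plain dict plays the role of model._cfg, the "chat_template" key and ChatTemplateNotSetError are invented names, and generate() is a placeholder that does no real inference. Writing the config to a .yaml file is left out.

```python
class ChatTemplateNotSetError(RuntimeError):
    """Hypothetical error raised when .chat() is called without a template."""


class TextGenerationSketch:
    """Stand-in for the proposed interface; not NeMo's TextGeneration."""

    def __init__(self, cfg=None):
        # A plain dict stands in for model._cfg in this sketch.
        self._cfg = dict(cfg or {})

    @property
    def chat_template(self):
        # Base models have no template, so this returns None for them.
        return self._cfg.get("chat_template")

    @chat_template.setter
    def chat_template(self, template):
        # Storing it in the config means it would be saved with the model.
        self._cfg["chat_template"] = template

    def generate(self, prompt):
        # Placeholder: a real model would run inference on the prompt.
        return f"[generated from {len(prompt)} prompt characters]"

    def chat(self, messages):
        template = self.chat_template
        if template is None:
            raise ChatTemplateNotSetError(
                "Chat template is not set; .chat() needs an aligned model."
            )
        prompt = "".join(
            template[m["role"]].format(content=m["content"]) for m in messages
        )
        return self.generate(prompt + template["generation_prefix"])


messages = [{"role": "user", "content": "Does Earth revolve around the Sun?"}]

base_model = TextGenerationSketch()
try:
    base_model.chat(messages)
except ChatTemplateNotSetError as err:
    print(f"Base model: {err}")

aligned_model = TextGenerationSketch()
aligned_model.chat_template = {
    "user": "<user>\n{content}\n",
    "generation_prefix": "<assistant>\n",
}
print(aligned_model.chat(messages))
```

Keeping the template inside the config, rather than as a separate argument to .chat(), is what lets a checkpoint carry its own prompt format.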

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.