NVIDIA / NeMo-Curator

Scalable data preprocessing and curation toolkit for LLMs
Apache License 2.0

Regarding the issue of AEGIS model labels #349

Closed by xiafeng-nb 2 weeks ago

xiafeng-nb commented 2 weeks ago

I have a question about the AEGIS model labels. On the model card at https://huggingface.co/nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0, 'Sexual' is labeled 'O9', but in the code and in the prompts in the paper, 'Sexual' is labeled 'O2'. Which one should I follow?

ryantwolf commented 2 weeks ago

Hello! You can follow the label in the prompt, since that is the one the model will use. In NeMo Curator, the prompt has "Sexual" as O2, as you mentioned.

ryantwolf commented 2 weeks ago

Additionally, the model was trained with the label assignments shuffled for each sample. So, in theory, you could change the prompt to assign any category to O2, and the model should be robust to the change. Please reopen the issue if you have any more questions.
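
To illustrate the point about shuffled labels, here is a minimal sketch of keeping the label-to-category mapping next to the prompt and decoding the model's answer with that same mapping. Everything in it is an assumption for illustration: the category list is a short hypothetical subset rather than the full AEGIS taxonomy, the helper names (`assign_labels`, `render_taxonomy`, `decode_response`) are not part of NeMo Curator, and the response format (`safe`, or `unsafe` followed by a line of comma-separated labels) is assumed to follow the LlamaGuard convention.

```python
import random

# Hypothetical subset of categories, for illustration only.
CATEGORIES = ["Violence", "Sexual", "Criminal Planning"]


def assign_labels(categories, seed=None):
    """Map labels O1..On to categories; shuffle the order if a seed is given."""
    ordered = list(categories)
    if seed is not None:
        random.Random(seed).shuffle(ordered)
    return {f"O{i}": name for i, name in enumerate(ordered, start=1)}


def render_taxonomy(label_map):
    """Render the category block that would be placed inside the prompt."""
    return "\n".join(f"{label}: {name}." for label, name in label_map.items())


def decode_response(response, label_map):
    """Translate an assumed 'unsafe\\nO2,O3' style response into category names."""
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if len(lines) < 2 or lines[0].lower() != "unsafe":
        return []  # "safe", or no labels listed
    labels = [token.strip() for token in lines[1].split(",") if token.strip()]
    # Labels missing from the mapping are kept visible instead of being dropped.
    return [label_map.get(label, f"unknown label {label}") for label in labels]


label_map = assign_labels(CATEGORIES)
print(render_taxonomy(label_map))
print(decode_response("unsafe\nO2", label_map))
```

The important part is that the mapping used to decode the response is the same one that was rendered into the prompt. The numbering on the model card does not matter as long as those two agree.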