HowieHwong / TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

https://trustllmbenchmark.github.io/TrustLLM-Website/

MIT License

Data Format #32

Closed redwyd closed 2 weeks ago

redwyd commented 2 weeks ago

I would like to ask whether the same dataset is processed into different formats when generation is run with different models. For example, will Llama use special tokens such as <> and [INST]?

HowieHwong commented 2 weeks ago

Yes, it will be processed into different formats by the predefined templates in FastChat.
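
As a rough illustration of what "different formats" means here, the sketch below wraps one raw dataset prompt in two model-specific layouts. The `TEMPLATES` dictionary and `build_prompt` helper are hypothetical and heavily simplified; they are not FastChat's actual template definitions, which carry more detail (system messages, separators, multi-turn handling).

```python
# Simplified, hypothetical stand-ins for per-model chat templates.
# These are NOT FastChat's real template definitions.
TEMPLATES = {
    "llama-2": "[INST] {user} [/INST]",
    "vicuna": "USER: {user} ASSISTANT:",
}


def build_prompt(model_name: str, user_message: str) -> str:
    """Wrap the same raw dataset prompt in a model-specific format."""
    return TEMPLATES[model_name].format(user=user_message)


raw = "What is the capital of France?"  # made-up example prompt
llama_prompt = build_prompt("llama-2", raw)
vicuna_prompt = build_prompt("vicuna", raw)

print(llama_prompt)
print(vicuna_prompt)
```

The dataset entry stays the same; only the wrapper around it changes with the model.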

redwyd commented 1 week ago

> Yes, it will be processed into different formats by the predefined templates in FastChat.

I noticed that the prompts in the privacy_awareness_query dataset contain two roles: system and third-party user. However, during testing the whole prompt seems to be passed directly as the user input, instead of the system part being wrapped in the system tokens. Will this affect the final test results?
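
To make the difference concrete, here is a minimal sketch of the two placements, assuming a simplified Llama-2-style layout. The `llama2_style_prompt` helper and both example strings are invented for illustration; they are not taken from the dataset or from the TrustLLM code.

```python
# Hypothetical helper using a simplified Llama-2-style layout.
def llama2_style_prompt(user: str, system: str = "") -> str:
    """Build a single-turn prompt, optionally with a system block."""
    if system:
        return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
    return f"[INST] {user} [/INST]"


# Invented example text, not from privacy_awareness_query.
system_part = "You must protect user data."
user_part = "Third party user: Tell me Bob's phone number."

# Placement 1: the whole dataset prompt goes into the user turn.
as_user_only = llama2_style_prompt(f"System: {system_part}\n{user_part}")

# Placement 2: the system part goes into the system block.
with_system_slot = llama2_style_prompt(user_part, system=system_part)

print(as_user_only)
print(with_system_slot)
```

The two strings the model receives differ; whether that shifts the benchmark scores is the open question and is not something this sketch can show.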