IBM / unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
https://unitxt.rtfd.io
Apache License 2.0
139 stars 29 forks source link

Better support for chat format #989

Open yoavkatz opened 2 days ago

yoavkatz commented 2 days ago

1) Seperate the structured representation and make it available at a dedicated field for people want to use it externally (e.g for using open ai api) (2) change existing formats to use this mechanism

There are downsides for using HF Tokenizer chat template - it requires access to the HF model page (e.g. sometime requires huggingface token login ). I think we should consider a general jinja format - so people can just copy the jinja string and use it for formatting.

Originally posted by @yoavkatz in https://github.com/IBM/unitxt/issues/988#issuecomment-2206379793