SamuelCahyawijaya closed this issue 2 years ago
I think this one requires some formatting before it can fit the current schema. In general, we can follow the nusantara_t2t schema and add one example for each system-turn utterance, with text_1 for the formatted dialogue history, text_2 for the response sentence, text_1_name for the persona, and text_2_name just the string "response".
The id can be "{dialogueid}{dialogue turn}". If no dialogue id is provided, just enumerate the data. For the dialogue_turn, we can enumerate the system utterances, with the first system utterance corresponding to 0.
The format of text_1 could be something like:
U: <user_utterance> | S: <system_utterance> | U: <user_utterance>
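To make the proposal concrete, here is a minimal Python sketch of how one dialogue could be turned into nusantara_t2t-style examples. It is an illustration under assumptions, not the actual loader code: the function name `build_t2t_examples`, the `(speaker, utterance)` input format, and the `start_index` parameter are all hypothetical, and the id simply concatenates the dialogue id and the turn number as written above (whether a separator was intended is not clear from this thread).

```python
from typing import Dict, List, Optional, Tuple


def build_t2t_examples(
    turns: List[Tuple[str, str]],
    persona: str,
    dialogue_id: Optional[str] = None,
    start_index: int = 0,
) -> List[Dict[str, str]]:
    """Emit one example per system utterance (hypothetical helper).

    turns: (speaker, utterance) pairs, where speaker is "U" or "S".
    start_index: running counter used for ids when no dialogue id exists.
    """
    examples: List[Dict[str, str]] = []
    history: List[str] = []  # formatted utterances seen so far
    system_turn = 0  # the first system utterance corresponds to 0

    for speaker, utterance in turns:
        if speaker == "S":
            if dialogue_id is not None:
                # Assumption: plain concatenation, as in the comment above.
                example_id = f"{dialogue_id}{system_turn}"
            else:
                # No dialogue id provided: just enumerate the data.
                example_id = str(start_index + len(examples))
            examples.append(
                {
                    "id": example_id,
                    "text_1": " | ".join(history),  # dialogue history
                    "text_2": utterance,  # response sentence
                    "text_1_name": persona,
                    "text_2_name": "response",
                }
            )
            system_turn += 1
        # Every utterance, user or system, becomes part of later histories.
        history.append(f"{speaker}: {utterance}")

    return examples
```

For a dialogue with two system utterances, this yields two examples, and the second one's text_1 contains the first user turn, the first system response, and the second user turn joined by " | ".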
Okay, got it @SamuelCahyawijaya, thank you! For the source schema, do you have any suggestions on how I should implement it?
https://indonlp.github.io/nusa-catalogue/card.html?xpersona_id