SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0
55 stars 54 forks source link

New schema: Add `chat` schema #679

Closed patrickamadeus closed 1 month ago

patrickamadeus commented 1 month ago

Adding new chat schema to support #635

sabilmakbar commented 1 month ago

note to Holy & Sam:

btw we might want to revisit the ToD (Task-Oriented Dialogue) & DS (Dialogue System) tasks later on whether we should use this schema too (since HF tokenizers already support the format of chat -- specifically on the input data, which has the same schema as HF Chat).

holylovenia commented 1 month ago

note to Holy & Sam:

btw we might want to revisit the ToD (Task-Oriented Dialogue) & DS (Dialogue System) tasks later on whether we should use this schema too (since HF tokenizers already support the format of chat -- specifically on the input data, which has the same schema as HF Chat).

Hi @sabilmakbar, sorry for the late response. Replied on #635.

holylovenia commented 1 month ago

Hi @patrickamadeus and @sabilmakbar, I would like to let you know that we plan to finalize the calculation of the open contributions (e.g., dataloader implementations) in 31 hours, so it'd be great if we could wrap up the reviewing and merge this PR before then.

sabilmakbar commented 1 month ago

did the changes bcs the initial assignee had no response