SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #309 | Create dataset loader for Vietnamese Hate Speech Detection (UIT-ViHSD) #501

Closed. Gyyz closed this 5 months ago.

Gyyz commented 5 months ago

Closes #309 | Add/Update Dataloader {UIT-ViHSD}

First line PR Message: Closes #{ISSUE_NUMBER}

where you replace the {ISSUE_NUMBER} with the one corresponding to your dataset.

Gyyz commented 5 months ago

Sorry, I opened a new PR because of commit problems in the previous one, #453.

ljvmiranda921 commented 5 months ago

Thank you for updating the commits! I appreciate it. I've approved it now. We're just waiting for @raileymontalan's review 👍, then we can merge.

Gyyz commented 5 months ago

> Hi @Gyyz, it isn't clear which of the [0, 1, 2] labels here correspond to the [CLEAN, OFFENSIVE, HATE] labels specified in the paper. Could you please specify it accordingly? Thanks.

Hi @raileymontalan, I added a logging.INFO message to print the details.

raileymontalan commented 5 months ago

Hi @Gyyz, after consulting with @holylovenia, we think it would be better for the labels to be the proper class names [CLEAN, OFFENSIVE, HATE] for the SEACrowd schema. I believe the labels for the source schema can be left as is ([0, 1, 2]).

Gyyz commented 5 months ago

Sure. Will update this shortly.
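
For reference, the requested change could look roughly like the sketch below. It is not the code from this PR. The function names and the `id`/`text`/`label` field names are hypothetical, and the index-to-name order (0 = CLEAN, 1 = OFFENSIVE, 2 = HATE) is an assumption based on the order in which the two lists appear in the review comment; the thread itself does not confirm it.

```python
# Hypothetical sketch, not the loader from this PR.
# ASSUMPTION: index i in [0, 1, 2] maps to LABEL_NAMES[i]; the thread
# does not confirm this order, so check it against the paper first.
LABEL_NAMES = ["CLEAN", "OFFENSIVE", "HATE"]


def to_source_example(idx, text, label_id):
    """Source schema: keep the integer label exactly as in the raw data."""
    return {"id": str(idx), "text": text, "label": int(label_id)}


def to_seacrowd_text_example(idx, text, label_id):
    """SEACrowd schema: expose the class name instead of the integer."""
    return {"id": str(idx), "text": text, "label": LABEL_NAMES[int(label_id)]}


if __name__ == "__main__":
    # The same raw row rendered under both schemas.
    print(to_source_example(0, "example comment", 2))
    print(to_seacrowd_text_example(0, "example comment", 2))
```

Keeping the integers in the source schema preserves the raw data as published, while the named labels make the SEACrowd view readable without consulting a log message.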