SEACrowd / seacrowd-datahub

A collaborative project to collect datasets in SEA languages, SEA regions, or SEA cultures.
Apache License 2.0

Closes #276 | Implement PRDECT-ID dataloader #322

ljvmiranda921 closed 8 months ago

ljvmiranda921 commented 9 months ago

Closes #276

Might need some help in the tests. I can make the source schema pass, but not the seacrowd ones. I think what's peculiar in this dataset is that it has two types of labels for the same TEXT schema. I can make the data loader work by running the following code:

from datasets import load_dataset

# Each label type is loaded through its own seacrowd_text config.
sentiment_dset = load_dataset("seacrowd/sea_datasets/prdect_id/prdect_id.py", name="prdect_id_sentiment_seacrowd_text")
emotion_dset = load_dataset("seacrowd/sea_datasets/prdect_id/prdect_id.py", name="prdect_id_emotion_seacrowd_text")


ryanignatius commented 8 months ago

> Might need some help in the tests. I can make the source schema pass, but not the seacrowd ones. I think what's peculiar in this dataset is that it has two types of labels for the same TEXT schema.

I searched our existing dataloaders and found a similar case in id_google_play_review. Maybe we can use the same approach?

holylovenia commented 8 months ago

> > Might need some help in the tests. I can make the source schema pass, but not the seacrowd ones. I think what's peculiar in this dataset is that it has two types of labels for the same TEXT schema.
>
> I searched our existing dataloaders and found a similar case in id_google_play_review. Maybe we can use the same approach?

Hi @ljvmiranda921, I've tried and tested this dataloader. Everything works well. I will approve it after you modify the source schema as suggested by @ryanignatius.

ljvmiranda921 commented 8 months ago

Got it! Will update within the week 👍

ljvmiranda921 commented 8 months ago

@ryanignatius @holylovenia PR updated! Feel free to review again 🙇

ryanignatius commented 8 months ago

@ljvmiranda921 thank you for the update

Sorry for not being clear before. In id_google_play_review we have two source configs: id_google_play_review_source and id_google_play_review_posneg_source.

I'm suggesting we split the current prdect_id_source into prdect_id_emotion_source and prdect_id_sentiment_source. What do you think?

ljvmiranda921 commented 8 months ago

Ah I see. So there will be a source and seacrowd config for both prdect_id_emotion and prdect_id_sentiment, no? Will update
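A minimal sketch of the config layout that would result, assuming one source and one seacrowd_text config per label type. The `config_names` helper is hypothetical and only illustrates the naming; it is not taken from the actual dataloader.

```python
from itertools import product

DATASET_NAME = "prdect_id"
TASKS = ("emotion", "sentiment")       # the two label types in this dataset
SCHEMAS = ("source", "seacrowd_text")  # one source and one seacrowd config per task


def config_names(dataset_name=DATASET_NAME, tasks=TASKS, schemas=SCHEMAS):
    """Hypothetical helper: build one config name per (task, schema) pair."""
    return [f"{dataset_name}_{task}_{schema}" for task, schema in product(tasks, schemas)]


names = config_names()
for name in names:
    print(name)
```

This yields four names in total, including the two seacrowd_text names used in the `load_dataset` calls at the top of the thread.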

ljvmiranda921 commented 8 months ago

Hi reviewers @ryanignatius and @holylovenia, it's now ready for another round of reviews :)

To be honest, I'm not sure if I did it right. I now have two configurations, one for emotion and one for sentiment, but the data loading logic is still the same, similar to id_google_play_review. Let me know if any changes are needed!
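Roughly, the shared logic could look like the sketch below: one generator serves both configurations, and only the label column changes. The column names (`review`, `emotion`, `sentiment`), the sample rows, and both function names are assumptions for illustration, not the actual dataloader code. The `id`/`text`/`label` fields follow the usual shape of a text-classification schema.

```python
def label_column(config_name):
    """Hypothetical helper: infer the label column from the config name."""
    for task in ("emotion", "sentiment"):
        if f"_{task}_" in config_name:
            return task
    raise ValueError(f"Unrecognized config name: {config_name}")


def generate_examples(rows, config_name):
    """Yield (key, example) pairs; the same logic serves both label types."""
    column = label_column(config_name)
    for idx, row in enumerate(rows):
        if config_name.endswith("_source"):
            # Assumption: the source schema passes the raw row through unchanged.
            yield idx, dict(row)
        else:
            # seacrowd_text-style example: id, text, and a single label.
            yield idx, {"id": str(idx), "text": row["review"], "label": row[column]}


# Made-up rows, only to exercise the sketch.
rows = [
    {"review": "Barang bagus", "emotion": "Happy", "sentiment": "Positive"},
    {"review": "Pengiriman lama", "emotion": "Anger", "sentiment": "Negative"},
]
emotion_examples = list(generate_examples(rows, "prdect_id_emotion_seacrowd_text"))
sentiment_examples = list(generate_examples(rows, "prdect_id_sentiment_seacrowd_text"))
```

Keeping a single generator means the emotion and sentiment configs cannot drift apart; the config name is the only thing that selects the label.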