SamuelCahyawijaya opened 2 years ago
Hi @aliakbars, are you still working on this? If there's no reply, I will assume it's inactive and remove the assignees. Thanks!
Hi @bryanwilie, yes, I'm working on this. I'll create the PR ASAP. Sorry for the delay.
No worries @aliakbars, please take your time. Thank you for contributing!
I just did some exploratory data analysis. The tweets come from only 6 users (possibly via retweets), and the data hasn't been filtered yet. Some of the tweets are retweets, e.g. "RT : Siap-siap", or contain only an image/video link, e.g. "RT : https://t.co/Z6Ls07s1bn".
Should we proceed with this?
It does contain local languages, e.g. Sundanese, though.
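As an illustration of the filtering mentioned above, here is a minimal Python sketch. It assumes a hypothetical record layout where each tweet is a dict with `user` and `text` keys (the actual dataset schema may differ). It drops tweets whose text starts with `RT` and tweets that consist only of URLs, and counts tweets per user. The sample records are made up for illustration.

```python
import re
from collections import Counter

# Assumption: each tweet is a dict with hypothetical "user" and "text" keys.
RETWEET_PREFIX = re.compile(r"^RT\b")
# Text made only of URLs, optionally preceded by a retweet header like "RT ...:".
URL_ONLY = re.compile(r"^(RT\s*[^:]*:\s*)?(https?://\S+\s*)+$")


def is_retweet(text: str) -> bool:
    """True if the tweet text starts with the 'RT' retweet marker."""
    return RETWEET_PREFIX.match(text.strip()) is not None


def is_media_only(text: str) -> bool:
    """True if the tweet text is only link(s), e.g. an attached image/video."""
    return URL_ONLY.match(text.strip()) is not None


def filter_tweets(tweets):
    """Keep tweets that are neither retweets nor link-only."""
    return [
        t for t in tweets
        if not is_retweet(t["text"]) and not is_media_only(t["text"])
    ]


def user_counts(tweets):
    """Count tweets per user to check how many distinct users there are."""
    return Counter(t["user"] for t in tweets)


# Hypothetical sample records, not taken from the real dataset.
sample = [
    {"user": "a", "text": "RT : Siap-siap"},
    {"user": "b", "text": "RT : https://t.co/Z6Ls07s1bn"},
    {"user": "c", "text": "https://t.co/abc"},
    {"user": "a", "text": "a sample poem line"},
]
```

Here `filter_tweets(sample)` keeps only the last record, and `user_counts(sample)` reports three distinct users.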
@bryanwilie @SamuelCahyawijaya @holylovenia What do you think about this issue?
Hi @aliakbars, thank you for the update, and apologies for the late reply. We plan to label the quality of all datasets in NusaCatalogue later on, so we can push this one through for now.
NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?id_poem_tweets