Open SamuelCahyawijaya opened 10 months ago
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Yep, I'm still working on this issue. I got busy with some things, but I'll try to wrap it up by next week.
Thanks for letting us know, @bp-high. I'm removing the stale tag for now. Please add the pr-ready tag once you have finished your dataloader so that the bot won't tag this issue as stale, or let us know if you need more time for this issue.
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Sorry, I couldn't work on this last weekend due to the Christmas holidays and celebrations. I'll try to finish it this weekend.
Thanks for the update, @bp-high! No rush on this; please take your time and enjoy your holiday!
Hi, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Hi @bp-high, may we have an update on this dataloader issue? It's been 3 weeks since the last poke from the SEACrowd stale-checker, and we might consider unassigning if there's no progress update in the next 24 hours.
Hi @, may I know if you are still working on this issue? Please let @holylovenia @SamuelCahyawijaya @sabilmakbar know if you need any help.
Hi, I'd like to ask about this.
On Kaggle, the dataset has two source files, namely (a) thai-government-corpus.csv and (b) thai-wikipedia-corpus.csv. Both have "article" and "text" columns. I assume both sources should be combined. I have two questions:
- The (a) dataset has a lot of similar values, which confuses me: should we still include it?
- For the seacrowd schema, do we need to concatenate the columns as "{}-{}".format(article, text), or just take the text column? If we concatenate, the article value in the (a) dataset is an integer, while in (b) it is a string. How should we handle this? (Compare the previous picture with the following one.)
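For reference, here is a minimal sketch of one way the two files could be merged. It is not the final dataloader: the sample rows are made up, the helper names `read_rows` and `build_examples` are hypothetical, and dropping exact duplicate rows is only an assumption about how to treat the repeated values in (a). Note that `csv.DictReader` returns every value as a string, and `"{}-{}".format(...)` accepts integers as well, so the int/str difference in "article" should not break the concatenation either way.

```python
import csv
import io

# Hypothetical sample rows standing in for the two Kaggle files.
GOV_CSV = "article,text\n1,alpha\n1,alpha\n2,beta\n"
WIKI_CSV = "article,text\nBangkok,gamma\n"


def read_rows(handle, source):
    """Yield (source, article, text); csv.DictReader returns every value as str."""
    for row in csv.DictReader(handle):
        yield source, row["article"], row["text"]


def build_examples(handles, concat_article=False):
    """Merge the sources, skip exact duplicate rows, and emit id/text records."""
    seen = set()
    examples = []
    for source, handle in handles:
        for key in read_rows(handle, source):
            if key in seen:  # assumption: exact repeats carry no extra information
                continue
            seen.add(key)
            _, article, text = key
            if concat_article:
                text = "{}-{}".format(article, text)
            examples.append({"id": str(len(examples)), "text": text})
    return examples


examples = build_examples([
    ("government", io.StringIO(GOV_CSV)),
    ("wikipedia", io.StringIO(WIKI_CSV)),
])
```

Passing `concat_article=True` switches to the concatenated form, so both options from the second question can be compared on the same rows.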
Hi @khelli07, I'm also not sure what the content is about since I don't understand Thai. May I ask for your suggestion on this dataset, @mrpeerat and @parinzee? 🙏
Hi, I want to ask again. Should this dataset count as local or public? As far as I know, you have to log in to download it, so even though it is accessible to everyone, a login is required first. Another option is the Kaggle API, but it is CLI-based (and of course you still need to log in: https://github.com/Kaggle/kaggle-api).
Hi @khelli07, if it can be solved using the CLI, could we set _LOCAL = False and attach a guide on how to use it to the _DESCRIPTION, like this?
The main code is done; I just haven't done the metadata yet. I'll do it in the near future.
Dataloader name: hse_thai/hse_thai.py
DataCatalogue: http://seacrowd.github.io/seacrowd-catalogue/card.html?hse_thai