Open weinixuehao opened 4 years ago
Hi, to answer your questions:
First: Can I directly use SQuAD for a Chinese closed-domain QA task?
I don't really understand this one. SQuAD is an English QA dataset, so you would need a Chinese version of a QA dataset. Maybe your question is whether you can use BERT for Chinese? If so, you could try BERT multilingual, though I am not sure about its performance...
Second: If "First" is not possible, is the best solution to use run_squad.py to fine-tune a BERT model on a Chinese dataset in the same format as SQuAD?
To fine-tune a BERT model on a Chinese dataset, I advise you to use the run_squad.py example in the Hugging Face repository with the bert-base-multilingual-(un)cased version.
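Since run_squad.py expects SQuAD-style JSON, it may help to sanity-check a Chinese dataset before fine-tuning. Below is a minimal sketch, assuming the SQuAD v1.1 layout (`data` > `paragraphs` > `qas` > `answers` with `text` and `answer_start`). The function name `check_squad_format` and the sample entry are hypothetical and written only for illustration; they are not part of cdQA or the Hugging Face repository.

```python
def check_squad_format(dataset):
    """Return the ids of questions whose answer span does not match the context."""
    problems = []
    for article in dataset.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph["context"]
            for qa in paragraph.get("qas", []):
                for answer in qa.get("answers", []):
                    start = answer["answer_start"]
                    text = answer["text"]
                    # answer_start is a character offset into the context
                    if context[start:start + len(text)] != text:
                        problems.append(qa["id"])
    return problems


# Hypothetical one-question sample in SQuAD v1.1 layout
sample = {
    "version": "1.1",
    "data": [{
        "title": "北京",
        "paragraphs": [{
            "context": "北京是中华人民共和国的首都。",
            "qas": [{
                "id": "q1",
                "question": "中华人民共和国的首都是哪里?",
                "answers": [{"text": "北京", "answer_start": 0}],
            }],
        }],
    }],
}

print(check_squad_format(sample))
```

An empty list means every answer text was found at its stated character offset. Misaligned offsets are a common problem after converting a dataset from another format.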
P.S.: Where can I look for a Chinese dataset in the same format as SQuAD if I end up using the second solution?
Unfortunately, I don't have an answer to this question 😞
Hi @weinixuehao
You can use cdQA with Chinese, but it requires some additional work. The idea is to start from a pretrained Chinese model such as bert-base-chinese, then fine-tune it on your Chinese SQuAD-like dataset. Then you should be able to do closed-domain QA on your own Chinese documents.
@andrelmfarias @fmikaelian Thanks for answering my question! This is what I need.
Hi @fmikaelian, the SQuAD dataset (around 30 MB) is much smaller than the DuReader dataset (around 1-2 GB per file). Do I need to convert the whole DuReader dataset to a SQuAD-like format for training? Converting and training on all of it may take a long time.
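One way to keep the cost manageable might be to convert only a subset first and check that fine-tuning works before processing everything. Whether a subset gives good enough results is not established in this thread. The sketch below assumes each input record has already been reduced to hypothetical `context`, `question`, and `answer` fields. This is not DuReader's actual schema, so a real conversion would need a mapping step first. The `to_squad` helper and its `limit` parameter are likewise illustrative names.

```python
import json
from itertools import islice


def to_squad(records, limit=None):
    """Convert simple records to a SQuAD v1.1-style dict, reading at most `limit` records."""
    paragraphs = []
    for i, rec in enumerate(islice(records, limit)):
        # Locate the answer in the context; skip records where it does not appear verbatim
        start = rec["context"].find(rec["answer"])
        if start == -1:
            continue
        paragraphs.append({
            "context": rec["context"],
            "qas": [{
                "id": f"rec-{i}",
                "question": rec["question"],
                "answers": [{"text": rec["answer"], "answer_start": start}],
            }],
        })
    return {"version": "1.1", "data": [{"title": "converted", "paragraphs": paragraphs}]}


# Hypothetical records; the third has no verbatim answer span and is skipped
records = [
    {"context": "长城位于中国北方。", "question": "长城位于哪里?", "answer": "中国北方"},
    {"context": "茶起源于中国。", "question": "茶起源于哪里?", "answer": "中国"},
    {"context": "熊猫吃竹子。", "question": "熊猫吃什么?", "answer": "苹果"},
]

subset = to_squad(records, limit=2)
print(json.dumps(subset, ensure_ascii=False, indent=2))
```

Because `islice` stops reading after `limit` records, the same function works on a generator that streams a large file line by line, so the full 1-2 GB file never has to be loaded at once.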
I have three questions.
First: Can I directly use SQuAD for a Chinese closed-domain QA task?
Second: If "First" is not possible, is the best solution to use run_squad.py to fine-tune a BERT model on a Chinese dataset in the same format as SQuAD?
P.S.: Where can I look for a Chinese dataset in the same format as SQuAD if I end up using the second solution?