Open f617452296 opened 4 years ago
We haven't looked into this, but you could try it using BERT-Base, Chinese to initialize the model.
I tested the model on a Chinese GEC task and it works fine.
May I have your email address to ask some questions?
You can use any Chinese BERT model by simply replacing the BERT path, and it works fine.
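Before swapping in a different checkpoint, it can help to verify the directory actually contains what LaserTagger loads. A minimal sketch, assuming the file layout of Google's official BERT releases (`bert_config.json`, `vocab.txt`, `bert_model.ckpt.*`); other Chinese BERT releases may name the checkpoint differently, and the directory name below is just a placeholder:

```python
from pathlib import Path

def missing_bert_files(bert_dir):
    """Return the expected checkpoint files that are absent from bert_dir.

    Google's BERT releases ship bert_config.json, vocab.txt, and
    bert_model.ckpt.* files; adjust the list if your release differs.
    """
    bert_dir = Path(bert_dir)
    expected = ["bert_config.json", "vocab.txt", "bert_model.ckpt.index"]
    return [name for name in expected if not (bert_dir / name).exists()]

# Placeholder path for the BERT-Base, Chinese checkpoint directory.
print(missing_bert_files("chinese_L-12_H-768_A-12"))
```

An empty list means the paths in the run script can simply be pointed at that directory.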
Good to know this!
Is it a public dataset? If so, could you share a link?
http://tcci.ccf.org.cn/conference/2018/taskdata.php The second task is the GEC task.
Could you please tell me how I can run this model on the GEC task? For example, step 1 (Phrase Vocabulary Optimization) and step 2 (Converting Target Texts to Tags).
First, I suggest you read the run_wikisplit_experiment.sh script in the project. You can run LaserTagger by simply changing that script. Here is an example:
- Convert your data into WikiSplit format, such as "I like you \t I love you".
- Change all the paths in the script to yours.
- Change vocab_size in configs/lasertagger_config.json, because the vocab_size is different in Chinese BERT.
- Run the script step by step.
Best wishes.
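The data-format and config steps above can be sketched in Python. The file paths and the demo sentence pair are placeholders; the only assumption is that both the BERT config and configs/lasertagger_config.json carry a vocab_size field (BERT-Base, Chinese uses 21128 versus the English default of 30522):

```python
import json
import tempfile
from pathlib import Path

def write_wikisplit(pairs, path):
    """Write (source, target) pairs as WikiSplit-style lines: source<TAB>target."""
    with open(path, "w", encoding="utf-8") as f:
        for src, tgt in pairs:
            f.write(f"{src}\t{tgt}\n")

def sync_vocab_size(lasertagger_config_path, bert_config_path):
    """Copy vocab_size from the BERT config into the LaserTagger config."""
    bert_cfg = json.loads(Path(bert_config_path).read_text())
    lt_path = Path(lasertagger_config_path)
    lt_cfg = json.loads(lt_path.read_text())
    lt_cfg["vocab_size"] = bert_cfg["vocab_size"]  # 21128 for BERT-Base, Chinese
    lt_path.write_text(json.dumps(lt_cfg, indent=2))

# Demo with throwaway files; a real run would point at the actual configs.
tmp = Path(tempfile.mkdtemp())
write_wikisplit([("I like you", "I love you")], tmp / "train.tsv")
(tmp / "bert_config.json").write_text(json.dumps({"vocab_size": 21128}))
(tmp / "lasertagger_config.json").write_text(json.dumps({"vocab_size": 30522}))
sync_vocab_size(tmp / "lasertagger_config.json", tmp / "bert_config.json")
print(json.loads((tmp / "lasertagger_config.json").read_text())["vocab_size"])  # 21128
```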
It helps a lot! Thank you!
By the way, I want to know whether the training data from http://tcci.ccf.org.cn/conference/2018/taskdata.php needs to be segmented into words, or whether I can feed a whole sentence into the model. Thanks!
Eh, the input to Chinese BERT is separate words, so you should cut each sentence into separate words.
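As a minimal sketch of that preprocessing: BERT-Base, Chinese tokenizes Chinese text per character, so splitting into space-separated characters approximates its input (a word segmenter such as jieba could be swapped in here instead). Runs of ASCII letters or digits are kept together as single tokens:

```python
def segment_chars(sentence):
    """Split a Chinese sentence into space-separated characters.

    Non-ASCII characters become individual tokens; runs of ASCII
    characters (e.g. English words, numbers) stay together.
    """
    tokens, buf = [], ""
    for ch in sentence:
        if ch.isascii() and not ch.isspace():
            buf += ch
        else:
            if buf:
                tokens.append(buf)
                buf = ""
            if not ch.isspace():
                tokens.append(ch)
    if buf:
        tokens.append(buf)
    return " ".join(tokens)

print(segment_chars("我喜欢BERT模型"))  # 我 喜 欢 BERT 模 型
```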
Hi, I also tested the GEC task, but my model didn't work well. It didn't actually "correct"; it just deleted every difference, and even some identical parts, between the source and target texts. I used jieba to cut my sentences and thought everything was done just fine, but the results were pretty bad. Could you please tell me whether you had the same problem, and which score you used?
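A quick way to check whether a model is actually correcting rather than merely deleting is to compare its predictions against both the references and the sources. This is not the official NLPCC-2018 scorer, just a rough diagnostic sketch: exact-match rate against references, plus average character edit distance from the sources (a deletion-happy model shows a large distance from its own inputs):

```python
def levenshtein(a, b):
    """Character-level edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def sanity_check(sources, predictions, references):
    """Return (exact-match rate, average edit distance from the sources)."""
    exact = sum(p == r for p, r in zip(predictions, references))
    avg_dist = sum(levenshtein(s, p) for s, p in zip(sources, predictions)) / len(sources)
    return exact / len(references), avg_dist

acc, dist = sanity_check(["我喜欢你"], ["我爱你"], ["我爱你"])
print(acc, dist)  # 1.0 2.0
```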
And can this model be helpful on a Chinese dataset?