-
Hi. I am trying to convert corpora from HF to their IPA form with the following snippet, but I am getting really slow speeds: only a couple of examples per second. Do you know how it can be sped up? …
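For context, a minimal sketch of the kind of pipeline I mean (the actual snippet is elided above; I am assuming the G2P step uses `phonemizer`). The usual speed-ups with `datasets` are `batched=True`, so one backend call handles many examples, plus `num_proc` for multiprocessing:

```python
from datasets import load_dataset
from phonemizer import phonemize  # assumed G2P backend; swap in your own

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

def to_ipa(batch):
    # phonemize() accepts a list of strings, so one call converts the
    # whole batch instead of paying per-example startup cost.
    batch["ipa"] = phonemize(batch["text"], language="en-us", backend="espeak")
    return batch

# batched=True amortizes per-call overhead; num_proc parallelizes across CPUs.
ipa_dataset = dataset.map(to_ipa, batched=True, batch_size=256, num_proc=4)
```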
-
ML.NET 2.0 comes with Text classification.
The Text classification scenario is based on NAS-BERT, which is trained on _English Wikipedia plus BookCorpus_.
When using text classification for a not …
-
I know BERT has achieved SOTA on many NLP tasks, such as SQuAD and SWAG.
But note that the data (both training and test) of SQuAD is from **Wikipedia**, and that of SWAG is from the **BookCorpus**, a…
-
There is a paragraph like this on page 4 of your technical report:
> The text pairs selected from the web and other public sources are not guaranteed to be closely related. Therefore, dat…
-
## Description
In the function [`scripts.pretraining.bert.create_pretraining_data.tokenize_lines()`](https://github.com/dmlc/gluon-nlp/blob/5ff0519aa5a89e7e2a2c0afab164e17de55231b4/scripts/pretrainin…
-
Has anyone else run into this problem?
I know it is caused by the dataset being loaded, and it may be a version issue or a gzip-archive issue.
But I don't know how to fix it.
`UnicodeD…
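Since the truncated traceback looks like a decoding error, here is a minimal debugging sketch, assuming the failure happens while reading a gzip-compressed text file (the path is hypothetical; substitute the file named in your traceback):

```python
import gzip

path = "books_large_p1.txt.gz"  # hypothetical; use the file from the traceback

# Read with an explicit encoding and replace undecodable bytes instead of
# raising, to see whether the archive is mis-encoded or simply corrupt.
with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
    for i, line in enumerate(f):
        if "\ufffd" in line:  # the replacement char marks bad bytes
            print(i, line[:80])
            break
```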
-
Hi, _datasets_ is really amazing! I am following [run_mlm_no_trainer.py](url) to pre-train BERT, and it uses `tokenized_datasets = raw_datasets.map(tokenize_function, batche…
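For reference, the surrounding call in the script typically looks like the sketch below; the exact argument values are my assumptions (check your version of the script), and `tokenizer` is assumed to be defined already:

```python
def tokenize_function(examples):
    # Tokenize raw text; examples are later grouped into fixed-length blocks.
    return tokenizer(examples["text"], return_special_tokens_mask=True)

tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,             # pass lists of examples to the tokenizer at once
    num_proc=4,               # parallel workers; tune to your machine
    remove_columns=["text"],  # drop the raw text once it is tokenized
    desc="Running tokenizer on dataset",
)
```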
-
It will only extract the files. Basically, extend the function to accept `url=None`.
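A minimal sketch of the requested behavior; the `download` and `extract` helpers are hypothetical stand-ins for whatever the function currently calls:

```python
# Hypothetical sketch: url=None means "skip the download, only extract".
def download_and_extract(url=None, archive_path=None):
    if url is not None:
        archive_path = download(url)   # assumed existing download helper
    if archive_path is None:
        raise ValueError("need either url or archive_path")
    return extract(archive_path)       # assumed existing extract helper
```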
-
I tried to download the bookscorpus data. So far I have only downloaded around 5000 books. Can anyone get all the books? I keep hitting `HTTP Error: 403 Forbidden`. How can I fix this? Or can I get all the…
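A common workaround for scripted downloads that return 403 (no guarantee the host allows it) is to send a browser-like User-Agent and retry with backoff; a minimal sketch with `requests`:

```python
import time
import requests

def fetch(url, retries=3):
    # Some hosts reject clients that do not send a browser-like User-Agent.
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}
    for attempt in range(retries):
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 200:
            return resp.content
        time.sleep(2 ** attempt)  # simple exponential backoff
    resp.raise_for_status()  # surface the final HTTP error
```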
-
When training BERT I used BookCorpus + Wikipedia-en data, with batch_size=5120, warmup=0.1, learning_rate=4e-4, deep_init enabled, no mixed precision, and steps (computed from 40 epochs) = 240k. But at around step 127k the loss suddenly increased and performance dropped, and the model stopped learning afterwards. What could be causing this? (Log shown below)
![B…