-
I tried to download the BookCorpus data. So far I have only downloaded around 5,000 books. Has anyone managed to get all the books? I ran into a lot of `HTTP Error: 403 Forbidden` responses. How can I fix this? Or can I get all the…
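One way to cope with intermittent 403s is to retry with an increasing delay and a browser-like User-Agent. A minimal sketch, assuming the books are fetched over HTTP with `requests`; the URL list and output layout here are hypothetical, not from the repo:

```python
import time
import requests

# Hypothetical list of (book_id, url) pairs to fetch; not part of the repo.
BOOKS = [("example-id", "https://example.com/book.epub")]

# Some hosts return 403 for the default requests User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch(url, retries=5, backoff=10.0):
    """Retry with growing delays; give up after `retries` attempts."""
    for attempt in range(retries):
        resp = requests.get(url, headers=HEADERS, timeout=30)
        if resp.status_code == 200:
            return resp.content
        # 403/429 often mean rate limiting; wait longer each time.
        time.sleep(backoff * (attempt + 1))
    return None

for book_id, url in BOOKS:
    data = fetch(url)
    if data is None:
        print(f"skipped {book_id}: still forbidden after retries")
    else:
        with open(f"{book_id}.epub", "wb") as f:
            f.write(data)
```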
-
With issues like #9, could training be improved by starting at a lower resolution, possibly also with simpler documents (i.e. generated or large-font docs, as in the Pix2Struct pretraining), before moving to training…
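For what it's worth, a resolution curriculum can be as simple as resizing batches according to the epoch. A minimal sketch assuming a PyTorch-style loop; the schedule and helper names are illustrative, not from this repo:

```python
import torch
import torch.nn.functional as F

# Illustrative schedule: (start_epoch, input side length in pixels).
RESOLUTION_SCHEDULE = [(0, 224), (5, 448), (10, 896)]

def resolution_for_epoch(epoch):
    """Pick the largest resolution whose start epoch has been reached."""
    side = RESOLUTION_SCHEDULE[0][1]
    for start, s in RESOLUTION_SCHEDULE:
        if epoch >= start:
            side = s
    return side

def resize_batch(images, side):
    """Resize a batch of page images (N, C, H, W) to the current resolution."""
    return F.interpolate(images, size=(side, side), mode="bilinear", align_corners=False)

# Inside the training loop, each batch would be resized before the forward pass:
# images = resize_batch(images, resolution_for_epoch(epoch))
```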
-
In the article "BERT: Pre-training of Deep..", it mentions that the Wikipedia and BookCorpus datasets are used for pretraining. When I try to generate my own data from Wikipedia, I get about 5.5 million artic…
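For reference, BERT's create_pretraining_data.py expects plain text with one sentence per line and a blank line between documents, so the article count is just the number of blank-line-separated blocks. A minimal counting sketch (the file path is hypothetical):

```python
def count_documents(path):
    """Count documents in a BERT-style corpus file:
    one sentence per line, blank line between documents."""
    docs, in_doc = 0, False
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                in_doc = True
            elif in_doc:
                docs += 1
                in_doc = False
    return docs + (1 if in_doc else 0)

print(count_documents("wiki_corpus.txt"))  # hypothetical path
```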
-
I wonder whether this code reproduces the same results as described in the paper. Also, when do you plan to release your pre-trained model? Thanks a lot.
-
Hi. Thank you very much for your skip-thought vectors.
I can use the encode() function with the downloaded data (utable.npy, btable.npy, uni_skip.npz, etc.).
Now, I want to decode my encoded sentences…
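For context, encoding with the pretrained models usually looks roughly like the sketch below (from memory of the repo's README, so treat it as a hedged example); decoding is not part of this interface and would need the separate decoder training code in the repo:

```python
import skipthoughts

# Load the pretrained tables/models (utable.npy, btable.npy, uni_skip.npz, etc.
# must already be downloaded and their paths configured in skipthoughts.py).
model = skipthoughts.load_model()
encoder = skipthoughts.Encoder(model)

sentences = ["the quick brown fox", "jumped over the lazy dog"]
vectors = encoder.encode(sentences)  # one combine-skip vector per sentence
print(vectors.shape)
```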
-
Hey author!
I want to load the word properties file in "Pulication/ano3 word frequency VS l2_norm.ipynb/Dataset preparation", but I have a problem. Could you provide the word properties file …
-
A model has been trained according to the instructions [here](https://github.com/ryankiros/skip-thoughts/tree/master/training). I can load the model using the following commands:
```
import tools
embe…
```
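From memory of the training README, the loading step continues roughly like this (hedged; the paths inside tools.py must point at the trained model and at the word2vec vectors used for vocabulary expansion):

```python
import tools

# Word embeddings used for vocabulary expansion (path configured in tools.py).
embed_map = tools.load_googlenews_vectors()
model = tools.load_model(embed_map)

# Encode raw sentences with the freshly trained model.
vectors = tools.encode(model, ["an example sentence to embed"])
```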
-
Currently, this is the main bottleneck in the program and takes more than 80% of the running time.
We expect that as the size of the data grows, this share will become even larger.
Try to think a little about how t…
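Before optimizing, it may help to confirm where that 80% actually goes. A minimal profiling sketch using only the standard library; the profiled function is a hypothetical stand-in for the slow step:

```python
import cProfile
import pstats

def process_data():
    """Hypothetical stand-in for the step being discussed."""
    total = 0
    for i in range(10**6):
        total += i * i
    return total

cProfile.run("process_data()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)  # show the top 10 offenders
```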
-
Hello, author. I have a few questions and hope you can kindly advise:
1. Pre-training starts right away with a pile of hyperparameters (which file are these hyperparameters in?); the last sentence of the pre-train section seems to be the training command, followed by a pile of arguments. What command exactly do I need to run to proceed?
2. My server has only one GPU. To run your code, do I need to change some configuration? If so, which parameters exactly (a single-GPU sketch follows below)?
3. Which file contains the dataset path? I couldn't find it.
4. "we use the English Wiki…
-
Here we combine all the datasets we can collect (a minimal merging sketch follows the list):
- [OSCAR's CommonCrawl Dataset](https://traces1.inria.fr/oscar/)
- [Arabic BERT Corpus](https://www.kaggle.com/abedkhooli/arabic-bert-corpus)
- [Hi…
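A minimal merging sketch, assuming each source corpus has already been extracted to plain-text files under a hypothetical corpora/<name>/ layout:

```python
from pathlib import Path

# Merge every extracted text file into one training corpus,
# dropping empty lines and exact duplicate lines across sources.
seen = set()
with open("combined_corpus.txt", "w", encoding="utf-8") as out:
    for path in sorted(Path("corpora").glob("*/*.txt")):
        for line in path.open(encoding="utf-8"):
            line = line.strip()
            if line and line not in seen:
                seen.add(line)
                out.write(line + "\n")
```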