-
The BookCorpus download page at http://www.cs.toronto.edu/~mbweb/ returns a 403.
Can you provide a mirror, or at the very least give a few lines as an example of how the dataset needs to be formatted…
-
## Description
BookCorpus now has a reliable, stable download link: https://the-eye.eu/public/AI/pile_preliminary_components/books1.tar.gz. There are also more links at https://the-eye…
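For reference, a minimal sketch of fetching and unpacking that mirror. The archive layout (one plain-text file per book) is an assumption here, not something this snippet verifies:

```python
import glob
import tarfile
import urllib.request

# URL from the post above; the archive is several GB, so this takes a while.
URL = "https://the-eye.eu/public/AI/pile_preliminary_components/books1.tar.gz"
urllib.request.urlretrieve(URL, "books1.tar.gz")

# Assumption: the archive unpacks to one plain-text file per book.
with tarfile.open("books1.tar.gz", "r:gz") as tar:
    tar.extractall("bookcorpus")

books = glob.glob("bookcorpus/**/*.txt", recursive=True)
print(f"{len(books)} text files, e.g. {books[:3]}")
```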
-
Hi guys,
I have been trying to run the Bing experiment, but it seems I can't for now.
```
"datasets": {
--
| "wiki_pretrain_dataset": "/data/bert/bnorick_format/128/wiki_pretrain",
| "bc_pr…
-
Better to have something like 200 GB of disk for that and at least 4 CPUs (expected time ~4 days; the more CPUs, the faster).
Since we have the original results, we can compare our results against them and have some sanity…
-
Refer to README.md, search for BookCorpus, and click on the link. It will redirect you [here](https://yknzhu.wixsite.com/mbweb), and you will notice that BookCorpus is no longer available from the original au…
-
### GPT-3 data mix
* Datasets are not sampled in proportion to their size
* Datasets we view as higher-quality are sampled more frequently (see the sketch after this list)
* The WebText2, Books1, and Wikipedia datasets are sampled…
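As a rough illustration of the idea (the weights below are placeholders, not the exact GPT-3 figures): each training example first picks a source dataset by a fixed mixture weight rather than by dataset size, so small high-quality sources are seen for more epochs.

```python
import random

# Illustrative mixture weights only; smaller but higher-quality sources
# get a larger share than their raw size would give them.
MIX = {
    "common_crawl": 0.60,
    "webtext2": 0.22,
    "books1": 0.08,
    "books2": 0.08,
    "wikipedia": 0.02,
}

def sample_source(rng=random):
    """Pick which dataset the next training example is drawn from."""
    names = list(MIX)
    weights = [MIX[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# With these weights, Wikipedia examples show up far more often per byte of
# data than Common Crawl examples, i.e. the small sets are repeated more.
counts = {n: 0 for n in MIX}
for _ in range(100_000):
    counts[sample_source()] += 1
print(counts)
```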
-
The dataset I am using is BookCorpus, which has 18,000 books.
The system I am training on has 64 GB of RAM.
When I am trying to generate the pretraining data using create_pretraining_data.py, it is g…
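Since create_pretraining_data.py builds all instances in memory before writing, one common workaround is to split the corpus into shards and run the script once per shard. A rough sketch, assuming the flag names from the google-research/bert version of the script and hypothetical paths:

```python
import glob
import subprocess

# Hypothetical paths: a pre-split corpus (one shard per file) and a BERT vocab.
shards = sorted(glob.glob("bookcorpus/shards/shard_*.txt"))

for i, shard in enumerate(shards):
    # One invocation per shard keeps peak memory bounded by the shard size
    # instead of the whole 18k-book corpus.
    subprocess.run(
        [
            "python", "create_pretraining_data.py",
            f"--input_file={shard}",
            f"--output_file=tf_examples/shard_{i:04d}.tfrecord",
            "--vocab_file=uncased_L-12_H-768_A-12/vocab.txt",
            "--do_lower_case=True",
            "--max_seq_length=128",
            "--max_predictions_per_seq=20",
            "--masked_lm_prob=0.15",
            "--dupe_factor=5",
        ],
        check=True,
    )
```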
-
## ❓ Questions and Help
**Description**
Hi, I am training a quick-thoughts task, dot(sent1 ~ sent2) and dot(sent2 ~ sent2), but my dataset is 12 GB and it throws a memory error once it crosses 575 GB in memor…
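One way around this is to stream the pairs from disk instead of loading everything up front. A minimal sketch using a PyTorch IterableDataset, with a hypothetical tab-separated file format and path:

```python
from torch.utils.data import DataLoader, IterableDataset

class SentencePairStream(IterableDataset):
    """Yield tab-separated sentence pairs one line at a time, so the 12 GB
    file never has to be resident in memory all at once."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                sent1, sent2 = line.rstrip("\n").split("\t")
                yield sent1, sent2

# Hypothetical path and format; peak memory stays close to one batch
# rather than the whole dataset.
loader = DataLoader(SentencePairStream("pairs.tsv"), batch_size=256)
```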
-
I find that `BERT` uses `BookCorpus (800M words)` and `Wikipedia (2500M words)` but `GPT` only uses `BookCorpus`; even though `BERT` has a more complex model structure, which may affect its representation ability, t…
-
You can download it here: https://twitter.com/theshawwn/status/1301852133319294976?s=21
It contains 18k plain-text files. The results are very high quality. I spent about a week fixing the epub2txt…