-
### GPT-3 data mix
* Datasets are not sampled in proportion to their size
* Datasets we view as higher-quality are sampled more frequently (a minimal sketch of such weighted sampling follows this list)
* WebText2, Books1, Wikipedia datasets are sampl…
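A minimal sketch of what this re-weighting can look like, assuming each corpus is just a list of documents; the dataset names mirror the GPT-3 mix, but the sizes and weights below are placeholders rather than the paper's actual values:

```python
import random

# Hypothetical corpora standing in for CommonCrawl, WebText2, Books1, Wikipedia.
# Sizes and contents are placeholders, not the real GPT-3 data.
corpora = {
    "common_crawl": [f"cc doc {i}" for i in range(1000)],
    "webtext2": [f"wt2 doc {i}" for i in range(100)],
    "books1": [f"books1 doc {i}" for i in range(50)],
    "wikipedia": [f"wiki doc {i}" for i in range(20)],
}

# Sampling weights chosen by perceived quality, NOT in proportion to size:
# small, high-quality corpora are over-sampled (and may repeat during training),
# while the huge crawl is effectively down-sampled.
weights = {
    "common_crawl": 0.60,
    "webtext2": 0.22,
    "books1": 0.10,
    "wikipedia": 0.08,
}

def sample_document(rng: random.Random) -> str:
    """Pick a corpus according to its weight, then a document uniformly within it."""
    names = list(weights)
    name = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    return rng.choice(corpora[name])

rng = random.Random(0)
print([sample_document(rng) for _ in range(5)])
```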
-
Can you please provide a little tutorial about how to input the image and get the story?
Since you have used `bookcorpus`, could you also provide some guidance about how to train the model on di…
-
Hi team,
I'd like to know how to process BookCorpus for pre-training.
I am confused about how to process this data.
Should I treat one book as a document including all sentences, or one chapter as a docu…
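One common convention (I am not sure it is what this repo expects) is the input format of BERT's `create_pretraining_data.py`: one sentence per line, a blank line between documents, with each whole book treated as one document. A minimal sketch under that assumption, with a deliberately naive sentence splitter and a hypothetical `books/` directory of crawled `.txt` files:

```python
import re
from pathlib import Path

def naive_sentences(text: str):
    """Very rough sentence splitter; a real pipeline would use NLTK or spaCy."""
    for sent in re.split(r"(?<=[.!?])\s+", text):
        sent = sent.strip()
        if sent:
            yield sent

def write_pretraining_text(books, out_path):
    """Write one sentence per line, with a blank line between documents (here: whole books)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for book_text in books:
            for sent in naive_sentences(book_text):
                f.write(sent + "\n")
            f.write("\n")  # document boundary

# Hypothetical layout: each crawled book saved as books/<title>.txt
books = (p.read_text(encoding="utf-8") for p in sorted(Path("books").glob("*.txt")))
write_pretraining_text(books, "bookcorpus_pretraining.txt")
```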
-
Hello,
I was wondering if you could share the books corpus, as the crawling takes pretty long. I was using this:
https://github.com/soskek/bookcorpus
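If the crawl itself is the only blocker, one possible workaround is pulling a pre-assembled BookCorpus variant from the Hugging Face Hub instead; this assumes such a substitute is acceptable for your use case, since its contents and licensing are not identical to a fresh Smashwords crawl:

```python
from datasets import load_dataset

# Pulls a pre-assembled BookCorpus variant from the Hugging Face Hub instead of
# crawling; contents may differ from what soskek/bookcorpus produces today.
bookcorpus = load_dataset("bookcorpus", split="train")

print(bookcorpus)             # row count and features
print(bookcorpus[0]["text"])  # first example
```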
-
I pretrained both BERT uncased and BERT cased models using the same hyperparameters (for the uncased model) on Wikipedia and BookCorpus, but the BERT cased models perform worse than the Google check…
-
Hi. I am trying to convert corpora from HF to their IPA form with the following snippet, but I am getting really slow speeds, only a couple of examples per second. Do you know how it can be sped up? …
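Without seeing the snippet, the usual speed-ups for this pattern are batched mapping and multiple worker processes via `Dataset.map`. A rough sketch, where `to_ipa` is a stand-in for whatever grapheme-to-IPA conversion the snippet actually uses:

```python
from datasets import load_dataset

def to_ipa(text: str) -> str:
    """Stand-in for the real grapheme-to-IPA conversion from the snippet."""
    return text  # replace with the actual G2P call

def convert_batch(batch):
    # With batched=True, `batch` holds lists of column values, so the Python
    # overhead of map() is paid once per batch instead of once per example.
    return {"ipa": [to_ipa(t) for t in batch["text"]]}

ds = load_dataset("bookcorpus", split="train")
ds = ds.map(
    convert_batch,
    batched=True,
    batch_size=1_000,  # tune to your memory budget
    num_proc=8,        # parallel worker processes; match your CPU cores
)
print(ds[0]["ipa"])
```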
-
```bash
# dataset/download_books.sh
# Fetches pre-processed BookCorpus data files hosted for GPT-NeoX pre-training.
wget https://the-eye.eu/public/AI/pile_neox/data/BookCorpusDataset_text_document.bin
wget https://the-eye.eu/public/AI/pile_neox/data/BookCorpusDataset_text_docu…
```
-
I am trying to follow the example here
https://www.deepspeed.ai/tutorials/bert-pretraining/
The section on getting the datasets says 'Note: Downloading and pre-processing instructions are coming…
-
```
Traceback (most recent call last):
  File "E:\RetroMAE-master\RetroMAE-master\examples\pretrain\preprocess.py", line 158, in
    wiki = create_wiki_data(args.tokenizer_name, args.max_seq_length, ar…
```
-
I know BERT has achieved SOTA in many NLP tasks, such as SQuAD and SWAG.
But note that the data (both training and test) of SQuAD is from **Wikipedia**, and that of SWAG is from the **BookCorpus**, a…