-
I computed the sentence embedding of each sentence of the bookcorpus data using BERT base and saved them to disk. I used 20M sentences, and the resulting Arrow file is about 59GB, while the original text fil…
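For context, a minimal sketch of how such an embedding pass could look, assuming the HuggingFace `transformers` and `datasets` libraries; the model name, mean pooling, batch size, and output path are illustrative assumptions, not necessarily the poster's exact setup:

```python
import torch
from datasets import load_dataset
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval().to("cuda")

def embed(batch):
    enc = tokenizer(batch["text"], padding=True, truncation=True,
                    max_length=128, return_tensors="pt").to("cuda")
    with torch.no_grad():
        out = model(**enc)
    # Mean-pool the last hidden state into one vector per sentence
    # (a pooling choice made for this sketch).
    batch["embedding"] = out.last_hidden_state.mean(dim=1).cpu().numpy()
    return batch

dset = load_dataset("bookcorpus", split="train")
dset = dset.select(range(20_000_000))              # the 20M sentences mentioned above
dset = dset.map(embed, batched=True, batch_size=256)
dset.save_to_disk("/data/bookcorpus_embeddings")   # hypothetical path; stored as Arrow
```

For scale, 20M sentences times 768 float32 values per embedding is roughly 61GB of raw vectors, which is in line with the ~59GB Arrow file mentioned above.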
-
I think I finished step 5, `/workspace/bert/data/create_datasets_from_start.sh`, in the quick start guide. It took a whole day or so.
Now I am trying to run `bash scripts/run_pretraining.sh benchmark`.
…
-
Errors occur when running `preprocessor.py`. The data is bookcorpus.
```
$ python3 -m preprocess --dataset=bookcorpus --shards=2048 --processes=64 --cache_dir=/data/bert_train/bookcorpus --tfrecor…
-
The `prepare_bookcorpus.py` file is missing from this README.md: https://github.com/dmlc/gluon-nlp/tree/master/scripts/datasets/pretrain_corpus
(It should have been renamed `prepared_gutenberg.py`, and…
-
When I try to download the bookcorpus dataset, my connection keeps getting closed, and it eventually gives up:
```
Connecting to battle.shawwn.com (battle.shawwn.com)|2606:4700:3033::681b:80c6|:44…
-
## Describe the bug
When I try to concatenate 2 datasets (10GB each), the entire dataset is loaded into memory instead of being written directly to disk.
Interestingly, this happens when trying to s…
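For reference, a minimal reproduction sketch of the concatenation step, assuming the HuggingFace `datasets` API (`concatenate_datasets` and `save_to_disk`); the paths are placeholders:

```python
from datasets import load_from_disk, concatenate_datasets

# Two hypothetical on-disk datasets of ~10GB each.
dset_a = load_from_disk("/data/part_a")
dset_b = load_from_disk("/data/part_b")

combined = concatenate_datasets([dset_a, dset_b])
# The expectation is that this writes Arrow files to disk rather than
# materializing the full concatenation in RAM.
combined.save_to_disk("/data/combined")
```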
-
With your dataset, CUDA runs out of memory as soon as the trainer begins;
however, without changing any other element/parameter, just switching the dataset to `LineByLineTextDataset` makes everything OK.
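For comparison, a minimal sketch of the `LineByLineTextDataset` setup being switched to, assuming the `transformers` implementation; the tokenizer, file path, and block size are illustrative placeholders:

```python
from transformers import BertTokenizerFast, LineByLineTextDataset

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
train_dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="train.txt",   # hypothetical file with one sentence per line
    block_size=128,
)
```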
-
Here's the code I'm trying to run:
```python
dset_wikipedia = nlp.load_dataset("wikipedia", "20200501.en", split="train", cache_dir=args.cache_dir)
dset_wikipedia.drop(columns=["title"])
dset_wi…
-
I tried running the `pretrain.py` script and got this error:
```
process id: 76202
{'device': 'cuda:0', 'base_run_name': 'vanilla', 'seed': 11081, 'adam_bias_correction': False, 'schedule': 'origin…