What version of gluonnlp and mxnet are you using?
I managed to run the code with gluonnlp 0.91 and MXNet 1.6.0.
Also, GluonNLP will automatically cache the models, so you won't have to download them again next time.
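For reference, a minimal sketch of the load-and-cache behavior (this is the same model zoo call used in the notebook; the default cache location under `~/.mxnet/models` is an assumption about the environment):

```python
import mxnet as mx
import gluonnlp as nlp

# The first call downloads the weights and vocab; later calls reuse the
# cached copies (by default under ~/.mxnet/models), so nothing is
# re-downloaded.
language_model, vocab = nlp.model.big_rnn_lm_2048_512(
    dataset_name='gbw', pretrained=True, ctx=mx.cpu())
```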
I have installed the packages with the exact versions you mentioned (I understood that you missed a dot, since I got an error that there's no 0.91 version of gluonnlp available, so I installed 0.9.1):

```
!pip install mxnet==1.6.0 mxboard leven gluonnlp==0.9.1 tqdm sacremoses
```
And here's my session crashing again.
And here are my machine's specifications:
Sir, you may be using a more powerful system, @jonomon. How do I fix this?
Or @jonomon, is there any other way I can download that dataset manually and include it in the code?
@ThomasDelteil
I have downloaded those files from the URLs that are shown when those lines are run, and did this:
```python
ctx_nlp = mx.gpu(3)
# language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
language_model = '/content/drive/.../big_rnn_lm_2048_512_gbw-6bb3e991.zip'
vocab = '/content/drive/.../gbw-ebb1a287.zip'
moses_tokenizer = nlp.data.SacreMosesTokenizer()
moses_detokenizer = nlp.data.SacreMosesDetokenizer()
```
I don't have any problems with these lines now.
I now have a new problem with the `generator` that uses `language_model` and `vocab` from the previous lines. This is the error I now have. Is this error coming from the files I downloaded? I couldn't figure out what actually caused it.
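Note that the snippet above assigns plain path strings to `language_model` and `vocab`, so any downstream code expecting a model object and a `Vocab` will fail. A hedged sketch of one workaround, assuming the model zoo checks its cache directory (default `~/.mxnet/models`) before downloading:

```python
import os
import zipfile

import mxnet as mx
import gluonnlp as nlp

# Extract the manually downloaded zips into the cache directory that the
# model zoo checks before downloading. The default cache root below is
# an assumption about this environment.
cache_dir = os.path.expanduser('~/.mxnet/models')
os.makedirs(cache_dir, exist_ok=True)
for src in ['/content/drive/.../big_rnn_lm_2048_512_gbw-6bb3e991.zip',
            '/content/drive/.../gbw-ebb1a287.zip']:
    with zipfile.ZipFile(src) as zf:
        zf.extractall(cache_dir)

ctx_nlp = mx.cpu()  # no CUDA device is visible on a Colab TPU runtime
language_model, vocab = nlp.model.big_rnn_lm_2048_512(
    dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
```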
@jonomon and @ThomasDelteil, could you please look at my last reply and help me fix it?
What tokenizer are you using?
Could you provide a Minimal, Reproducible Example?
I have made no changes in "0_handwriting_ocr.ipynb" except in the lines below, since my machine was crashing whenever I ran it.
```python
ctx_nlp = mx.gpu(3)
# language_model, vocab = nlp.model.big_rnn_lm_2048_512(dataset_name='gbw', pretrained=True, ctx=ctx_nlp)
# pointing to the downloaded files
language_model = '/content/drive/.../big_rnn_lm_2048_512_gbw-6bb3e991.zip'
vocab = '/content/drive/.../gbw-ebb1a287.zip'
moses_tokenizer = nlp.data.SacreMosesTokenizer()
moses_detokenizer = nlp.data.SacreMosesDetokenizer()
```
The tokenizer I am using is the one that was in the code.
Hey Naveen,
- Question: Did you check the count of GPUs? My system only has 1 GPU, so I had to use `ctx_nlp = mx.gpu(mx.context.num_gpus() - 1)`.
I am using Google Colab with a TPU.
- Question: When you just put `mx.context.num_gpus()` into a Jupyter cell, what is the output? I had problems with the MXNet CUDA version, and this led to the system trying to use the CPU instead of the GPU.
The output is 0, but I do have a TPU.
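Since the output is 0 (MXNet has no TPU backend, so a Colab TPU runtime exposes no devices it can use), one hedged fix for the hard-coded `mx.gpu(3)` is a guarded context selection:

```python
import mxnet as mx

# Use the last GPU if any are visible, otherwise fall back to CPU;
# mx.gpu(3) assumes a fourth GPU that does not exist on Colab.
n_gpus = mx.context.num_gpus()
ctx_nlp = mx.gpu(n_gpus - 1) if n_gpus > 0 else mx.cpu()
```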
- Did you shut down all other notebook kernels before starting this one?
Yes, this is the only notebook I have open.
- Is the swap space of your Linux system big enough in case the RAM fills up? (My system needed 38 GB of total RAM to compute it.) I enlarged the swap to 32 GB, so I have a total of 64 GB in theory.
I only have 35.35 GB of RAM. I couldn't understand the rest of your question.
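As an aside, total RAM and swap can be checked from inside the notebook; the sketch below reads the standard Linux /proc/meminfo, which should also work on Colab:

```python
# Print the kernel's view of total RAM and swap.
with open('/proc/meminfo') as f:
    for line in f:
        if line.startswith(('MemTotal', 'SwapTotal')):
            print(line.strip())
```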
BR Jan
Bro, if you have downloaded the dataset, can you please send a link to it?
The code below, from the 'Denoising text output' section of the '0_handwriting_ocr.ipynb' file, crashes the system for an unknown reason when run on a machine with 35.35 GB RAM and 107.77 GB disk space (a Google Colab TPU session).
How do I download this dataset without crashing the machine? Also, I don't want to download it again next time, so can I save this dataset too?
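A hedged sketch of the caching approach: GluonNLP's model zoo functions take a `root` argument for the download/cache directory (an assumption about the 0.9.x signature), so pointing it at a mounted Drive folder would persist the files across Colab sessions; the Drive path below is a hypothetical example:

```python
import mxnet as mx
import gluonnlp as nlp

ctx_nlp = mx.cpu()  # no CUDA device on a Colab TPU runtime
# 'root' controls where the pretrained weights and vocab are stored;
# a mounted Drive folder survives session resets, so the large model
# is only downloaded once. The path is a hypothetical example.
language_model, vocab = nlp.model.big_rnn_lm_2048_512(
    dataset_name='gbw', pretrained=True, ctx=ctx_nlp,
    root='/content/drive/MyDrive/mxnet_models')
```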