Closed. yidong72 closed this pull request 3 years ago.
Please update the changelog in order to start CI tests.
View the gpuCI docs here.
I made sure all the unit tests run well and tested all the notebooks. Everything is working now. It is ready for review.
Generic implementations are in the NodeTaskGraphMixin. I also improved the usability of the dynamic ports a bit, and removed the "calcuated_ports_setup" method.
When I build the gquant container via docker/build.sh and then run the NeMo notebook 10_nemo_chatbot.ipynb, I hit this error:
https://github.com/pytorch/pytorch/issues/43227
nn.utils.rnn.pack_padded_sequence: RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor
I added another patch, nemo_paddpatch.patch, to the NeMo installation in the build.sh script to fix it. Refer to my build.sh script.
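For reference, the workaround discussed in pytorch/pytorch#43227 is to pass the lengths tensor on the CPU. The contents of nemo_paddpatch.patch are not shown in this thread; a hypothetical one-line fix (the file, variable, and argument names here are assumptions, not the actual patch) would look something like:

```diff
-    packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths, batch_first=True)
+    # PyTorch >= 1.7 requires 'lengths' to be a 1D CPU int64 tensor (pytorch/pytorch#43227)
+    packed = nn.utils.rnn.pack_padded_sequence(embedded, lengths.cpu(), batch_first=True)
```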
It is interesting that NeMo is broken again. I will test it on another machine.
I removed the 'calucated_input_meta' method.
Attached: build.sh.txt
I tested and reproduced the bug. I checked in your build.sh file.
I'm getting an error in gquant due to NeMo.
File "/home/quant/NeMo/nemo/collections/nlp/metrics/squad_metrics.py", line 21, in <module>
from transformers.tokenization_bert import BasicTokenizer
ModuleNotFoundError: No module named 'transformers.tokenization_bert'
The transformers package changed its module layout in recent versions. I had to pin transformers to "<=3.5.1" in the requirements of the NeMo repo (tag v0.11.1); NeMo NLP breaks with transformers versions beyond 3.5.1.
--- requirements/requirements_nlp.txt
+++ requirements_nlp_fix.txt
@@ -3,7 +3,7 @@
matplotlib
sentencepiece
torchtext
-transformers>=2.11.0
+transformers>=2.11.0,<=3.5.1
unidecode
youtokentome
numpy
I added the patch to the build.sh script. Please see attached. I simplified it so there's just one patch file generated to fix up NeMo.
I addressed your comments. I didn't find any difference between your attached build.sh.txt vs the current one. Are you sure you upload the right one?
Yeah, you should see the patch with the requirements changes. I also combined all the various patches for NeMo. In the Dockerfile:
COPY nemo.patch /home/quant/NeMo/
RUN git apply nemo.patch && \
bash reinstall.sh
You'll see this new patch in the function "gen_nemo_patches":
diff --git a/requirements/requirements_nlp.txt b/requirements/requirements_nlp.txt
index 885adf3e..0e4e44e2 100644
--- a/requirements/requirements_nlp.txt
+++ b/requirements/requirements_nlp.txt
@@ -3,7 +3,7 @@ h5py
matplotlib
sentencepiece
torchtext
-transformers>=2.11.0
+transformers>=2.11.0,<=3.5.1
unidecode
youtokentome
numpy
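The diff above is presumably emitted from a shell here-document inside build.sh. A minimal sketch of what the gen_nemo_patches function might look like (the function body is an assumption based on this thread, not the actual script; only the requirements hunk is shown, while the real patch bundles several more fixes):

```shell
#!/bin/bash
# Sketch (assumption): write the combined NeMo patch from a here-document.
# The file name nemo.patch matches the Dockerfile COPY step above.
gen_nemo_patches() {
    # Quoted delimiter ('PATCH_EOF') prevents variable expansion inside the body.
    cat > nemo.patch <<'PATCH_EOF'
--- a/requirements/requirements_nlp.txt
+++ b/requirements/requirements_nlp.txt
@@ -3,7 +3,7 @@ h5py
 matplotlib
 sentencepiece
 torchtext
-transformers>=2.11.0
+transformers>=2.11.0,<=3.5.1
 unidecode
 youtokentome
 numpy
PATCH_EOF
}

gen_nemo_patches
```

The generated file can then be applied from the repository root with `git apply nemo.patch`, as in the Dockerfile snippet above.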
If you open https://github.com/rapidsai/gQuant/files/5703972/build.sh.txt, it's a bit different from the current build script.
Still need to test it
This PR implements issue #106. It has the following features:
The core part of gQuant now depends only on Dask, so it is lightweight to install. It is ready to start review. Need to fix the unit tests.