[REVIEW]gQuant plugin implementation

NVIDIA / fsi-samples

A collection of open-source, GPU-accelerated Python tools and examples for quantitative analyst tasks that leverages the RAPIDS AI project, Numba, cuDF, and Dask.

[REVIEW]gQuant plugin implementation #112

Closed. yidong72 closed this 3 years ago.

yidong72 commented 4 years ago

This PR implements issue #106. It has the following features:

Server validation moved to a module that can be registered
Server cleanup function moved to a module that can be registered
Dynamic port handling on both the server and client side
Cache and load moved to the module
Correct port type validation on the client side, including subclass validation
Object copy moved to a module that can be registered
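The "moved to a module that can be registered" pattern in the list above can be sketched as a small type-keyed registry. This is a minimal sketch, not gQuant's actual API: the names `register_copy`, `register_cleanup`, `copy_object` and `run_cleanup` are hypothetical. The lookup walks the class's MRO, so a handler registered for a base class also applies to its subclasses.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registries: plugin modules register handlers here instead of
# the core code hard-coding them.
_COPY_FUNCS: Dict[type, Callable[[Any], Any]] = {}
_CLEANUP_FUNCS: List[Callable[[], None]] = []


def register_copy(typ: type, func: Callable[[Any], Any]) -> None:
    """Register how objects of `typ` (and its subclasses) are copied."""
    _COPY_FUNCS[typ] = func


def register_cleanup(func: Callable[[], None]) -> None:
    """Register a function to run when the server cleans up."""
    _CLEANUP_FUNCS.append(func)


def copy_object(obj: Any) -> Any:
    # Walk the MRO so a handler registered for a base class covers subclasses.
    for klass in type(obj).__mro__:
        if klass in _COPY_FUNCS:
            return _COPY_FUNCS[klass](obj)
    return obj  # no handler registered: pass the object through unchanged


def run_cleanup() -> None:
    # Call every registered cleanup function in registration order.
    for func in _CLEANUP_FUNCS:
        func()


# Example: a plugin module registers a shallow-copy function for dicts.
register_copy(dict, lambda d: dict(d))
```

With this registration, `copy_object` on an instance of a `dict` subclass returns a new dict, while objects of unregistered types are returned unchanged. Keeping the registry in the core and the handlers in plugin modules is what lets the core depend only on lightweight packages.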

The core of gQuant now depends only on Dask, so it is lightweight to install. It is ready for review, but the unit tests still need to be fixed.

GPUtester commented 4 years ago

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

yidong72 commented 4 years ago

I made sure all the unit tests pass and tested all the notebooks. Everything works now. It is ready for review.

yidong72 commented 4 years ago

The generic implementations are now in NodeTaskGraphMixin. I also improved the usability of dynamic ports a bit.
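Putting generic logic in a mixin means every node class inherits it without reimplementing it. The sketch below illustrates this with a subclass-aware port type check. It is deliberately named `PortValidationMixin`, not NodeTaskGraphMixin, because the methods `input_ports`, `output_ports`, `can_connect` and the `Port` container are hypothetical illustrations, not gQuant's code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Port:
    # Types produced by (output) or accepted by (input) this port.
    types: Tuple[type, ...]


class PortValidationMixin:
    """Hypothetical mixin: generic connection logic shared by all nodes."""

    def input_ports(self) -> Dict[str, Port]:
        return {}  # node classes override this to declare inputs

    def output_ports(self) -> Dict[str, Port]:
        return {}  # node classes override this to declare outputs

    def can_connect(self, out_port: str, other: "PortValidationMixin",
                    in_port: str) -> bool:
        produced = self.output_ports()[out_port].types
        accepted = other.input_ports()[in_port].types
        # Valid if every produced type equals, or is a subclass of,
        # at least one type the downstream input port accepts.
        return all(issubclass(p, accepted) for p in produced)
```

For example, an output port producing a subclass `Derived` can feed an input port that accepts its base class `Base`, but not the other way round.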

yidong72 commented 4 years ago

I removed the "calcuated_ports_setup" method.

avolkov1 commented 3 years ago

When I build the gQuant container via docker/build.sh and then run the NeMo notebook 10_nemo_chatbot.ipynb, I hit this error: https://github.com/pytorch/pytorch/issues/43227

nn.utils.rnn.pack_padded_sequence: RuntimeError: 'lengths' argument should be a 1D CPU int64 tensor, but got 1D cuda:0 Long tensor

I added another patch, nemo_paddpatch.patch, to the NeMo installation in the build.sh script to fix it. Refer to my build.sh script.

build.sh.txt

yidong72 commented 3 years ago

Interesting that NeMo is broken again. I will test it on another machine.

yidong72 commented 3 years ago

I removed the 'calucated_input_meta' method.

yidong72 commented 3 years ago

I tested it and reproduced the bug. I checked in your build.sh file.

avolkov1 commented 3 years ago

I'm getting an error in gQuant caused by NeMo.

  File "/home/quant/NeMo/nemo/collections/nlp/metrics/squad_metrics.py", line 21, in <module>
    from transformers.tokenization_bert import BasicTokenizer
ModuleNotFoundError: No module named 'transformers.tokenization_bert'

The transformers package changed in recent versions. I had to modify the requirements in the NeMo repo (tag v0.11.1) to pin transformers to "<=3.5.1"; NeMo NLP breaks with transformers versions beyond 3.5.1.

--- requirements/requirements_nlp.txt
+++ requirements_nlp_fix.txt
@@ -3,7 +3,7 @@
 matplotlib
 sentencepiece
 torchtext
-transformers>=2.11.0
+transformers>=2.11.0,<=3.5.1
 unidecode
 youtokentome
 numpy
EOF
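The pin transformers>=2.11.0,<=3.5.1 accepts a version only if it lies within both bounds. The following is a minimal stdlib-only sketch of that check, assuming plain dotted numeric versions; real installers use full PEP 440 parsing (for example the packaging library's SpecifierSet), which also handles pre-releases and local versions. The function names are hypothetical.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.5.1' into (3, 5, 1); assumes purely numeric components."""
    parts = [int(p) for p in text.split(".")]
    # Drop trailing zeros so '2.11' and '2.11.0' compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies_pin(version: str, low: str = "2.11.0", high: str = "3.5.1") -> bool:
    """True if low <= version <= high, mirroring '>=2.11.0,<=3.5.1'."""
    return parse_version(low) <= parse_version(version) <= parse_version(high)
```

Under this check, 3.5.1 and 3.0.2 pass, while 4.0.0 (the kind of newer release that removed transformers.tokenization_bert) is rejected.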

I added the patch to the build.sh script; please see the attachment. I simplified it so that a single patch file is generated to fix up NeMo.

build.sh.txt

yidong72 commented 3 years ago

I addressed your comments. I didn't find any difference between your attached build.sh.txt and the current one. Are you sure you uploaded the right one?

avolkov1 commented 3 years ago

Yes, you should see the patch with the requirements changes. I also combined all the various NeMo patches into one, which is applied by this Dockerfile command:

COPY nemo.patch /home/quant/NeMo/
RUN git apply nemo.patch && \
    bash reinstall.sh

You'll see this new patch in the function "gen_nemo_patches":

diff --git a/requirements/requirements_nlp.txt b/requirements/requirements_nlp.txt
index 885adf3e..0e4e44e2 100644
--- a/requirements/requirements_nlp.txt
+++ b/requirements/requirements_nlp.txt
@@ -3,7 +3,7 @@ h5py
 matplotlib
 sentencepiece
 torchtext
-transformers>=2.11.0
+transformers>=2.11.0,<=3.5.1
 unidecode
 youtokentome
 numpy
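A patch like the one above can also be produced programmatically instead of being written by hand. The sketch below uses Python's difflib.unified_diff on a fragment of the requirements file. It assumes only the context lines visible in the diff (the real file is longer, so the hunk header reads -1,7 here rather than -3,7), and it is not what gen_nemo_patches does, since that function's body is not shown in this thread.

```python
import difflib

# Only the lines visible in the diff context; the real file is longer.
original = [
    "matplotlib\n",
    "sentencepiece\n",
    "torchtext\n",
    "transformers>=2.11.0\n",
    "unidecode\n",
    "youtokentome\n",
    "numpy\n",
]

# Add the upper bound: per the thread, NeMo v0.11.1 NLP breaks beyond 3.5.1.
patched = [
    "transformers>=2.11.0,<=3.5.1\n" if line.startswith("transformers") else line
    for line in original
]

# unified_diff yields header, hunk, context, and +/- lines as strings.
diff = difflib.unified_diff(
    original,
    patched,
    fromfile="a/requirements/requirements_nlp.txt",
    tofile="b/requirements/requirements_nlp.txt",
)
patch_text = "".join(diff)
print(patch_text)
```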

If you open https://github.com/rapidsai/gQuant/files/5703972/build.sh.txt, you'll see it is a bit different from the current build script.

yidong72 commented 3 years ago

I still need to test it.