Got NaN output when running BA-shapes.ipynb

haolanut commented 4 years ago

Hi

I tried to run the code in "BA-shapes.ipynb" and got the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')". Only "BA-shapes.ipynb" has this issue; the other three experiments run fine. I think this may be caused by the Python environment. Could you please share the environment you used to run the experiments?

Here is my output of running "BA-shapes.ipynb" and "pip freeze": https://gist.github.com/haolanut/814d3908fd4a91fd37da22b0349269cc

Also, are you planning to provide a script for explaining the graph classification model?

Thank you!

flyingdoog commented 4 years ago

FYI absl-py==0.11.0 antlr4-python3-runtime==4.8 argon2-cffi==20.1.0 astor==0.8.1 astroid==2.4.2 astunparse==1.6.3 async-generator==1.10 attrs==20.2.0 backcall==0.2.0 bleach==3.2.1 blis==0.4.1 cachetools==4.1.1 catalogue==1.0.0 certifi==2020.6.20 cffi==1.14.3 chardet==3.0.4 click==7.1.2 colorama==0.4.4 contextlib2==0.6.0.post1 coverage==5.3 cycler==0.10.0 cymem==2.0.4 Cython==0.29.21 dataclasses==0.6 decorator==4.4.2 defusedxml==0.6.0 editdistance==0.5.3 entrypoints==0.3 fastBPE==0.1.0 filelock==3.0.12 future==0.18.2 gast==0.3.3 gensim==3.8.3 google-auth==1.22.1 google-auth-oauthlib==0.4.2 google-pasta==0.2.0 grpcio==1.33.2 h5py==2.10.0 hydra-core==1.0.3 hyperopt==0.1.2 idna==2.10 importlib-resources==3.3.0 ipykernel==5.3.4 ipython==7.18.1 ipython-genutils==0.2.0 ipywidgets==7.5.1 isort==5.6.4 jedi==0.17.2 Jinja2==2.11.2 joblib==0.17.0 json-tricks==3.15.3 jsonschema==3.2.0 jupyter==1.0.0 jupyter-client==6.1.7 jupyter-console==6.2.0 jupyter-core==4.6.3 jupyterlab-pygments==0.1.2 Keras-Preprocessing==1.1.2 kiwisolver==1.3.1 lazy-object-proxy==1.4.3 Markdown==3.3.3 MarkupSafe==1.1.1 matplotlib==3.3.2 mccabe==0.6.1 mistune==0.8.4 murmurhash==1.0.4 nbclient==0.5.1 nbconvert==6.0.7 nbformat==5.0.8 nest-asyncio==1.4.2 netifaces==0.10.9 networkx==2.5 nlpaug==1.0.1 nltk==3.5 nni==1.9 notebook==6.1.4 numpy==1.18.5 oauthlib==3.1.0 omegaconf==2.0.3 opt-einsum==3.3.0 packaging==20.4 pandas==1.1.4 pandocfilters==1.4.3 parso==0.7.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==8.0.1 pkginfo==1.6.1 plac==1.1.3 portalocker==2.0.0 preshed==3.0.4 prometheus-client==0.8.0 prompt-toolkit==3.0.8 protobuf==3.13.0 psutil==5.7.3 ptyprocess==0.6.0 pyarrow==2.0.0 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycparser==2.20 Pygments==2.7.2 pylint==2.6.0 pymongo==3.11.0 pyparsing==2.4.7 pyrsistent==0.17.3 python-dateutil==2.8.1 PythonWebHDFS==0.2.3 pytz==2020.4 PyYAML==5.3.1 pyzmq==19.0.2 qtconsole==4.7.7 QtPy==1.9.0 regex==2020.10.28 requests==2.24.0 requests-oauthlib==1.3.0 responses==0.12.0 rsa==4.6 
ruamel.yaml==0.16.12 ruamel.yaml.clib==0.2.2 sacrebleu==1.4.14 sacremoses==0.0.43 schema==0.7.2 scikit-learn==0.23.2 scipy==1.5.3 seaborn==0.11.0 Send2Trash==1.5.0 sent2vec==0.2.0 sentencepiece==0.1.94 simplejson==3.17.2 six==1.15.0 skipthoughts==0.0.1 sklearn==0.0 smart-open==3.0.0 spacy==2.3.2 srsly==1.0.4 tensorboard==2.3.0 tensorboard-plugin-wit==1.7.0 tensorflow==2.3.1 tensorflow-estimator==2.3.0 termcolor==1.1.0 terminado==0.9.1 testpath==0.4.4 thinc==7.4.1 threadpoolctl==2.1.0 tokenizers==0.9.2 toml==0.10.1 torch==1.7.0 tornado==6.0.4 tqdm==4.51.0 traitlets==5.0.5 transformers==3.4.0 typing-extensions==3.7.4.3 urllib3==1.25.11 wasabi==0.8.0 wcwidth==0.2.5 webencodings==0.5.1 websockets==8.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wrapt==1.12.1

flyingdoog commented 4 years ago

For graph classification, I have released the code in code/forgraph. I followed the GNNExplainer code to generate the synthetic datasets. I will upload a usage example and pretrained models in the future.

haolanut commented 4 years ago

Thank you for your patience and your reply. Looking forward to the examples and pre-trained models.

I tried to use exactly the same python environment to run the BA-shapes.ipynb. But I found that "ipython==7.18.1" is not compatible with python 3.6.8. It requires Python 3.7 or higher. So I upgraded python from 3.6.8 to 3.7.9.

However, the NaN output still exists. To debug the code, I printed the output of the model and explainer during the training. I found that both the model and explainer's output become NaN at the beginning of epoch 4. This is not supposed to happen since the training flag of the model is false. Do you have any clue about this issue?
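
One way to pin down the first epoch at which the outputs stop being finite is a small check like the sketch below. It is illustrative only: `first_nan_epoch` and the per-epoch lists of floats are hypothetical stand-ins for the printed model and explainer outputs, not code from the notebook.

```python
import math

def first_nan_epoch(outputs_per_epoch):
    """Return the index of the first epoch whose outputs contain NaN or inf, else None."""
    for epoch, values in enumerate(outputs_per_epoch):
        if any(not math.isfinite(v) for v in values):
            return epoch
    return None

# Hypothetical logged values: finite for epochs 0-3, NaN appearing in epoch 4.
history = [[0.2, 0.8], [0.3, 0.7], [0.1, 0.9], [0.4, 0.6], [float("nan"), 0.5]]
print(first_nan_epoch(history))
```

Running the same check separately on the model output and the explainer output shows which of the two goes non-finite first.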

Here is my new output of running "BA-shapes.ipynb": https://gist.github.com/haolanut/4889658a9462dfbaf696ec66300a15b1

Thank you!

flyingdoog commented 4 years ago

Thanks for the information; I will look into it. I checked my Python version, and it is 3.8.6 now. Maybe you can try this version first.

gui-li commented 4 years ago

I think this problem was caused by batch normalization. You may set bn=False in the config and see whether this solves the problem. The output of GCN is stable and the explanation AUC is similar to the previous one.

I have set the argument bn to False in config, but it didn't change the NaN output.

gui-li commented 4 years ago

Did you retrain GCN after you set the bn to False?

I did

flyingdoog commented 4 years ago

I have fixed the bug and updated the code. The bug was in Explainer.py, line 124. I used this line to scale mask values to (0.01, 0.99) to avoid log(0). However, the previous version fails when mask = 1.0. You can replace this line with

mask = self.mask*(2*scale-1.0)+(1.0-scale)
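
To see why a linear rescaling of this form keeps the logarithm well defined, here is a minimal pure-Python sketch. It assumes scale = 0.99 (inferred from the (0.01, 0.99) range mentioned above) and assumes the logarithm appears in an entropy-style term on the mask; `rescale_mask` and `mask_entropy_terms` are hypothetical helper names, not functions from Explainer.py.

```python
import math

def rescale_mask(mask, scale=0.99):
    """Map mask values from [0, 1] linearly into [1 - scale, scale]."""
    return [m * (2 * scale - 1.0) + (1.0 - scale) for m in mask]

def mask_entropy_terms(mask):
    """Element-wise binary entropy terms (assumed form of the log-based loss)."""
    return [-m * math.log(m) - (1.0 - m) * math.log(1.0 - m) for m in mask]

raw = [0.0, 0.5, 1.0]        # includes both problematic endpoints
scaled = rescale_mask(raw)   # endpoints move to roughly 0.01 and 0.99
print(scaled)
print(mask_entropy_terms(scaled))  # finite for every entry
```

Without the rescaling, a mask value of exactly 0.0 or 1.0 would hit log(0). In floating-point tensor arithmetic that gives 0 * (-inf), which is NaN.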

Thanks a lot for reporting the bug!

haolanut commented 4 years ago

Thank you for fixing the bug! I tried the newest code, and it works perfectly.

By the way, np.power returns a NumPy float64 value, so you can convert the "tmp" variable to a plain Python float before feeding it into the explainer, like this:

tmp = float(1.0*np.power(0.05,epoch/epochs))

Then, TensorFlow will not warn about casting input from float64 to float32.
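
As an alternative sketch, the same decay can be computed with Python's built-in `**` operator, which produces a plain Python float and never creates a NumPy float64 scalar. The function name `temperature`, its parameters, and the epoch count of 20 are hypothetical; whether this silences the warning in a given TensorFlow version is an assumption to verify.

```python
def temperature(epoch, epochs, t0=1.0, ratio=0.05):
    """Decay from t0 at epoch 0 to t0 * ratio at the final epoch, as a Python float."""
    return float(t0 * ratio ** (epoch / epochs))

epochs = 20  # hypothetical number of training epochs
for epoch in (0, 10, 20):
    tmp = temperature(epoch, epochs)
    print(epoch, type(tmp).__name__, tmp)
```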

flyingdoog commented 4 years ago

Thanks for the suggestion! I have added it to the ipynb files.

flyingdoog / PGExplainer

Got NaN output when running BA-shapes.ipynb #2