both train and test loss are nan

clatfd / GNN-ART-LABEL

Code and dataset for the MICCAI 2020 paper: Automated Intracranial Artery Labeling using a Graph Neural Network and Hierarchical Refinement

Apache License 2.0


both train and test loss are nan #3

Open WoojunePark opened 4 years ago

WoojunePark commented 4 years ago

This may be a silly mistake of mine, but both train and test loss are shown as 'nan'.

The "fraction nodes correct" and "fraction edges" graphs look like this: [image: Figure_2]

Also, here is some of the printed output:

Model saved in path: ArtLabel1/model21920-nan-0.8798-0.1954.ckpt
22253, T 14594.9, Ltr nan, Lge nan, Ctr 0.5287, Str 0.0000, Cge 0.5397, Sge 0.0000
CtrN 0.8668, StrN 0.0000, CgeN 0.8798, SgeN 0.0000
CtrE 0.1853, StrE 0.0000, CgeE 0.1954, SgeE 0.0000
Val loss decrease. Model saved in path: ArtLabel1/model22253-nan-0.8798-0.1954.ckpt

I'm using the exact same notebook file. Converting it to a .py script gives the same results.

It seems there's a problem in calculating the loss, which results in nan for both the train and test loss. How can I solve this error?
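One way to catch this early is to stop training as soon as a logged loss is NaN, rather than continuing to save checkpoints. The sketch below parses a printed line like the one quoted in this comment and checks its loss fields. It assumes that `Ltr` and `Lge` are the train and test losses; the helper names `parse_metrics` and `has_nan_loss` are hypothetical and are not part of the repository.

```python
import math
import re

# Matches "Name value" pairs such as "Ltr nan" or "CgeN 0.8798".
_METRIC = re.compile(r"\b([A-Za-z]+)\s+(nan|-?\d+(?:\.\d+)?)\b")


def parse_metrics(log_line):
    """Return a dict mapping metric names in a printed log line to floats."""
    return {name: float(value) for name, value in _METRIC.findall(log_line)}


def has_nan_loss(log_line, loss_keys=("Ltr", "Lge")):
    """True if any loss field in the line is NaN (loss_keys is an assumption)."""
    metrics = parse_metrics(log_line)
    return any(math.isnan(metrics[key]) for key in loss_keys if key in metrics)


line = "22253, T 14594.9, Ltr nan, Lge nan, Ctr 0.5287, Str 0.0000, Cge 0.5397, Sge 0.0000"
if has_nan_loss(line):
    print("Loss is NaN; stop training and check the environment.")
```

A check like this only detects the symptom. It does not explain why the loss became NaN.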

zqaz999 commented 2 years ago

I also encountered the same problem, have you solved this problem?

clatfd commented 2 years ago

Thanks for your interest. It is very hard for me to reproduce your error. I suspect it is caused by package incompatibility. Please try again with the following commands, which I just tested without any problem. I am running on the Windows Anaconda platform.

conda create --name gnn
activate gnn
conda install tensorflow-gpu==1.15.0 nb_conda jupyter
pip install graph_nets matplotlib scipy "tensorflow>=1.15,<2" "dm-sonnet<2" "tensorflow_probability<0.9"

Then run the notebook.
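To confirm that an environment matches the version bounds in the pip command above, a small check along these lines may help. It is only a sketch: the function names are hypothetical, the installed versions are passed in by hand as a dict (for example copied from `pip list`), and pre-release suffixes such as `rc1` are ignored.

```python
import re

# Version bounds from the pip command: (minimum inclusive, upper bound exclusive).
CONSTRAINTS = {
    "tensorflow": ("1.15", "2"),
    "dm-sonnet": (None, "2"),
    "tensorflow_probability": (None, "0.9"),
}


def parse_version(text, width=3):
    """Turn '1.15.0' into (1, 15, 0); suffixes such as 'rc1' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError("not a version string: %r" % text)
    parts = [int(p) for p in match.group(0).split(".")][:width]
    return tuple(parts + [0] * (width - len(parts)))


def check_environment(installed):
    """Return a list of problems; an empty list means all bounds hold."""
    problems = []
    for name, (minimum, below) in CONSTRAINTS.items():
        if name not in installed:
            problems.append("%s is not installed" % name)
            continue
        found = parse_version(installed[name])
        if minimum is not None and found < parse_version(minimum):
            problems.append("%s %s is older than %s" % (name, installed[name], minimum))
        if below is not None and found >= parse_version(below):
            problems.append("%s %s is not below %s" % (name, installed[name], below))
    return problems


# Example with a TensorFlow 2.x install, which violates the "<2" bound.
print(check_environment({
    "tensorflow": "2.4.1",
    "dm-sonnet": "1.36",
    "tensorflow_probability": "0.8.0",
}))
```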

zqaz999 commented 2 years ago

First of all, thank you for your reply. I have solved this problem. Furthermore, I want to know whether these five datasets include the UNC dataset; I don't know much about these. Thanks again.

clatfd commented 2 years ago

@zqaz999 Great to hear you have solved the problem. If it is convenient, could you share more details about what is wrong and how you solved it? Those five datasets are completely different from the UNC dataset. If you want to have access to UNC data, please refer to their website: https://data.kitware.com/#collection/591086ee8d777f16d01e0724/folder/58a372e38d777f0721a64dc6

zqaz999 commented 2 years ago

OK, thanks for your reply. My problem was mainly due to the environment configuration. I made a stupid mistake. At first I tried to run this project on my laptop, which does not support GPU acceleration, so TensorFlow may have used the CPU version by default. I then used a computer that supports GPU acceleration and the problem was resolved.
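For anyone hitting the same symptom, a quick way to see whether TensorFlow can actually use a GPU on a given machine is sketched below. The helper name `gpu_status` is hypothetical. The thread does not confirm that a CPU-only run is the root cause of the NaN loss, only that switching machines resolved it here.

```python
def gpu_status():
    """Return 'gpu', 'cpu only', or 'tensorflow not installed'."""
    try:
        import tensorflow as tf
    except ImportError:
        return "tensorflow not installed"
    # tf.test.is_gpu_available() exists in TensorFlow 1.15; it is deprecated in 2.x.
    return "gpu" if tf.test.is_gpu_available() else "cpu only"


print(gpu_status())
```

If this prints "cpu only" in an environment created with `tensorflow-gpu`, the GPU driver or CUDA setup is a likely place to look before debugging the model code.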