byaman14 / SSDU


"gradient explosion" #4

Closed DINGPENG-XIAOXIN closed 1 year ago

DINGPENG-XIAOXIN commented 1 year ago

Dear Dr. Yaman,

Thank you for the SSDU toolbox. I had some problems training with your code. When I trained with k-space data of shape 280x15x320x320, the loss became NaN after a dozen epochs. I wonder if there is a gradient explosion due to the large number of layers in the network. Do you have any suggestions? Can I change the learning rate, the network weight initialization, the ReLU function, and the number of unrolled network iterations for training, and then use the result as a comparison network?

Best Regards, PengDing

byaman14 commented 1 year ago

Hi,

We did not face the gradient explosion issue in our study. However, you can reduce your learning rate to see if it helps. It might also be related to the data, i.e., preprocessing, normalization, and padding the training mask if necessary. It might be easier to debug using our zero-shot self-supervised learning work (https://github.com/byaman14/ZS-SSL), as only a single slice from your dataset is needed to test things.
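On the normalization point, a common choice is to scale each k-space volume so its maximum magnitude is 1 before training, which keeps the loss on a comparable scale across subjects and can help avoid NaN losses. A minimal sketch (hypothetical helper, not part of the SSDU codebase):

```python
import numpy as np

def normalize_kspace(kspace):
    """Scale a complex k-space array so its maximum magnitude is 1.

    Per-volume scaling like this is one simple preprocessing choice;
    it keeps losses on a similar scale across subjects.
    (Hypothetical helper, not from the SSDU repository.)
    """
    scale = np.max(np.abs(kspace))
    return kspace / scale, scale

# Toy example: random complex k-space of shape (slices, coils, nx, ny).
rng = np.random.default_rng(0)
ksp = rng.standard_normal((2, 4, 8, 8)) + 1j * rng.standard_normal((2, 4, 8, 8))
ksp_norm, scale = normalize_kspace(ksp)
print(np.abs(ksp_norm).max())  # 1.0 up to float rounding
```

The saved `scale` can be used to undo the normalization on the reconstructed images if absolute intensities matter.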

DINGPENG-XIAOXIN commented 1 year ago

Hi,

Thanks for the code and paper. I still have some small questions. In self-supervised MRI reconstruction there is no fully sampled data, so how do we decay the learning rate, choose an early-stopping plan, and select the optimal model?

Look forward to any replies from you! Best regards.

byaman14 commented 1 year ago

Hi,

Acquired undersampled data is sufficient for validation. Given a dataset of undersampled measurements, generate two sub-datasets, namely a training dataset and a validation dataset. The training dataset is used for self-supervised training (i.e., SSDU training). The validation dataset is used to track the validation loss. In particular, for each sample in the validation set, split the available measurements (denoted as \Omega in the paper) uniformly at random into a training set (\Theta) and a loss set (\Lambda), the same partitioning used in SSDU. Then, compute and report the validation loss over only the loss set (\Lambda). By tracking this validation loss you can decay the learning rate, decide when to stop early, and select the optimal model accordingly.
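The partitioning described above can be sketched in a few lines. This is an illustrative uniform-random split of a binary sampling mask, not the SSDU repository's own partitioning code; the split ratio `rho` is an assumed parameter:

```python
import numpy as np

def split_mask(omega, rho=0.4, seed=0):
    """Split a binary sampling mask Omega into disjoint masks
    Theta (training) and Lambda (loss), uniformly at random.

    rho is the fraction of sampled locations assigned to Lambda.
    (Illustrative sketch of SSDU-style partitioning; the paper also
    considers other selection schemes.)
    """
    rng = np.random.default_rng(seed)
    sampled_idx = np.flatnonzero(omega)          # sampled k-space locations
    n_loss = int(rho * sampled_idx.size)
    loss_idx = rng.choice(sampled_idx, size=n_loss, replace=False)
    lam = np.zeros_like(omega)
    lam.flat[loss_idx] = 1                       # loss mask Lambda
    theta = omega - lam                          # training mask Theta
    return theta, lam
```

For validation, the network input is built from the \Theta samples, and the loss is computed only on the k-space locations where \Lambda is 1.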

DINGPENG-XIAOXIN commented 1 year ago

Hi, thanks for your reply; I now have a deeper understanding. I wish you every success in your work!

byaman14 commented 1 year ago

Thanks. I am closing this issue; feel free to reopen it if there are further questions.