FreyrS / dMaSIF

Why does "iface_preds" contain NaN when training dmasif_ #48

Open BingzeWu opened 10 months ago

BingzeWu commented 10 months ago

[Screenshot 2023-10-23 13:39:04] It seems that the training is not stable.
I followed the "benchmark_scripts" to retrain dMaSIF_site_3layer_9A. But when calculating the ROC-AUC, it raised "Problem with computing roc-auc", and I found that "iface_preds" contains NaN. Does anyone have a similar problem?
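
For readers hitting the same error, a quick first check is whether the predictions are finite before the ROC-AUC is computed. The snippet below is a minimal sketch in plain Python, not code from the dMaSIF repository: `nonfinite_indices` is a hypothetical helper, and the `iface_preds` values are made up for illustration (the NaN is injected by hand).

```python
import math


def nonfinite_indices(values):
    """Return the positions of NaN or +/-inf entries in a flat sequence of floats."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]


# Hypothetical predictions for one sample; the NaN is injected for illustration.
iface_preds = [0.12, 0.87, float("nan"), 0.45]

bad = nonfinite_indices(iface_preds)
if bad:
    print(f"{len(bad)} non-finite prediction(s) at indices {bad}; skipping ROC-AUC")
else:
    print("all predictions finite; safe to compute ROC-AUC")
```

Logging which samples trigger this check gives a list of problematic data points to inspect further.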

YAndrewL commented 10 months ago

Same problem, did you solve this? @BingzeWu

YAndrewL commented 10 months ago

What do you mean by mini-batch? I've trained this with a batch size of 64, but the model only considers single-batch training, and NaN values still appear after several steps.

On Fri, Nov 10, 2023 at 14:14, Bingze Wu wrote:

No, I found that the problem may come from the data preprocessing step. When I trained the model on a mini batch, the training was successful. So I found that in the dMaSIF convolution step there is some problem with the "nuv" data. But I don't know how to fix the bug.

-- Yufan Liu, Ph.D. student in computer science,

Computational Bioscience Research Center (CBRC),

King Abdullah University of Science and Technology (KAUST)

yandrewl.github.io

BingzeWu commented 10 months ago

Sorry, I mean that I trained the model on a sub-dataset (randomly chosen, about 300 data points). When training on the complete dataset, NaN values still appeared. If you run the trained model on the problematic data point (where the ROC-AUC problem was raised), you will find that NaN values appear in specific geometric convolution layers. The internal computation seems to give wrong values, but I don't know how to fix it. The convolution relies on different geometric operations, which I am unfamiliar with.
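
The layer-by-layer check described above can be sketched generically as follows. This is an illustration in plain Python, not dMaSIF code: the stages `scale`, `normalize`, and `shift` are hypothetical stand-ins for network layers, and the zero-norm division is only an assumed example of how a degenerate geometric input could produce NaN.

```python
import math


def first_nonfinite_stage(stages, x):
    """Run x through (name, fn) stages in order and return the name of the first
    stage whose output contains NaN or inf, or None if every output is finite."""
    for name, fn in stages:
        x = fn(x)
        if any(not math.isfinite(v) for v in x):
            return name
    return None


# Hypothetical stand-ins for network layers, operating on flat lists of floats.
def scale(xs):
    return [2.0 * v for v in xs]


def normalize(xs):
    # A zero norm yields NaN here, mimicking a degenerate geometric input.
    norm = math.sqrt(sum(v * v for v in xs))
    return [v / norm if norm > 0 else float("nan") for v in xs]


def shift(xs):
    return [v + 1.0 for v in xs]


stages = [("scale", scale), ("normalize", normalize), ("shift", shift)]
print(first_nonfinite_stage(stages, [3.0, 4.0]))  # healthy input
print(first_nonfinite_stage(stages, [0.0, 0.0]))  # degenerate input
```

In a PyTorch model, the same idea could be applied by attaching a check to each submodule with `torch.nn.Module.register_forward_hook`, and `torch.autograd.set_detect_anomaly(True)` can help when the NaN first appears in the backward pass.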

Xinheng-He commented 9 months ago

I also found that it has issues with the input data, the batch size, and its hyperparameters. I struggled for a week to get it running without NaN, but failed. Maybe this network only works on their own PPI data, which has gone through precision regulation. I have given up trying to understand and debug it.

orange2350 commented 9 months ago

@YAndrewL Same problem, did you solve this?

YAndrewL commented 9 months ago

Hi Zhiyi, not yet. You may find the NaN values in the input features and mask them with the average or some constant to start the training. Unfortunately, I did not get the training results described in the paper.
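
The masking workaround mentioned above might look like the following. This is a minimal sketch in plain Python under stated assumptions: `mask_nan_columns` is a hypothetical helper, the features are assumed to be a points-by-channels list of lists, and the sample values are invented.

```python
import math


def mask_nan_columns(features, fill=None):
    """Replace NaN or inf entries column by column.

    features: list of rows, each a list of floats (points x channels).
    fill: constant to use; if None, use the mean of the finite values in that
          column (0.0 when a column has no finite value at all).
    """
    n_cols = len(features[0])
    fills = []
    for j in range(n_cols):
        finite = [row[j] for row in features if math.isfinite(row[j])]
        if fill is not None:
            fills.append(fill)
        else:
            fills.append(sum(finite) / len(finite) if finite else 0.0)
    return [
        [v if math.isfinite(v) else fills[j] for j, v in enumerate(row)]
        for row in features
    ]


nan = float("nan")
feats = [[1.0, nan], [3.0, 4.0], [nan, 8.0]]  # made-up feature values
print(mask_nan_columns(feats))            # fill with column means
print(mask_nan_columns(feats, fill=0.0))  # fill with a constant
```

Masking only lets training start; it does not explain where the NaN values come from, and as noted in this thread it did not reproduce the results reported in the paper.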

orange2350 commented 9 months ago

The results I get from reproducing dMaSIF are also not the same as those reported in the paper. Thanks.