Thanos-DB / FullyConvolutionalTransformer

Official implementation of The Fully Convolutional Transformer for Medical Image Segmentation
https://chaitanya-kaul.github.io/

low dice on test set #14

Open kingo233 opened 1 year ago

kingo233 commented 1 year ago

The ACDC dataset now has two folders: training and testing.

[Screenshot, 2023-04-23 16:58:11]

After using the split from https://github.com/Thanos-DB/FullyConvolutionalTransformer/issues/6 to divide the training folder into 3 parts, I reproduced your result: [image 6caea5bec656e2bef48861d9fb573965]. But when I tested the model on the testing folder, which contains patients 101 to 150, the result is low: [image 6939c5735e0c6f0b708152373fc32a76].

Were all the results on the ACDC leaderboard obtained on the testing folder, or on the training folder as in your paper? Also, in my repo https://github.com/kingo233/FCT-Pytorch I trained on the training folder and got a Dice score of 90 on the testing folder, but I can't improve it any further. Is this the upper limit of FCT?
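For readers comparing numbers across the two test sets, here is a minimal sketch of a per-class Dice computation on integer label maps. It is not the evaluation code from this repository or from the ACDC evaluation server. The function names and the assumption of 4 labels (background plus 3 foreground structures) are illustrative only.

```python
import numpy as np


def dice_per_class(pred, target, num_classes=4):
    """Return {class_index: Dice} for foreground classes 1..num_classes-1.

    pred and target are integer label maps of identical shape.
    num_classes=4 is an assumption (background + 3 structures).
    """
    pred = np.asarray(pred)
    target = np.asarray(target)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    scores = {}
    for c in range(1, num_classes):
        p = pred == c
        t = target == c
        denom = p.sum() + t.sum()
        if denom == 0:
            # Class absent from both maps: Dice is undefined, so mark it NaN.
            scores[c] = float("nan")
        else:
            scores[c] = 2.0 * np.logical_and(p, t).sum() / denom
    return scores


def mean_dice(scores):
    """Average the per-class scores, ignoring undefined (NaN) classes."""
    return float(np.nanmean(list(scores.values())))


if __name__ == "__main__":
    target = [[0, 1, 1], [2, 2, 3]]
    pred = [[0, 1, 0], [2, 2, 3]]
    per_class = dice_per_class(pred, target)
    print(per_class, mean_dice(per_class))
```

Whether scores are averaged per slice, per volume, or per patient changes the reported number, so the two evaluations should use the same convention before they are compared.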

Thanos-DB commented 1 year ago

Hi kingo233,

At the time of writing the paper, and even after it was accepted, no ground truths were available for the test set. So we trained on the set with available ground truths, ran prediction on the official test set, generated the required files, and uploaded them to the evaluation server. We never evaluated on the official test set ourselves because that was not possible at the time. After submission, the evaluation server calculated everything and returned some statistics (the paper links to them).

So, first we split the 100 patients as described in the paper into train, validation, and test sets with ground truths, as other papers do. Then, to show that we did not overfit on this test set, we predicted on the official test set (remember, there were no ground truths at the time) and reported those results too.

You can see in the paper that your 71.7% is too low. Maybe you mixed the two test sets somehow, and that is why it is so low? I am sure the evaluation platform would not show a ~20% difference. Concerning your PyTorch implementation, did you change the multi-head attention as discussed in your previous issue? I suspect this is what is missing to get the remaining 2-3%.

BR Thanos
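One way to rule out mixed test sets is to check patient IDs before evaluating. The sketch below is illustrative and not part of this repository. It assumes folder names of the form `patientNNN`, with patients 1 to 100 in the official training folder and 101 to 150 in the official testing folder, as described in this thread. All function names are hypothetical.

```python
import re


def patient_number(name):
    """Extract the integer ID from a name such as 'patient101'."""
    match = re.search(r"patient(\d+)", name)
    if match is None:
        raise ValueError(f"no patient number found in {name!r}")
    return int(match.group(1))


def split_by_official_folder(names, last_training_id=100):
    """Separate names into (training-folder, testing-folder) patients.

    last_training_id=100 follows the thread: 100 patients have always had
    ground truth, and patients 101-150 form the official test set.
    """
    train, test = [], []
    for name in names:
        if patient_number(name) <= last_training_id:
            train.append(name)
        else:
            test.append(name)
    return train, test


def assert_disjoint(*splits):
    """Raise ValueError if any patient ID appears in more than one split."""
    seen = {}
    for index, split in enumerate(splits):
        for name in split:
            pid = patient_number(name)
            if pid in seen and seen[pid] != index:
                raise ValueError(
                    f"patient {pid} is in both split {seen[pid]} and split {index}"
                )
            seen[pid] = index


if __name__ == "__main__":
    names = ["patient001", "patient100", "patient101", "patient150"]
    train, test = split_by_official_folder(names)
    assert_disjoint(train, test)
    print(train, test)
```

Running `assert_disjoint` on the train, validation, and internal test lists, and then against the official testing folder, would show whether any patient is counted in two places.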

kingo233 commented 1 year ago

Hi Thanos! I downloaded your model from this issue https://github.com/Thanos-DB/FullyConvolutionalTransformer/issues/2 instead of training the model myself. Then I tested on patient101-patient150 (the official test set), and the Dice score is still very low.

[Screenshot, 2023-04-24 08:58:41]

Here is where I modified the notebook:

[Screenshot, 2023-04-24 08:58:22] [Screenshot, 2023-04-24 08:59:11]

Also, the results link in your paper is not accessible:

[Screenshot, 2023-04-24 15:52:51]

Can you download the official test set, which contains ground truth, and test locally?

Thanos-DB commented 1 year ago

The official webpage of the dataset states that "It remained open until the end of 2022 for new submissions." I speculate that the submission links became inactive after it was shut down. We submitted our results (to the evaluation server) and the paper (to WACV 2023) months before that, when the links were fully functional.

The next time I have some free time and available resources, I'll try to see what happens and get back to you.

Susu0812 commented 6 months ago

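The ACDC dataset now has two folders: training and testing. [Screenshot, 2023-04-23 16:58:11] After using the split from #6 to divide the training folder into 3 parts, I reproduced your result. But when I tested the model on the testing folder, with patient numbers from 101 to 150, the result was low: [image 6caea5bec656e2bef48861d9fb573965] [image 6939c5735e0c6f0b708152373fc32a76]

Were all the results on the ACDC leaderboard obtained on the testing folder, or on the training folder as in your paper? Also, in my repo https://github.com/kingo233/FCT-Pytorch I trained on the training folder and got a Dice score of 90 on the testing folder, but I can't improve it any further... Is this the upper limit of FCT?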

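I also trained the model myself with a reduced batch_size and, when testing on ACDC, got only 89-90, which does not match the result reported in the paper. I also trained and tested on Synapse, and the result was very poor, only 50-55. I don't know whether this is due to the hyperparameter settings or something else; the authors do not seem to have provided training parameters or code for Synapse.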