Julian-Wyatt / AnoDDPM

CVPR Workshop paper - AnoDDPM: Anomaly Detection with Denoising Diffusion Probabilistic Models using Simplex Noise
https://julianwyatt.co.uk/anoddpm
MIT License

Confusion about training time and test dataset (brain tumor) #9

Closed · lianjizhe closed 1 year ago

lianjizhe commented 1 year ago

Thank you for your work on AnoDDPM; it is great work. I have some questions about it.

  1. The epoch count in your parameters is set to 3000. How long did it take you to run these 3000 epochs on a single NVIDIA Titan Xp GPU with 12GB GDDR5 in your environment?
  2. The neuroimaging dataset of brain tumour patients (your test dataset) cannot be downloaded. Would it be convenient for you to share the dataset?

Looking forward to your reply.

Julian-Wyatt commented 1 year ago

Thank you for your interest in my work. I'm glad to hear that you found it helpful.

Regarding training time, I did not track the exact duration; I simply let the model run for approximately 24 hours. While the model did overfit in this case, that is less of an issue in this particular domain. Based on my experience, running for 500-1000 epochs should take 5-10 hours and would still yield useful results.

How come you cannot download the test dataset? I believe it is located at: https://reshare.ukdataservice.ac.uk/851861/?

Please let me know if there is anything else I can help you with!

lianjizhe commented 1 year ago

Thank you very much for your reply; it was very helpful. I later succeeded in training, and it takes about a day. The neuroimaging dataset of brain tumour patients (your test dataset) indeed cannot be downloaded; the error shown in the screenshot below occurs.

[Screenshot 2022-12-18 08:45:44: download error]

And I still have some questions about the work. For the training dataset: did you randomly extract one slice from each case's 3D volume and use it for training? That is, do 100 patients correspond to 100 2D slices, so that the training set consists of 100 images in total? For the test dataset: did you sample 4 slices per patient with np.linspace?

Looking forward to your reply.

Julian-Wyatt commented 1 year ago

I'm sorry, but I cannot redistribute the dataset under its terms of service. I also no longer have access to the original website, as I no longer have my old university login. I would suggest reaching out to one of the authors of that data collection for permission.

So for training (from memory), I sampled one of the 100 training volumes, and then for that volume I sampled a slice of the brain (the specific slice range is in the dataset code). For the test dataset, yes, you are correct: I sampled 4 evenly spaced slices from the part of the volume that contained the tumour using np.linspace. I looked through each volume and noted down the slice range of each tumour so that I could do this; the dictionary for this is located in that same dataset code.
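For anyone trying to reproduce this, here is a minimal sketch of the sampling scheme described above. The dictionary name, the example slice ranges, and the volume axis ordering are illustrative assumptions; the real values live in the repository's dataset code.

```python
import numpy as np

# Hypothetical tumour slice ranges per test patient; the actual dictionary
# (and the training slice range) lives in the repository's dataset code.
tumour_slice_range = {"patient_00": (32, 45), "patient_01": (28, 50)}

rng = np.random.default_rng(0)

def sample_train_slice(volumes, slice_range=(40, 100)):
    """Training: pick one of the ~100 healthy volumes at random,
    then pick one slice from its usable slice range."""
    volume = volumes[rng.integers(len(volumes))]
    slice_idx = rng.integers(slice_range[0], slice_range[1])
    return volume[:, :, slice_idx]  # assumes slices along the last axis

def sample_test_slices(volume, patient_id, n_slices=4):
    """Test: take n evenly spaced slices across the marked tumour range."""
    start, end = tumour_slice_range[patient_id]
    idxs = np.linspace(start, end, n_slices).astype(int)
    return volume[:, :, idxs]
```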

Hope this helps. Let me know if there is anything further I can help with.

lianjizhe commented 1 year ago

Thank you for your reply; my confusion has been resolved. I wish you all the best with your work.

Chopper-233 commented 1 year ago

Excuse me, what is the size of the dataset used for training your model (the total amount after data augmentation)?

Alchemistyui commented 1 year ago

> The neuroimaging dataset of brain tumour patients (your test dataset) indeed cannot be downloaded; the error shown in the screenshot above occurs. [...]

Hi, I also face the same problem: the brain tumour test dataset cannot be downloaded. Did you manage to solve this? Thanks!

Julian-Wyatt commented 1 year ago

> What is the size of the dataset used for training your model (the total amount after data augmentation)?

Apologies I didn't get to this sooner. The approximate size is listed in my paper: it should be in the region of 100 volumes of 256×256×60 (60 relevant brain slices; the rest are discarded), with additional augmentation of up to ±3 degrees of rotation and random translation of up to 0.02×width and 0.09×height.

As some of these augmentations are drawn from continuous ranges, it isn't really possible to give an exact, discrete count of the number of distinct training images.
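For reference, here is a minimal sketch of that augmentation using torchvision; the exact pipeline in the repository (resizing, normalisation, interpolation settings) may differ.

```python
import torchvision.transforms as T

# Augmentation as described above: up to ±3 degrees of rotation and random
# translation of up to 0.02×width (horizontal) and 0.09×height (vertical).
# ToTensor and any resizing/normalisation steps are assumptions; see the
# repository's dataset code for the exact pipeline.
augment = T.Compose([
    T.RandomAffine(degrees=3, translate=(0.02, 0.09)),
    T.ToTensor(),
])
```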

Julian-Wyatt commented 1 year ago

> Hi, I also face the same problem: the brain tumour test dataset cannot be downloaded. Did you manage to solve this?

Sadly, I can't distribute the data in this case. I would recommend considering a different open-source labelled anomaly dataset.