dataset - Githubissues

qiaoqiangPro commented 11 months ago

Hello, can you provide your dataset for training?

Negin-Ghamsarian commented 11 months ago

Hello! The following datasets are publicly available:

MRI: https://liuquande.github.io/SAML/
RETOUCH: https://retouch.grand-challenge.org/
CaDIS: https://www.sciencedirect.com/science/article/pii/S1361841521000992

The Cat101 segmentation dataset is a subset of Cataract-1K annotations (https://arxiv.org/abs/2312.06295) and will soon be made available.

qiaoqiangPro commented 11 months ago

Thank you very much for your prompt answer. I will go ahead and download the data. If I run into problems again, I would like to ask you for advice, and I hope you can help. Best wishes!

qiaoqiangPro commented 11 months ago

Hi, I downloaded RETOUCH-TrainingSet-Spectralis and RETOUCH-TrainingSet-Topcon, and I see that the volumes need to be converted to image files, right? What I currently have are .raw and .mhd files. Do you have some details on how to handle this? I'm having some difficulty reproducing it, perhaps because there are too many folders.

qiaoqiangPro commented 11 months ago

Hi, I found that the RETOUCH dataset uses CSV files to indicate the data location, but I didn't find train_IDs_CSV and test_IDs_CSV among your CSV samples, so I don't know how your data is split. Is that right?

Negin-Ghamsarian commented 10 months ago

Hello! The dataset creation code for the RETOUCH dataset has been added. You should use itk_reader.py and adapt the directory paths for each device (Spectralis and Topcon). Afterward, using Fluid_Separation.py, you can exclude the IRF fluids.
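
For readers unfamiliar with the .mhd/.raw pair: the .mhd file is a plain-text MetaImage header that describes the binary .raw volume. The following is a minimal standard-library sketch of how such a pair could be split into 2D slices. It is not the repository's itk_reader.py (which, going by its name, presumably relies on ITK), and the function names `parse_mhd_header` and `read_slices` are hypothetical. It assumes an uncompressed 3D volume, a data file sitting next to the header, and that the last `DimSize` entry indexes the slices you want.

```python
import os

# Bytes per voxel for the MetaImage element types this sketch handles.
ELEMENT_SIZES = {"MET_UCHAR": 1, "MET_USHORT": 2}


def parse_mhd_header(mhd_path):
    """Parse the plain-text 'Key = Value' lines of a MetaImage (.mhd) header."""
    header = {}
    with open(mhd_path, "r") as f:
        for line in f:
            if "=" in line:
                key, value = line.split("=", 1)
                header[key.strip()] = value.strip()
    return header


def read_slices(mhd_path):
    """Return one bytes object per slice along the last (slowest) axis."""
    header = parse_mhd_header(mhd_path)
    width, height, depth = (int(d) for d in header["DimSize"].split())
    voxel_bytes = ELEMENT_SIZES[header["ElementType"]]
    # Assumption: ElementDataFile names a file in the same folder as the header.
    raw_path = os.path.join(os.path.dirname(mhd_path), header["ElementDataFile"])
    with open(raw_path, "rb") as f:
        data = f.read()
    slice_bytes = width * height * voxel_bytes
    if len(data) != slice_bytes * depth:
        raise ValueError("raw file size does not match DimSize and ElementType")
    return [data[i * slice_bytes:(i + 1) * slice_bytes] for i in range(depth)]
```

Each returned slice still has to be written out with an image library of your choice. For real work, an established reader such as ITK is the safer option, since it also handles compressed data and spacing metadata.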

Negin-Ghamsarian commented 10 months ago

Hello! The TrainIDs used for training on the IRF fluids are under TrainIDs_RETOUCH_DA, which has now been uploaded.

Best regards,

qiaoqiangPro commented 10 months ago

Hi, thanks a lot for your reply and the additions. I have successfully run the code you added and obtained the TrainIDs_RETOUCH_DA folder. For the OCT data, your paper reports the number of labeled training images (from the source domain), unlabeled training images (from the target domain), and the average number of test images per fold as (391, 569, 115). In TrainIDs_RETOUCH_DA, I take the files with the suffixes _0.csv, _1.csv, _2.csv, and _3.csv in each fold to be the labeled training images (from the source domain). The counts I get match those of your newly uploaded files: about (102+96+113+110)/4 ≈ 105 PNG images, which is very far from the 391 you mention. For the unlabeled training images (from the target domain) I get (572+454+338+221)/4 ≈ 396, again a big difference from the 569 in the paper. Maybe I'm misunderstanding something, so I would appreciate your clarification. Best wishes.
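
The per-fold averages above can be reproduced with a few lines. This is only a sketch: `count_rows` and `mean_fold_size` are hypothetical helper names, and `count_rows` assumes one image per CSV row with no header line, which may not match the repository's files.

```python
import csv


def count_rows(csv_path):
    """Count the non-empty rows of a CSV file (assumes one image per row, no header)."""
    with open(csv_path, newline="") as f:
        return sum(1 for row in csv.reader(f) if row)


def mean_fold_size(counts):
    """Average number of images per fold."""
    return sum(counts) / len(counts)


# Per-fold counts quoted in the comment above.
labeled = [102, 96, 113, 110]
unlabeled = [572, 454, 338, 221]
print(mean_fold_size(labeled))    # 105.25
print(mean_fold_size(unlabeled))  # 396.25
```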

Negin-Ghamsarian commented 10 months ago

It seems you mistakenly considered "SpectralisVsTopcon" CSV files as supervised/semi/test sets, while "SpectralisVsTopcon4" CSV files correspond to the dataset used in this paper (as referenced in the config files under configs_RETOUCH_DA_scSENet_ST4). I will later remove these unnecessary files to avoid further misunderstanding.
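
Because the two families of files differ by a single character, a loose prefix match (for example `name.startswith("SpectralisVsTopcon")`) would pick up both. A minimal sketch of a stricter selection follows. It assumes the files are named exactly `<prefix>_<fold>.csv`, as in SpectralisVsTopcon4_0.csv; `select_split_files` is a hypothetical helper, not part of the repository.

```python
import re


def select_split_files(filenames, prefix):
    """Map fold index -> filename for files named exactly '<prefix>_<fold>.csv'."""
    pattern = re.compile(re.escape(prefix) + r"_(\d+)\.csv")
    selected = {}
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            selected[int(match.group(1))] = name
    return selected


names = [
    "SpectralisVsTopcon_0.csv",
    "SpectralisVsTopcon4_0.csv",
    "SpectralisVsTopcon4_1.csv",
]
print(select_split_files(names, "SpectralisVsTopcon4"))
# {0: 'SpectralisVsTopcon4_0.csv', 1: 'SpectralisVsTopcon4_1.csv'}
```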

qiaoqiangPro commented 10 months ago

Yes, sorry about that. There are too many files and I may have mixed them up. I followed the settings in your newly uploaded dataset_creation_RETOUCH folder and got 16 files in total.

Your TrainIDs_RETOUCH_DA folder also contains SpectralisVsCirrus, but you didn't use the Cirrus data in your paper, right? If so, I can ignore it as well.

qiaoqiangPro commented 10 months ago

Hi, Negin. I apologize, but I'm still confused. The first thing I'd like to confirm is whether the test split of fold 0 for Spectralis in RETOUCH includes these four.

I ran the code you provided to split the dataset and noticed two things. First, the output file is named SpectralisVsTopcon_0 rather than SpectralisVsTopcon4_0 as in your correction, so I can't be sure they represent the same thing. Second, file 026 appears in the SpectralisVsTopcon4_0.csv file you provided (which is supposed to include only the training images of the source domain). I think it shouldn't be in the training set, since it has already been assigned to SourceTest.
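
A quick way to check for this kind of overlap between two split files is sketched below. The helper names `case_ids` and `leaked_cases` are hypothetical, and the default pattern (the first run of three digits in the first column) is only a guess at how an ID such as 026 appears in the image paths; it would need adjusting to the real layout.

```python
import csv
import re


def case_ids(csv_path, pattern=r"(\d{3})"):
    """Collect the IDs matched by `pattern` in the first column of a CSV file."""
    ids = set()
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if row:
                match = re.search(pattern, row[0])
                if match:
                    ids.add(match.group(1))
    return ids


def leaked_cases(train_csv, test_csv):
    """Return the IDs that occur in both split files, sorted."""
    return sorted(case_ids(train_csv) & case_ids(test_csv))
```

An empty list from `leaked_cases` means no ID is shared between the two files under the assumed pattern.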

Best wishes.

Negin-Ghamsarian commented 10 months ago

The config files are already provided under configs_RETOUCH_DA_scSENet_ST4. The code in dataset_creation_RETOUCH/subdataset_generator_RETOUCH_DA.py is a sample dataset creation scheme, in case you would like to create CSV files with other proportions or strategies.
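
For anyone who does want to build their own splits, here is a minimal sketch of a fold generator that keeps all images of one case in the same split. It is not the logic of subdataset_generator_RETOUCH_DA.py; the function names, the `<prefix>_train_<k>.csv` / `<prefix>_test_<k>.csv` naming, and the one-image-per-row layout are all assumptions made for illustration.

```python
import csv
import random


def make_folds(case_ids, n_folds=4, seed=0):
    """Shuffle case IDs reproducibly and deal them into n_folds test groups."""
    ids = sorted(case_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::n_folds] for i in range(n_folds)]


def write_fold_csvs(images_by_case, prefix, n_folds=4, seed=0):
    """Write '<prefix>_train_<k>.csv' and '<prefix>_test_<k>.csv' for each fold k."""
    folds = make_folds(images_by_case, n_folds, seed)
    for k, test_cases in enumerate(folds):
        held_out = set(test_cases)
        for split in ("train", "test"):
            with open(f"{prefix}_{split}_{k}.csv", "w", newline="") as f:
                writer = csv.writer(f)
                for case in sorted(images_by_case):
                    # A case goes to the test file only when it is held out in fold k.
                    if (case in held_out) == (split == "test"):
                        for image in images_by_case[case]:
                            writer.writerow([image])
    return folds
```

Splitting by case rather than by image avoids having slices of the same volume in both the training and the test file.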

qiaoqiangPro commented 10 months ago

OK, Negin, I understand. Thanks for the reply! Best wishes.

Negin-Ghamsarian / Transformation-Invariant-Self-Training-MICCAI2023

dataset #1