ZeyuGaoAi / SSMTL_CancerClassification

A Semi-Supervised Multi-Task Learning Framework for Cancer Classification with Weak Annotation in Whole-Slide Images - MedIA 2022
MIT License

Problems downloading and using the dataset you provided; I would appreciate an answer, thank you very much #2

Closed: YancyGuo closed this issue 7 months ago

YancyGuo commented 1 year ago

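Following the README, I downloaded the dataset and changed lines 78-80 of back_ground_filter.py to point to the corresponding paths, but running the command for the next step raises an error. In total I ran into 4 problems while reproducing the results: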

  1. Problems with the command as originally given:
    • epoches -> epochs
    • gpus -> gpu
    After these changes the arguments are accepted.
  2. After fixing problem 1, running the command produces:
        Traceback (most recent call last):
          File "multitask_train.py", line 543, in <module>
            main()
          File "multitask_train.py", line 94, in main
            build_dataset(labeled_data_files, unlabeled_files, test_files)
          File "multitask_train.py", line 229, in build_dataset
            meta_data = PatchesDatasetSubtype(meta_data_files, transform=train_transform)
          File "/data/yancy/Conda/SSMTL/load_patches_data.py", line 124, in __init__
            imgs.append((words[0], int(words[1]), int(words[2])))
        IndexError: list index out of range
    The data is stored as shown in the attached screenshot. Does the dataset need to be preprocessed before training?
    • In addition, after changing the dataset paths in the original code accordingly, I also get an error:
        (SSMTL) yancy@amax:/data/yancy/Conda/SSMTL$ python multitask_train.py --gpu 0,1 --epochs 200 --batch-size 128 --n-classes1 2 --n-classes2 3 --out /data/yancy/Conda/SSMTL/data/output
        ==> Preparing dataset
        /data/yancy/Conda/SSMTL/data/RCC/labeled_2000_train.txt
        /data/yancy/Conda/SSMTL/data/RCC/unlabeled_2000_train.txt
        /data/yancy/Conda/SSMTL/data/RCC/all_2000_test.txt
        ==> creating resynet 34
        Total params: 34.40M
        Epoch: [1 | 200] LR: 0.002000
        Training
        Traceback (most recent call last):
          File "multitask_train.py", line 264, in train
            inputs_x, targets_x, subtypes_x, _ = labeled_train_iter.next()
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
            data = self._next_data()
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
            return self._process_data(data)
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
            data.reraise()
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
            raise self.exc_type(msg)
        FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
        Original Traceback (most recent call last):
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
            data = fetcher.fetch(index)
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
            data = [self.dataset[idx] for idx in possibly_batched_index]
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
            data = [self.dataset[idx] for idx in possibly_batched_index]
          File "/data/yancy/Conda/SSMTL/load_patches_data.py", line 132, in __getitem__
            img = Image.open(fn).convert('RGB')
          File "/data/yancy/anaconda3/envs/SSMTL/lib/python3.8/site-packages/PIL/Image.py", line 3092, in open
            fp = builtins.open(filename, "rb")
        FileNotFoundError: [Errno 2] No such file or directory: '/home5/hby/subtype_newdata/RCC_point/CCRCC/TCGA-BP-4982-01Z-00-DX1.c2a2e876-e500-412f-b937-30a3824af803/p_label/cancer/82001_42001_2000.png'
  3. If the dataset you provided does need preprocessing, could you give some pointers on what to modify and how to run it?
  4. The provided dataset contains no test file. How should line 80 of multitask_train.py be modified?
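In the second traceback, the list file points at absolute paths from the authors' machine (paths beginning with `/home5/hby/subtype_newdata`). Once patches have been generated locally, one possible workaround is to rewrite that prefix in the list files. This is a minimal sketch only: it assumes your locally extracted patches keep the same relative directory layout under a new root, and `remap_prefix` / `remap_list_file` are hypothetical helpers, not part of SSMTL_CancerClassification.

```python
from pathlib import Path
from typing import List


def remap_prefix(lines: List[str], old_root: str, new_root: str) -> List[str]:
    """Replace old_root with new_root at the start of each line's image path.

    Each line is assumed (hypothetically) to be 'img_path crd_label subtype_label'.
    Only the first column is changed; columns are re-joined with single spaces.
    """
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    out = []
    for line in lines:
        words = line.split()
        # Match whole path components only, so '/a/b' does not match '/a/bc/...'.
        if words and (words[0] == old or words[0].startswith(old + "/")):
            words[0] = new + words[0][len(old):]
        out.append(" ".join(words))
    return out


def remap_list_file(src: str, dst: str, old_root: str, new_root: str) -> None:
    """Write a copy of the list file src to dst with remapped image paths."""
    lines = Path(src).read_text().splitlines()
    Path(dst).write_text("\n".join(remap_prefix(lines, old_root, new_root)) + "\n")
```

For example, `remap_list_file("labeled_2000_train.txt", "labeled_2000_train_local.txt", "/home5/hby/subtype_newdata", "/data/patches")`, where `/data/patches` is a placeholder for wherever your extracted patches live. Re-joining with single spaces also normalizes tabs or repeated spaces between columns.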
ZeyuGaoAi commented 1 year ago

For Q2, please check the format of your "labeled_data_files": each line must have three columns (img_path, CRD label, and subtype label) separated by spaces. Also, you need to download the SVS files from the TCGA database and generate the image patches yourself; that is why you get the "No such file or directory" error.

For Q3, for pre-processing, please follow the notice in our ReadMe:
  Pre-processing
    • Generate Binary Mask for WSIs: ./preprocess/back_ground_filter.py
    • Extract Image Patches from WSIs (no-overlapping): ./preprocess/extract_patches.py

For Q4, you should generate the test files using our annotations.
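The three-column format described for Q2 can be checked before training. Below is a minimal Python sketch, assuming the list file is read line by line and split on whitespace, as the traceback line `imgs.append((words[0], int(words[1]), int(words[2])))` suggests. `parse_list_lines` and `missing_images` are hypothetical helpers, not part of the repository; they report malformed lines instead of raising `IndexError`, and list image paths that do not exist on disk.

```python
from pathlib import Path
from typing import List, Tuple

Entry = Tuple[str, int, int]  # (img_path, CRD label, subtype label)


def parse_list_lines(lines: List[str]) -> Tuple[List[Entry], List[Tuple[int, str]]]:
    """Parse space-delimited 'img_path crd_label subtype_label' lines.

    Returns (entries, errors); each error is (1-based line number, message),
    so malformed lines are reported instead of raising IndexError.
    """
    entries: List[Entry] = []
    errors: List[Tuple[int, str]] = []
    for lineno, line in enumerate(lines, start=1):
        words = line.split()
        if not words:
            continue  # ignore blank lines, e.g. a trailing newline
        if len(words) != 3:
            errors.append((lineno, f"expected 3 columns, got {len(words)}"))
            continue
        try:
            entries.append((words[0], int(words[1]), int(words[2])))
        except ValueError:
            errors.append((lineno, "labels must be integers"))
    return entries, errors


def missing_images(entries: List[Entry]) -> List[str]:
    """Return the image paths that do not exist as files on this machine."""
    return [path for path, _, _ in entries if not Path(path).is_file()]
```

Usage: `entries, errors = parse_list_lines(Path("labeled_2000_train.txt").read_text().splitlines())`, then `missing_images(entries)`. The check is deliberately stricter than the loader in the traceback, which only needs at least three columns; this sketch also flags lines with extra columns.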

Download the original SVS files from https://portal.gdc.cancer.gov/, and our annotation files from https://dataset.chenli.group/home/rcc-region-and-subtyping.
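The pre-processing steps mentioned in the answer (a binary tissue mask, then non-overlapping patch extraction) can be illustrated with a small NumPy sketch. It is not the repository's `extract_patches.py`. It assumes the mask is a 2-D boolean array of the slide downsampled by an integer factor; the helper names `tile_origins` / `keep_tissue_tiles` and the 50% tissue threshold are hypothetical choices.

```python
from typing import List, Tuple

import numpy as np


def tile_origins(width: int, height: int, patch_size: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) corners of non-overlapping patch_size x patch_size tiles.

    Tiles that would extend past the right or bottom edge are dropped.
    """
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, patch_size)
        for x in range(0, width - patch_size + 1, patch_size)
    ]


def keep_tissue_tiles(
    origins: List[Tuple[int, int]],
    mask: np.ndarray,
    downsample: int,
    patch_size: int,
    min_fraction: float = 0.5,  # hypothetical threshold
) -> List[Tuple[int, int]]:
    """Keep tiles whose footprint in the binary mask is mostly foreground.

    mask is a 2-D boolean array of the slide shrunk by an integer factor
    `downsample`, so mask[r, c] covers level-0 pixels
    x in [c * downsample, (c + 1) * downsample) and likewise y for row r.
    """
    kept = []
    for x, y in origins:
        c0, r0 = x // downsample, y // downsample
        # Cover at least one mask cell even when patch_size < downsample.
        c1 = max(c0 + 1, (x + patch_size) // downsample)
        r1 = max(r0 + 1, (y + patch_size) // downsample)
        region = mask[r0:r1, c0:c1]
        if region.size > 0 and region.mean() >= min_fraction:
            kept.append((x, y))
    return kept
```

Each kept `(x, y)` is a level-0 coordinate, so with OpenSlide a patch could be read via `OpenSlide(path).read_region((x, y), 0, (patch_size, patch_size)).convert("RGB")`; `read_region` returns an RGBA image, hence the conversion. Dropping partial tiles at the right and bottom edges is a design choice of this sketch.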