BearCleverProud / HAG-MIL

Repository for Hierarchical Attention-Guided Multiple Instance Learning

Problem reproducing the results #1

Closed junjianli106 closed 3 months ago

junjianli106 commented 9 months ago

Hello, following your pipeline I reproduced the code. The specific steps were as follows:

python create_patches_fp.py --source /homeb/junjianli/data/CAMELYON16/all_slide/40x --save_dir /homeb/junjianli/HAG-MIL-master/feature/camelyon_patches_256_level_2 --patch_size 256 --seg --patch --stitch --patch_level 2

python create_quadtree_wsis.py

CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py --data_h5_dir feature/camelyon_patches_256_level_2 --data_slide_dir data/CAMELYON16/all_slide/40x --csv_path feature/camelyon_patches_256_level_2/process_list_autogen.csv --feat_dir camelyon_patches_256_level_2 --batch_size 512 --slide_ext .tif

CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py --data_h5_dir /homeb/junjianli/HAG-MIL-master/feature/camelyon_patches_256_level_1_corresponding --data_slide_dir data/CAMELYON16/all_slide/40x --csv_path feature/camelyon_patches_256_level_2/process_list_autogen.csv --feat_dir feature/camelyon_patches_256_level_1_corresponding --batch_size 512 --slide_ext .tif

CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py --data_h5_dir feature/camelyon_patches_256_level_0_corresponding --data_slide_dir /homeb/junjianli/data/CAMELYON16/all_slide/40x --csv_path feature/camelyon_patches_256_level_2/process_list_autogen.csv --feat_dir feature/camelyon_patches_256_level_0_corresponding --batch_size 512 --slide_ext .tif

The random seed was 6 and all other parameters were left unchanged. The experiment was run on 3 V100 (32G) GPUs. The final results were: {'test_acc': 0.8604651093482971, 'test_auc': 0.8828520774841309, 'test_f1': 0.8434470891952515, 'test_loss': 0.21523647010326385}

Is there a problem with any of my steps, or is there anything else I should pay attention to? Thank you.

bravePinocchio commented 7 months ago

I would like to know why extract_features_fp.py has: if level == 1: step = patch_size

BearCleverProud commented 7 months ago

Hello, here are some things I can think of that may need attention:

  1. Please make sure you installed the environment from the environment.yml file, as differences in software versions can lead to noticeable differences in results. In particular, verify the version of your pixman library, as I have run into a similar issue before; you can refer to this link for more information: https://github.com/mahmoodlab/CLAM/issues/176. A small version check is sketched after this list.
  2. It may also be worth checking the source of your Camelyon16 dataset. In our experiments the BaiduNetDisk source appeared to be more reliable than the GigaDB source, perhaps because it was updated later; we noticed that some WSIs from GigaDB could not be processed by the CLAM code, while the BaiduNetDisk copies worked fine. Differences in the data source can propagate into differences in the downstream processing.
  3. Considering that you are running on V100 (32G) GPUs while we used four 3090s, you might consider increasing the number of patches selected at each resolution, which could improve performance.
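Regarding item 1, below is an illustrative snippet (not part of the HAG-MIL repository) for checking which pixman version the current process would actually load. It assumes a Linux-like environment where libpixman-1 is discoverable by ctypes; if pixman was installed through conda, `conda list pixman` is a simpler alternative.

```python
# Illustrative check of the pixman version available to the process.
# Assumes libpixman-1 can be located by ctypes; not part of the HAG-MIL code.
import ctypes
import ctypes.util

libname = ctypes.util.find_library("pixman-1")
if libname is None:
    print("libpixman-1 was not found on the library search path")
else:
    pixman = ctypes.CDLL(libname)
    # pixman_version_string() is part of pixman's public C API.
    pixman.pixman_version_string.restype = ctypes.c_char_p
    print("pixman version:", pixman.pixman_version_string().decode())
```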
BearCleverProud commented 7 months ago

I would like to know why extract_features_fp.py has: if level == 1: step = patch_size

The code here may look a bit confusing: when the input level equals 1, the script is actually handling the generation of level-0 patches based on the level-1 patches.

At level 0, the step should then be equal to the patch size.
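To make the relation concrete, here is a minimal, self-contained sketch. It is not the repository's code: the function name, the level-0 coordinate convention, and the downsample factor of 2 between adjacent pyramid levels are assumptions made for illustration. It shows why stepping by the patch size at level 0 tiles exactly the region covered by one level-1 patch.

```python
def level0_patches_for_level1_patch(x, y, patch_size=256, downsample=2):
    """Return the level-0 top-left coordinates covered by one level-1 patch.

    (x, y) is the level-1 patch's top-left corner expressed in level-0
    coordinates (assumption). A 256x256 patch at level 1 spans 512x512
    level-0 pixels, so stepping by the patch size (256) at level 0 tiles
    that span with a 2x2 grid of level-0 patches.
    """
    span = patch_size * downsample   # level-0 pixels covered by the level-1 patch
    step = patch_size                # at level 0 the step equals the patch size
    return [(x + dx, y + dy)
            for dy in range(0, span, step)
            for dx in range(0, span, step)]


if __name__ == "__main__":
    # One level-1 patch whose top-left corner is (1024, 2048) in level-0 coordinates.
    print(level0_patches_for_level1_patch(1024, 2048))
    # -> [(1024, 2048), (1280, 2048), (1024, 2304), (1280, 2304)]
```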