BearCleverProud / HAG-MIL

Repository for Hierarchical Attention-Guided Multiple Instance Learning

Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification (IJCAI-2023)

Conghao Xiong, Hao Chen, Joseph J.Y. Sung and Irwin King

ArXiv | IJCAI Link

Abstract: Multiple Instance Learning (MIL) and transformers are increasingly popular in histopathology Whole Slide Image (WSI) classification. However, unlike human pathologists who selectively observe specific regions of histopathology tissues under different magnifications, most methods do not incorporate multiple resolutions of the WSIs, hierarchically and attentively, thereby leading to a loss of focus on the WSIs and information from other resolutions. To resolve this issue, we propose a Hierarchical Attention-Guided Multiple Instance Learning framework to fully exploit the WSIs. This framework can dynamically and attentively discover the discriminative regions across multiple resolutions of the WSIs. Within this framework, an Integrated Attention Transformer is proposed to further enhance the performance of the transformer and obtain a more holistic WSI (bag) representation. This transformer consists of multiple Integrated Attention Modules, which is the combination of a transformer layer and an aggregation module that produces a bag representation based on every instance representation in that bag. The experimental results show that our method achieved state-of-the-art performances on multiple datasets, including Camelyon16, TCGA-RCC, TCGA-NSCLC, and an in-house IMGC dataset.




WSI Segmentation and Patching

This step relies on the code from CLAM, and the instructions below are copied from their repository. Many thanks to the CLAM authors for their great work.

The first step focuses on segmenting the tissue and excluding any holes. The segmentation of specific slides can be adjusted by tuning the individual parameters (e.g. dilated vessels appearing as holes may be important for certain sarcomas). The following example assumes that digitized whole slide image data in well-known standard formats (.svs, .ndpi, .tiff, etc.) are stored under a folder named DATA_DIRECTORY:

    DATA_DIRECTORY/
        ├── slide_1.svs
        ├── slide_2.svs
        └── ...

Fully Automated Run

First, clone the CLAM code using the following command. We recommend not cloning it into this project's folder.

git clone

Then run the patch creation scripts from CLAM as follows:

python --source DATA_DIRECTORY --save_dir RESULTS_DIRECTORY --patch_size 256 --seg --patch --stitch --patch_level 2

The above command will segment every slide in DATA_DIRECTORY using default parameters, extract all patches within the segmented tissue regions, create a stitched reconstruction for each slide using its extracted patches (optional) and generate the following folder structure at the specified RESULTS_DIRECTORY:

    RESULTS_DIRECTORY/
        ├── masks
        │       ├── slide_1.png
        │       ├── slide_2.png
        │       └── ...
        ├── patches
        │       ├── slide_1.h5
        │       ├── slide_2.h5
        │       └── ...
        ├── stitches
        │       ├── slide_1.png
        │       ├── slide_2.png
        │       └── ...
        └── process_list_autogen.csv

Then, given the level 2 coordinates, we generate the patches that contain exactly the same content at higher resolutions (level 1 and level 0) using the provided file in this repo:


Note that the default setting is for the Camelyon16 dataset, which starts at level 2 with a patch size of 256 x 256. If you need other patch sizes or a different starting level, you may need to revise the
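To make the idea of "patches that contain exactly the same content at higher resolutions" concrete, here is a minimal sketch, not the script shipped in this repository. It assumes that patch coordinates are top-left corners expressed in level-0 pixels and that each pyramid level halves the resolution of the one below it. The function name `corresponding_coords` and all of its parameters are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def corresponding_coords(
    coord: Coord,
    start_level: int = 2,
    target_level: int = 1,
    patch_size: int = 256,
    downsample_per_level: int = 2,
) -> List[Coord]:
    """Top-left corners (in level-0 pixels) of the target-level patches
    that together tile one start-level patch.

    Assumption: `coord` is in the level-0 reference frame and every level
    is `downsample_per_level` times coarser than the previous one.
    """
    if not 0 <= target_level <= start_level:
        raise ValueError("target_level must lie between 0 and start_level")
    # Number of target-level patches along each side of the parent patch.
    per_side = downsample_per_level ** (start_level - target_level)
    # Footprint of one target-level patch, measured in level-0 pixels.
    step = patch_size * downsample_per_level ** target_level
    x0, y0 = coord
    return [
        (x0 + col * step, y0 + row * step)
        for row in range(per_side)
        for col in range(per_side)
    ]


if __name__ == "__main__":
    parent = (4096, 8192)  # hypothetical level-2 patch location
    print(corresponding_coords(parent, target_level=1))
    print(len(corresponding_coords(parent, target_level=0)))
```

Under these assumptions, one 256 x 256 patch at level 2 maps to 4 patches at level 1 and 16 patches at level 0, which is why the amount of data grows quickly at the finer resolutions.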

Feature Extraction from the Cropped Patches

This part is also based on the code from CLAM, and the instructions in this section are copied from their repository.

Feature Extraction (GPU Example)

CUDA_VISIBLE_DEVICES=0,1 python --data_h5_dir DIR_TO_COORDS --data_slide_dir DATA_DIRECTORY --csv_path CSV_FILE_NAME --feat_dir FEATURES_DIRECTORY --batch_size 512 --slide_ext .svs

The above command expects the coordinates .h5 files to be stored under DIR_TO_COORDS and will use 2 GPUs (0 and 1) and a batch size of 512 to extract 1024-dim features from each tissue patch for each slide in the given csv list and produce the following folder structure:

    └── pt_files
            └── ...

where each .h5 file contains an array of extracted features along with their patch coordinates (note that, for faster training, a .pt file containing just the patch features is also created for each slide). The csv file is expected to contain a list of slide filenames (without the filename extensions) to process. The easiest option is to take the csv file auto-generated by the previous segmentation/patching step and delete the filename extensions.
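Deleting the filename extensions from the auto-generated csv can be scripted. The following is a minimal sketch using only the Python standard library. It assumes the slide filenames live in a column named `slide_id`; check the header of your own csv file and adjust the `column` argument if it differs. The function name is hypothetical.

```python
import csv
import os


def strip_slide_extensions(src_csv: str, dst_csv: str, column: str = "slide_id") -> int:
    """Copy src_csv to dst_csv with file extensions removed from `column`.

    Returns the number of data rows written.
    """
    with open(src_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        if column not in fieldnames:
            raise KeyError(f"column {column!r} not found in {src_csv}")
        rows = list(reader)

    for row in rows:
        # splitext drops only the last suffix: "slide_1.svs" -> "slide_1".
        row[column] = os.path.splitext(row[column])[0]

    with open(dst_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, `strip_slide_extensions("RESULTS_DIRECTORY/process_list_autogen.csv", "slides_no_ext.csv")` writes a cleaned copy (the output name is made up here) and leaves the original file untouched.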

Note that in our work, we use multiple resolutions of the WSIs (by default, starting from level 2 for Camelyon16). The naming rule is that the names of the feature folders should end with "_level_2", "_level_1_corresponding" and "_level_0_corresponding", respectively.
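As a small illustration of this naming rule, the sketch below builds the three folder names from a common prefix. The prefix and the helper name are hypothetical, and extending the same pattern to a starting level other than 2 is our assumption rather than something the repository states.

```python
from typing import List


def feature_dir_names(prefix: str, start_level: int = 2) -> List[str]:
    """Folder names for the starting level and every finer level below it."""
    names = [f"{prefix}_level_{start_level}"]
    # Finer levels carry the "_corresponding" suffix.
    for level in range(start_level - 1, -1, -1):
        names.append(f"{prefix}_level_{level}_corresponding")
    return names


if __name__ == "__main__":
    for name in feature_dir_names("camelyon16_features"):
        print(name)
```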


The data used for training, validation and testing are expected to be organized as follows:

        └── pt_files
                └── ...
        └── pt_files
                └── ...
        └── pt_files
                └── ...
    └── ...

Namely, each dataset is expected to be a subfolder (e.g. DATASET_1_DATA_DIR) under DATA_ROOT_DIR, and the features extracted for each slide in the dataset are stored as a .pt file under the pt_files folder of this subfolder. Datasets are also expected to be prepared in a csv format containing at least 3 columns: case_id, slide_id, and one or more label columns for the slide-level labels. Each case_id is a unique identifier for a patient, while each slide_id is a unique identifier for a slide and corresponds to the name of an extracted feature .pt file. This distinction is necessary because one patient often has multiple slides, which might also have different labels. When train/val/test splits are created, we also make sure that slides from the same patient do not go to different splits. The slide ids should be consistent with those used during the feature extraction step.
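The requirement that slides from the same patient stay in the same split can be sketched as follows. This is an illustrative example only, not the splitting code used by CLAM or by this repository: it groups slides by `case_id`, shuffles the cases, and cuts the case list by the given fractions. It does not stratify by label, and the function name and default fractions are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def patient_level_split(
    rows: Sequence[Dict[str, str]],
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 1,
) -> Dict[str, List[str]]:
    """Assign slide_ids to train/val/test so that all slides of one case_id
    end up in the same split."""
    slides_by_case = defaultdict(list)
    for row in rows:
        slides_by_case[row["case_id"]].append(row["slide_id"])

    # Shuffle patients, not slides, so a patient is never divided.
    cases = sorted(slides_by_case)
    random.Random(seed).shuffle(cases)

    n_train = int(round(fractions[0] * len(cases)))
    n_val = int(round(fractions[1] * len(cases)))
    case_groups = {
        "train": cases[:n_train],
        "val": cases[n_train:n_train + n_val],
        "test": cases[n_train + n_val:],
    }
    return {
        name: [slide for case in group for slide in slides_by_case[case]]
        for name, group in case_groups.items()
    }
```

The `rows` argument can come directly from `csv.DictReader` applied to the dataset csv described above.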

Dataset objects used for actual training/validation/testing can be constructed using the Generic_MIL_Dataset Class (defined in datasets/

For training, look under datasets/

dataset = Generic_MIL_Dataset(csv_path = config_dict['data_arguments']['ground_truth_csv'],
                                data_dir = config_dict['data_arguments']['feature_dir'],
                                shuffle = config_dict['data_arguments']['shuffle_data'], 
                                seed = config_dict['hyperparams_arguments']['seed'], 
                                print_info = config_dict['data_arguments']['print_info'],
                                label_dict = config_dict['data_arguments']['label_dict'],
                                patient_strat = config_dict['data_arguments']['patient_strat'],
                                ignore = [])
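For orientation, the call above reads its arguments from a nested `config_dict`. A minimal sketch of a dictionary with the keys that the call accesses could look like this. Every value shown is a hypothetical placeholder, and the repository's actual config file may contain further entries.

```python
# Only the keys are taken from the call above; all values are placeholders.
config_dict = {
    "data_arguments": {
        "ground_truth_csv": "path/to/labels.csv",  # hypothetical path
        "feature_dir": "path/to/features",         # hypothetical path
        "shuffle_data": False,
        "print_info": True,
        "label_dict": {"normal": 0, "tumor": 1},   # hypothetical label names
        "patient_strat": False,
    },
    "hyperparams_arguments": {
        "seed": 1,
    },
}
```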

The user would need to pass:

Training Splits

The split of the Camelyon16 dataset is already given in the csvs/ folder, and our experiments are based on this split. We train the model on the training set, validate it on the validation set, and test it only once on the test set. For other datasets, or if you would like to create your own split, please follow the instructions from CLAM.

GPU Training on the Camelyon16 Dataset Using Our Default Settings

Run the following script:


However, this training code requires at least 24 GB of GPU memory, and by default it uses four GPUs. If your machine has less GPU memory, you may need to reduce the number of patches reserved for the next resolutions (t_patch and v_patch in the config file).
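To see why the number of reserved patches drives memory use, consider the following sketch of keeping only a fixed number of patches before moving to the next resolution. It is our reading of "attention-guided" selection, not code from this repository: the function name is hypothetical, and ranking by highest attention score is an assumption.

```python
from typing import List, Sequence


def top_k_patch_indices(attention: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring patches, highest score first."""
    k = max(0, min(k, len(attention)))
    order = sorted(range(len(attention)), key=lambda i: attention[i], reverse=True)
    return order[:k]


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.2, 0.9]  # made-up attention scores for four patches
    print(top_k_patch_indices(scores, k=2))
```

If each level halves the resolution of the next finer one, every patch that is kept expands into 4 patches at the next level, so lowering the number of reserved patches shrinks the bags processed at the finer resolutions roughly in proportion.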

During training, you can go into the results folder of the particular experiment and run:

tensorboard --logdir RESULT_DIR

This should open a browser window and show the logged training/validation statistics in real time.

Testing and Evaluation Script

You can also use the evaluation script to test the performance of trained models. A trained model for Camelyon16 can be accessed here. The model achieved 0.9627 AUC, 0.9302 accuracy and 0.9250 F1 score on the official Camelyon16 test set.

Simply download it from the URL and run the following command:

python3 --ckpt-path CKPT_PATH



The work described here was partially supported by grants from the National Key Research and Development Program of China (No. 2018AAA0100204) and from the Research Grants Council of the Hong Kong Special Administrative Region, China (CUHK 14222922, RGC GRF, No. 2151185). The results shown in this paper are based upon data generated by the TCGA Research Network.


If you find our work useful in your research, or if you use parts of this code, please consider citing our paper:

  title     = {Diagnose Like a Pathologist: Transformer-Enabled Hierarchical Attention-Guided Multiple Instance Learning for Whole Slide Image Classification},
  author    = {Xiong, Conghao and Chen, Hao and Sung, Joseph J.Y. and King, Irwin},
  booktitle = {Proceedings of the Thirty-Second International Joint Conference on
               Artificial Intelligence, {IJCAI-23}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {Edith Elkind},
  pages     = {1587--1595},
  year      = {2023},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2023/176},
  url       = {},