clemsgrs / hipt

Re-implementation of HIPT

Question Regarding Hierarchical Pretraining #9

Closed ptoyip closed 1 year ago

ptoyip commented 1 year ago

Thanks for the code!

I have a question about the hierarchical pretraining. Based on my understanding of the pipeline, I should:

  1. run CLAM to get 256×256 images and use them to train ViT256-16 using DINO
  2. run CLAM to get 4096×4096 images and use the pretrained ViT256-16 to turn each of them into a [256, 384] tensor (each 4096×4096 image is cropped into 256×256 patches, each patch is passed through ViT256-16, and the outputs are stacked to get slide_1_1.pt in the HIPT repo; see the sketch after this list)
  3. train ViT4096-256 using the weights from step 2
  4. train HIPT using both pretrained ViTs (256 and 4K)
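For concreteness, here is a minimal sketch of step 2. It assumes a `vit_256` module that maps a batch of [3, 256, 256] patches to 384-dim embeddings; the function name and tensor layout are illustrative, not the repo's actual API:

```python
import torch

def extract_region_features(region: torch.Tensor, vit_256: torch.nn.Module) -> torch.Tensor:
    """Turn one [3, 4096, 4096] region into a [256, 384] local feature tensor."""
    # unfold the region into a 16 x 16 grid of [3, 256, 256] patches
    patches = region.unfold(1, 256, 256).unfold(2, 256, 256)  # [3, 16, 16, 256, 256]
    patches = patches.permute(1, 2, 0, 3, 4).reshape(-1, 3, 256, 256)  # [256, 3, 256, 256]
    with torch.no_grad():
        features = vit_256(patches)  # [256, 384]; chunk the batch if memory is tight
    return features
```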

I saw the README says we should get the step 2 weights as follows:

.../path/to/region_4096_pretraining/: directory of pre-extracted region-level local features for each [4096 × 4096] region across all WSIs using python3 pre-train/extract_features.py.

But when I looked through extract_features.py, I found that it requires a pretrain_vit_region path (which, in my understanding, is not necessary, and should in fact be an outcome of the hierarchical pretraining). Do I misunderstand the training pipeline, or am I just looking at the wrong code?

Thanks in advance!

clemsgrs commented 1 year ago

hi, no problem!

Indeed, this is the expected hierarchical pretraining pipeline (though I'd use the word "features" instead of "weights" when describing the [256, 384] tensors).

One note is that you shouldn't stack the [256, 384]-dim features across a slide when pretraining the intermediate Transformer block (ViT_4096-256), given that this block operates at the region level (not at the slide level).
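To make that concrete, here's a minimal sketch of what "one sample per region" looks like on disk (the file naming and directory layout are hypothetical, not the repo's actual convention):

```python
import torch

def save_local_features(region_features, slide_id, out_dir="region_4096_pretraining"):
    """region_features: list of [256, 384] tensors, one per region of a slide."""
    for j, features in enumerate(region_features):
        # one file per region: each region is an independent pretraining sample,
        # so features from different regions are never concatenated or stacked
        torch.save(features, f"{out_dir}/{slide_id}_region_{j}.pt")
```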

You are right that pretrain_vit_region is not necessary when extracting [256, 384]-dim features (which I like to call "local" features). If you look more closely at the extract_features.py code, pretrain_vit_region is only used when extracting "global" features:

https://github.com/clemsgrs/hipt/blob/6b46b569fc95723abf5e957c804ce5ce7174d7b9/extract_features.py#L53-L60

Hence, you don't need to provide it when extracting "local" features:

https://github.com/clemsgrs/hipt/blob/6b46b569fc95723abf5e957c804ce5ce7174d7b9/extract_features.py#L61-L66
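To paraphrase the switch (a simplified sketch, not the repo's exact code; names follow the config keys):

```python
import torch

def extract(patches, level, vit_patch, vit_region=None):
    """patches: [256, 3, 256, 256] crops of one region."""
    with torch.no_grad():
        local = vit_patch(patches)  # [256, 384] local features
        if level == "local":
            return local  # pretrain_vit_region is never needed on this path
        # only the 'global' path needs the region-level ViT
        assert vit_region is not None, "'global' level requires pretrain_vit_region"
        return vit_region(local.unsqueeze(0)).squeeze(0)  # region-level global feature
```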

Let me know if this answers your question :)

ptoyip commented 1 year ago

Thanks for the reply! So if I understand correctly, to train ViT4096_256 I need to go to config/feature_extraction.yaml, change level to 'global', and set pretrain_vit_patch to my ViT256_16 path?

clemsgrs commented 1 year ago

if you want to pretrain ViT-4096_256, you'll need to:

  1. extract local features for each (4096, 4096) region in your dataset using the extract_features.py script and the following config file:

     region_dir: 'path/to/extracted/regions/'

     output_dir: 'output'
     experiment_name: '4096_local'
     resume: False

     slide_list:

     region_size: 4096
     patch_size: 256
     mini_patch_size: 16

     format: 'jpg'
     level: 'local'
     save_region_features: True

     pretrain_vit_patch: 'path/to/pretrained/vit_256_16.pth'
     pretrain_vit_region: ''

     wandb:
       enable: False

  2. run pretrain/dino_region.py, making sure the data_dir argument in your pretraining config file points towards the output of the previous step, which could resemble something like output/4096_local/global/<date_or_wandb_id>/ (a quick sanity-check sketch follows)
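Before launching pretrain/dino_region.py, it can save time to verify the extracted files look right. A minimal sketch, assuming the .pt files sit directly under the output directory (replace the placeholder path with your actual run folder):

```python
from pathlib import Path

import torch

# placeholder path: substitute your actual run directory
data_dir = Path("output/4096_local/global/<date_or_wandb_id>")

for f in sorted(data_dir.glob("*.pt"))[:5]:
    features = torch.load(f, map_location="cpu")
    # each file should hold one region's [256, 384] local feature tensor
    assert features.shape == (256, 384), f"{f.name}: unexpected shape {tuple(features.shape)}"
    print(f.name, tuple(features.shape))
```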
ptoyip commented 1 year ago

Thanks for the help! It works.