Closed wuyu-z closed 2 years ago
Hi @wuyu-z,
Yeah, it seems like that's the case.
The dataset argument is the same one used in the training script for the self-supervised model. Its purpose is to keep track of the dataset used to train the SSL model; the script will place the tile vector representation H5 files into a directory named after the dataset. E.g.:
TCGAFFPE_LUADLUSC_5x_60pc_250K
hdf5_NYUFFPE_LUADLUSC_5x_60pc_he_combined.h5
results/BarlowTwins_3/TCGAFFPE_LUADLUSC_5x_60pc_250K/h224_w224_n3_zdim128/hdf5_NYUFFPE_LUADLUSC_5x_60pc_he_combined.h5
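To make the layout above concrete, here is a minimal sketch of how the output location is composed from the model, dataset, and image arguments. The function name and argument names are my own for illustration; only the resulting path pattern comes from the example above.

```python
import os

# Hypothetical helper (not the repo's actual code): compose the projection
# output path following the pattern
#   results/<model>/<dataset>/h<h>_w<w>_n<n>_zdim<zdim>/<real_hdf5 basename>
def projection_output_path(main_path, model, dataset, h, w, n, zdim, real_hdf5):
    run_dir = os.path.join(main_path, "results", model, dataset,
                           f"h{h}_w{w}_n{n}_zdim{zdim}")
    return os.path.join(run_dir, os.path.basename(real_hdf5))

print(projection_output_path(
    ".", "BarlowTwins_3", "TCGAFFPE_LUADLUSC_5x_60pc_250K",
    224, 224, 3, 128, "hdf5_NYUFFPE_LUADLUSC_5x_60pc_he_combined.h5"))
```

So the dataset name only decides which subdirectory the projections land in; the tiles themselves still come from real_hdf5.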
In this case, the real_hdf5 file refers to an external cohort, while the self-supervised model was trained on a subsample of 250K tiles from TCGA WSIs.
It also uses this file to check the format of the images in the H5 file (height, width, # of channels) and instantiate the model.
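The shape check described here can be sketched with h5py. This is a standalone illustration, not the repo's code: the dataset key `"img"` is an assumption, and the example builds a tiny dummy H5 file so it runs end to end.

```python
import h5py
import numpy as np

# Hypothetical sketch: read the (height, width, channels) of tiles in an H5
# file, the information the script needs to instantiate the model.
# The key "img" is an assumption; check your file's keys with list(f.keys()).
def image_shape(h5_path, key="img"):
    with h5py.File(h5_path, "r") as f:
        # dataset shape is (num_tiles, height, width, channels)
        return f[key].shape[1:]

# Build a tiny example file so the sketch is runnable end to end.
with h5py.File("example.h5", "w") as f:
    f.create_dataset("img", data=np.zeros((4, 224, 224, 3), dtype=np.uint8))

print(image_shape("example.h5"))  # (224, 224, 3)
```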
If you just want to use the pre-trained model to get tile vector representations, there's a workaround for this. Take the real_hdf5 file and create a dataset containing just that H5 file as the training split; you should then be able to provide it as an argument and run the projections. E.g.:
datasets/name_dataset_1/he/patches_h224_w224/hdf5_name_dataset_1_he_train.h5
name_dataset_1
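The workaround above can be sketched as follows. This is an illustrative helper under stated assumptions, not the repo's code: the directory layout and the `*_train.h5` naming follow the example paths above, and `name_dataset_1` is a placeholder you would replace with your own dataset name.

```python
import os
import shutil

# Hypothetical sketch of the workaround: place an external real_hdf5 file
# into a dataset directory so the projection script can load it as the
# training split. Layout follows the example:
#   datasets/<name>/he/patches_h<h>_w<w>/hdf5_<name>_he_train.h5
def make_dataset_from_h5(real_hdf5, datasets_root, dataset_name, h=224, w=224):
    patch_dir = os.path.join(datasets_root, dataset_name, "he",
                             f"patches_h{h}_w{w}")
    os.makedirs(patch_dir, exist_ok=True)
    train_h5 = os.path.join(patch_dir, f"hdf5_{dataset_name}_he_train.h5")
    # Copy (or symlink, to save disk space) the external H5 into place.
    shutil.copyfile(real_hdf5, train_h5)
    return train_h5
```

You would then pass `dataset_name` as the dataset argument when running the projections.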
Otherwise, you can find the dataset with the LUAD & LUSC 250K tiles here. You can set up the directory with this and run the tile vector representations with it.
I hope this helps. Thanks, Adal
Hello @AdalbertoCq, My intention is to find tile vector representations for a given single svs file. If I understand correctly, the real_hdf5 is the H5 file we get as output from the DeepPath preprocessing, and the checkpoint contains the weights we get from step 1 (I used your provided weights here). When I try to run the script run_representationspathology_projection.py, it gives the following error:
/models/selfsupervised/BarlowTwins.py", line 85, in __init__
self.num_samples = data.training.images.shape[0]
AttributeError: 'NoneType' object has no attribute 'images'
If I traced through the code correctly, this error essentially happens because the dataset argument is set incorrectly.
Can you explain more about the dataset argument of run_representationspathology_projection.py? Which dataset is it referring to? And more importantly, why do I need another dataset when vectorizing tiled images, given that I already have the image H5 file (real_hdf5) and a pre-trained model (the checkpoint .ckpt file)?
Thank you in advance.