Issue pre-processing TCGA slides

dddavid4real / HistGen

[MICCAI 2024] Official Repo of "HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction"

Apache License 2.0

Issue pre-processing TCGA slides #7

Closed by BenPashley 3 months ago

BenPashley commented 4 months ago

Hi,

I am following your instructions and using your implementation of CLAM to extract the patches and generate the h5 files. When I get to the larger slides, I get the following error:

Maximum supported image dimension is 65500 pixels
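For context, 65500 matches the largest width or height that libjpeg will encode, so the error most likely appears when a very large image (for example a mask or stitched overview of a big slide) is written as a JPEG. That reading is an assumption, not something confirmed in this thread. A minimal sketch of a pre-save check, where `downscale_to_fit` and the example dimensions are hypothetical and not part of CLAM or HistGen:

```python
# Per-side limit taken from the error message; assumed to be libjpeg's cap.
JPEG_MAX_DIM = 65500


def downscale_to_fit(width, height, max_dim=JPEG_MAX_DIM):
    """Return (width, height) scaled down so that neither side exceeds max_dim.

    Dimensions that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # int() truncates, so the result stays at or below max_dim.
    return max(1, int(width * scale)), max(1, int(height * scale))


# Hypothetical slide overview that is too wide to save as a JPEG.
new_w, new_h = downscale_to_fit(120_000, 90_000)
print(new_w, new_h)
```

Resizing the image to the returned size before saving, or saving in a format without this limit such as PNG, would avoid the encoder error under this assumption.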

How did you manage to overcome this limitation using your CLAM implementation? As per the other issue, I am unable to download all of your pt files efficiently because of SharePoint's download limits (although I am keen to reproduce your process end to end anyway).

Any help and guidance would be much appreciated!

Ben

dddavid4real commented 4 months ago

Hi,

In the .sh script for patch segmenting, you could add this line of code at the start to mitigate this error:

export OPENCV_IO_MAX_IMAGE_PIXELS=10995116277760

Let me know if this works out.
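If the patching code is launched from Python rather than from the .sh script, the same variable can be set in-process. This is a sketch only: `raise_opencv_pixel_limit` is a hypothetical helper, and it assumes OpenCV reads the variable when `cv2` is first loaded, so it has to run before the import.

```python
import os


def raise_opencv_pixel_limit(limit=10 * 2**40):
    """Set OPENCV_IO_MAX_IMAGE_PIXELS for this process and return its value.

    The default equals 10995116277760, the value used in the export line.
    """
    value = str(limit)
    os.environ["OPENCV_IO_MAX_IMAGE_PIXELS"] = value
    return value


# Call this before the first `import cv2` anywhere in the process.
raise_opencv_pixel_limit()
# import cv2  # import OpenCV only after the variable is set
```

Note that this variable only governs OpenCV's own image loading; it would not change limits enforced by other libraries.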

As for the download issue, we currently don't have a better solution either. One option is to extract the pt files yourself from the original WSIs. If you are using a personal PC, try synchronizing the files with OneDrive rather than downloading them directly.

BenPashley commented 4 months ago

Many thanks. I managed to resolve the issue by replacing your version of wsi_core/WholeSlideImage.py with the one from the CLAM repo. That said, you mention that your version is accelerated. Is there anything else I should be aware of if I don't use your version?

dddavid4real commented 4 months ago

Hi, the main acceleration is implemented in the extract_features_fp.py file (used for feature extraction). If any error occurs during feature extraction, you could restore our original wsi_core/WholeSlideImage.py to try to mitigate it.

Feel free to report any unexpected problems!