CielAl closed this 1 year ago.
Hi @CielAl, here are some thoughts:
- TileExtractionModule saves all tissue tiles as images under the tiles folder, which is overkill: the resulting tiles folder can be larger than the slide file itself. For example, I ran 2 slides as tests, HTT-TILS-001-26B.svs (316.4 MB) and TCGA-EJ-5518-01Z-00-DX1.svs (480.9 MB). The tiles folder for HTT-TILS-001-26B.svs was 841 MB and the one for TCGA-EJ-5518-01Z-00-DX1.svs was 4.81 GB. Saving every tile from the slide wastes time and space, since the tiles_coords_nested.json file already describes the position of each tile. I suggest saving only the tile-description JSON files; users can read tiles_coords_nested.json together with the slide file to retrieve each tissue tile.
- The tissue detection method still needs improvement: I can clearly see some misdetected tissue tiles.
Appreciate the feedback!!!
Indeed! That's the primary reason tile export is optional and controlled by the "save_image" bool flag. Perhaps it is still feasible for smaller slides (with fewer tissue regions)? Nonetheless, I can remove the explicit export of image tiles from the module since, as you pointed out, as long as users have the bboxes they can always retrieve the tiles with their own pipelines and save them in their preferred format (e.g., HDF5).
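A minimal sketch of what such a user-side pipeline could look like, assuming each box in tiles_coords_nested.json is stored as (x, y, width, height); the real ordering and coordinate space (mask vs. level-0 pixels) should be checked against the module. `load_nested_bboxes` and `iter_tiles` are hypothetical helpers, and the tile reader is passed in as a callable so the sketch does not depend on any particular slide library.

```python
import json
from typing import Callable, Iterator, List, Tuple, TypeVar

# Assumed layout: (x, y, width, height). Verify the order and coordinate
# space (mask vs. level-0 pixels) against TileExtractionModule before use.
BBox = Tuple[int, int, int, int]
TileT = TypeVar("TileT")


def load_nested_bboxes(json_text: str) -> List[List[BBox]]:
    """Parse tiles_coords_nested.json content: one inner list per tissue region."""
    raw = json.loads(json_text)
    # JSON has no tuple type, so boxes come back as lists; convert them.
    return [[(int(b[0]), int(b[1]), int(b[2]), int(b[3])) for b in region]
            for region in raw]


def iter_tiles(nested: List[List[BBox]],
               read_tile: Callable[[int, int, int, int], TileT]
               ) -> Iterator[Tuple[int, BBox, TileT]]:
    """Yield (region_index, box, tile) lazily; nothing is written to disk."""
    for region_idx, region in enumerate(nested):
        for box in region:
            x, y, w, h = box
            yield region_idx, box, read_tile(x, y, w, h)
```

With OpenSlide, the reader could be `lambda x, y, w, h: slide.read_region((x, y), 0, (w, h))`, since `OpenSlide.read_region` takes a level-0 location, a level, and a size; boxes stored in mask or thumbnail coordinates would first need scaling to level 0. Because tiles are produced on demand, they can be streamed into HDF5 or another format without ever being saved as individual image files.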
This is more concerning. So far, TileExtractionModule uses the 'img_mask_use' mask stored in the base image for processing. Could you also post the corresponding binary mask here so we can tell whether the mask is wrong or the extraction code missed certain regions?
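One way to make that check concrete, sketched under assumptions: measure how much of each extracted tile the binary mask marks as tissue. A misdetected tile with high coverage points to the mask; one with low coverage points to the extraction code. The (x, y, width, height) box layout in mask pixels, the helper names, and the `min_fraction` cutoff are all hypothetical.

```python
from typing import List, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in mask pixels


def tissue_fraction(mask: Sequence[Sequence[int]], box: BBox) -> float:
    """Share of the box's pixels that the binary mask marks as tissue (non-zero).

    Pixels that fall outside the mask are counted as background.
    """
    x, y, w, h = box
    if w <= 0 or h <= 0:
        return 0.0
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    hits = 0
    for r in range(max(y, 0), min(y + h, rows)):
        for c in range(max(x, 0), min(x + w, cols)):
            if mask[r][c]:
                hits += 1
    return hits / (w * h)


def flag_suspect_tiles(mask: Sequence[Sequence[int]],
                       nested: List[List[BBox]],
                       min_fraction: float = 0.5) -> List[Tuple[int, BBox, float]]:
    """List (region_index, box, fraction) for tiles the mask barely covers."""
    suspects = []
    for region_idx, region in enumerate(nested):
        for box in region:
            frac = tissue_fraction(mask, box)
            if frac < min_fraction:
                suspects.append((region_idx, box, frac))
    return suspects
```

If nothing is flagged yet some tiles still look wrong, the mask itself is labelling those areas as tissue; the boxes and mask must be in the same coordinate space for the comparison to mean anything.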
Should we set the default of the "save_image" bool flag to False?
Thanks for the clarification. I checked the corresponding binary mask, and it is correct.
Thanks for the feedback. I've changed the default value of save_image to False.
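A minimal sketch of reading the flag with a False default, assuming module parameters arrive as a string-to-string mapping parsed from an INI-style config with Python's `configparser`. The section name and the `get_save_image` helper are placeholders, not the module's actual code.

```python
import configparser
from typing import Mapping

# Placeholder section name; save_image is deliberately left out to show the default.
CONFIG_TEXT = """
[TileExtractionModule.extract]
# save_image: True
"""

# Same spellings configparser.ConfigParser.BOOLEAN_STATES accepts.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def get_save_image(params: Mapping[str, str]) -> bool:
    """Return the save_image flag, defaulting to False when the key is missing."""
    raw = str(params.get("save_image", "False")).strip().lower()
    if raw in _TRUTHY:
        return True
    if raw in _FALSY:
        return False
    raise ValueError(f"save_image must be boolean-like, got {raw!r}")


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
params = dict(parser["TileExtractionModule.extract"])
```

Mirroring `configparser`'s own boolean spellings keeps the flag consistent with `ConfigParser.getboolean`, and raising on anything else keeps a typo such as `save_image: Flase` from silently choosing a value.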
Adds TileExtractionModule, which locates the tile bounding boxes of each connected tissue region given the tile size and stride. Parameters:
Writes:
- tiles_coords_nested.json: List[List[Tuple[int, int, int, int]]], a nested list holding the bounding boxes of each individual region.
- An image thumbnail overlaid with the bounding boxes.
- Optional: individual tile images in [base_image_outdir]/tiles/ if "save_image" is set.
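The description above leaves the tiling procedure implicit. The following is one plausible way to produce the nested layout, not necessarily what TileExtractionModule does: label 4-connected tissue regions in a binary mask, step a `tile_size` window by `stride` across each region's bounding box, and keep windows whose pixels mostly belong to that region. The (x, y, width, height) box order, the `min_coverage` threshold, and all function names are assumptions.

```python
import json
from collections import deque
from typing import Dict, List, Sequence, Tuple

BBox = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height)


def label_regions(mask: Sequence[Sequence[int]]) -> List[List[int]]:
    """Label 4-connected tissue regions with BFS: 0 = background, 1..N = regions."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or labels[r0][c0]:
                continue  # background, or already part of a labelled region
            next_label += 1
            labels[r0][c0] = next_label
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = next_label
                        queue.append((nr, nc))
    return labels


def region_tiles(mask: Sequence[Sequence[int]], tile_size: int, stride: int,
                 min_coverage: float = 0.5) -> List[List[BBox]]:
    """Return one list of tile boxes per connected region (hypothetical helper)."""
    labels = label_regions(mask)
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    # Inclusive extent (min_row, min_col, max_row, max_col) of every region.
    extents: Dict[int, Tuple[int, int, int, int]] = {}
    for r in range(rows):
        for c in range(cols):
            lab = labels[r][c]
            if lab:
                r0, c0, r1, c1 = extents.get(lab, (r, c, r, c))
                extents[lab] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    area = tile_size * tile_size
    nested: List[List[BBox]] = []
    for lab in sorted(extents):
        r0, c0, r1, c1 = extents[lab]
        boxes: List[BBox] = []
        for y in range(r0, r1 + 1, stride):
            for x in range(c0, c1 + 1, stride):
                if y + tile_size > rows or x + tile_size > cols:
                    continue  # window would run off the mask
                hits = sum(1 for r in range(y, y + tile_size)
                           for c in range(x, x + tile_size) if labels[r][c] == lab)
                if hits / area >= min_coverage:
                    boxes.append((x, y, tile_size, tile_size))
        nested.append(boxes)
    return nested


def to_json(nested: List[List[BBox]]) -> str:
    """Serialize like tiles_coords_nested.json; tuples become JSON arrays."""
    return json.dumps(nested)
```

Windows that would extend past the mask border are skipped, so regions touching the right or bottom edge may lose their last row or column of tiles. Pure-Python loops are only practical for thumbnail-sized masks; at full resolution, `scipy.ndimage.label` or `skimage.measure.label` would be the usual replacement for `label_regions`.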
Pytest: adds the corresponding test function to test_pipeline_cli.py:
Misc fix for an issue encountered outside of TileExtractionModule while running pytest: @choosehappy
Therefore, while a better solution is probably to define a protocol for all Modules, a simple fix, for now, is to sanitize and barricade invalid inputs in the BaseImage class.
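A minimal sketch of both ideas, with loud caveats: the specific invalid inputs are not described here, so rejecting `None` values and non-string keys is purely illustrative. `QCModule`, `GuardedImage`, and `run_modules` are hypothetical names, and the protocol simply assumes a module is a callable taking the image state and its parameter mapping.

```python
from collections import UserDict
from typing import Any, Iterable, Mapping, Protocol


class QCModule(Protocol):
    """Hypothetical protocol: a module is any callable taking (state, params)."""

    def __call__(self, s: "GuardedImage", params: Mapping[str, str]) -> None: ...


class GuardedImage(UserDict):
    """Hypothetical stand-in for BaseImage that rejects invalid entries on write.

    UserDict routes __init__ and update() through __setitem__, so every write
    path hits the same checks (a plain dict subclass would let update() bypass them).
    """

    def __setitem__(self, key: Any, value: Any) -> None:
        # Illustrative rules only; the actual invalid inputs are not listed above.
        if not isinstance(key, str) or not key:
            raise KeyError(f"keys must be non-empty strings, got {key!r}")
        if value is None:
            raise ValueError(f"refusing to store None under {key!r}")
        super().__setitem__(key, value)


def run_modules(modules: Iterable[QCModule], s: GuardedImage,
                params: Mapping[str, str]) -> None:
    """Call each module in order; GuardedImage validates every value they write."""
    for module in modules:
        module(s, params)
```

Putting the checks in the shared image object barricades every module at once, while the protocol is the longer-term place to state what each module may read and write.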