KatherLab / preprocessing-ng


Missing features/tiles problem #2

Open Ultimate-Storm opened 1 year ago

Ultimate-Storm commented 1 year ago

Problem: Omar ran the same targets through his new e2e pipeline and got different results than with the old pipeline. He found that the e2e pipeline yields more features after tessellation and normalization.

Approach: Run the old pipeline on a small dataset and reconstruct the normalized tiles back into a WSI. Comparing that reconstruction with the output of Omar's e2e pipeline should show why tiles got rejected.

Code snippet for reconstructing tiles back into a WSI: https://github.com/KatherLab/end2end-WSI-preprocessing/blob/d160a882d905e496b98a52f9cb75595b7c23040f/stainNorm_Macenko.py#L142
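For the comparison, it may help to first map tile names back to slide positions. Below is a minimal sketch, not the code from the linked file: it assumes tile names follow the `<slide>_(<x>,<y>)` pattern seen in this issue, that the first number is the horizontal offset, and that tiles sit on a regular grid. `parse_tile_name`, `occupancy_grid` and the `stride` value are hypothetical names and parameters chosen for illustration. A real reconstruction would paste pixel data at these offsets instead of marking cells.

```python
import re
from typing import List, Tuple

# Assumed naming scheme: "<slide>_(<x>,<y>)", e.g. "1031866_(1031,38155)".
TILE_NAME = re.compile(r"^(?P<slide>.+)_\((?P<x>\d+),(?P<y>\d+)\)$")


def parse_tile_name(name: str) -> Tuple[str, int, int]:
    """Split a tile name into (slide id, x, y)."""
    m = TILE_NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected tile name: {name!r}")
    return m["slide"], int(m["x"]), int(m["y"])


def occupancy_grid(names: List[str], stride: int) -> List[str]:
    """Mark which grid cells contain a tile ('#') and which do not ('.').

    `stride` is the assumed distance between neighbouring tile origins,
    in the same units as the coordinates in the tile names.
    """
    coords = [parse_tile_name(n)[1:] for n in names]
    cols = max(round(x / stride) for x, _ in coords) + 1
    rows = max(round(y / stride) for _, y in coords) + 1
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for x, y in coords:
        grid[round(y / stride)][round(x / stride)] = "#"
    return ["".join(row) for row in grid]


# Made-up tile names on a 512-unit grid, for illustration only.
demo_names = ["demo_(0,0)", "demo_(512,0)", "demo_(0,1024)"]
demo_grid = occupancy_grid(demo_names, stride=512)
print("\n".join(demo_grid))
```

Building one grid from the tiles kept by each pipeline and comparing them cell by cell points to the regions where the two pipelines disagree.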

Checklist:

Ultimate-Storm commented 1 year ago

Reason for this issue: in the old preprocessing pipeline, more tiles get rejected because of the brightness threshold. Canny rejection follows and rejects only a small part of the remaining tiles. In the e2e preprocessing, only Canny rejection (edge < 2) is applied.

Examples of extra tiles rejected by the old pipeline:
1031866_(1031,38155)
1031866_(1546,30936)
1031866_(1546,39186)
1031866_(2062,37123)
1031866_(2062,39186)
1031866_(5156,35577)
1031866_(5156,39701)
1031866_(5671,28874)
1031866_(7734,22171)
1031866_(13921,1031)
1031866_(14437,1031)
1031866_(25264,4640)
1031866_(26296,0)
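The difference between the two rejection schemes can be sketched as follows. This is an illustration under assumptions, not the code of either pipeline: only the `edge < 2` rule comes from this issue. The brightness cutoff value, the assumption that tiles brighter than the cutoff are the ones rejected, the function name `rejection_reason`, and the example scores are all hypothetical.

```python
from typing import Optional


def rejection_reason(mean_brightness: float,
                     edge_score: float,
                     brightness_max: Optional[float] = None,
                     edge_min: float = 2.0) -> Optional[str]:
    """Return why a tile is rejected, or None if the tile is kept.

    brightness_max=None disables the brightness check, which mimics the
    e2e behaviour described above (Canny rejection only).
    """
    # Assumption: tiles brighter than the cutoff are treated as background.
    if brightness_max is not None and mean_brightness > brightness_max:
        return "brightness"
    # From the issue: tiles with an edge score below 2 are rejected.
    if edge_score < edge_min:
        return "canny"
    return None


# Made-up (mean_brightness, edge_score) pairs, for illustration only.
tiles = {
    "tile_a": (180.0, 9.5),
    "tile_b": (235.0, 4.0),
    "tile_c": (240.0, 0.5),
}

OLD_BRIGHTNESS_MAX = 220.0  # hypothetical value, not taken from the old pipeline

old = {n: rejection_reason(b, e, brightness_max=OLD_BRIGHTNESS_MAX)
       for n, (b, e) in tiles.items()}
e2e = {n: rejection_reason(b, e) for n, (b, e) in tiles.items()}

# Tiles that only the old scheme rejects.
extra_rejected = sorted(n for n in tiles if old[n] is not None and e2e[n] is None)
print(extra_rejected)
```

In this toy data, `tile_b` is bright but still has enough edges, so only the brightness check removes it. That is the kind of tile listed above as extra rejected by the old pipeline.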

Canny slide from e2e: image

Reconstructed Canny slide from the raw tiles of the old preprocessing pipeline: image

Ultimate-Storm commented 1 year ago

Result: the old preprocessing pipeline rejects 20% more tiles than e2e, so e2e ends up with 20% more feature vectors for each patient. This results in better performance in model training and prediction with e2e.
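A per-slide check of this kind can be sketched by comparing the tile names each pipeline kept. The function name `compare_tile_sets` and the tile names below are made up for illustration, and the numbers are not the measurements from this issue.

```python
from typing import Iterable, List, Tuple


def compare_tile_sets(old_tiles: Iterable[str],
                      e2e_tiles: Iterable[str]) -> Tuple[List[str], List[str], float]:
    """Return tiles kept only by e2e, tiles kept only by the old pipeline,
    and the relative surplus of e2e tiles with respect to the old pipeline."""
    old_set, e2e_set = set(old_tiles), set(e2e_tiles)
    only_e2e = sorted(e2e_set - old_set)
    only_old = sorted(old_set - e2e_set)
    if old_set:
        extra_fraction = (len(e2e_set) - len(old_set)) / len(old_set)
    else:
        extra_fraction = float("nan")
    return only_e2e, only_old, extra_fraction


# Made-up kept-tile lists: e2e keeps one tile more than the old pipeline.
old_kept = ["s_(0,0)", "s_(0,512)", "s_(512,0)", "s_(512,512)", "s_(1024,0)"]
e2e_kept = old_kept + ["s_(1024,512)"]

only_e2e, only_old, extra_fraction = compare_tile_sets(old_kept, e2e_kept)
print(only_e2e, only_old, f"{extra_fraction:.0%}")
```

Listing `only_e2e` per slide gives the tile names to inspect visually, while `extra_fraction` summarizes how many more feature vectors one pipeline produces.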