Missing features/tiles problem

Ultimate-Storm commented 1 year ago

Problem: Omar ran his new e2e pipeline on the same targets as the old pipeline but observed different results. He found that the e2e pipeline yields more features after tessellation and normalization.

Approach: Run the old pipeline on a small dataset and reconstruct the normalized tiles back into a WSI. Comparing the result with the output of Omar's e2e pipeline should show why tiles got rejected. Code snippet for reconstructing back to a WSI: https://github.com/KatherLab/end2end-WSI-preprocessing/blob/d160a882d905e496b98a52f9cb75595b7c23040f/stainNorm_Macenko.py#L142
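A minimal sketch of the reconstruction idea, not the code behind the link above. It assumes tile names follow the `<slide>_(x,y)` pattern seen in this thread, that `(x, y)` is the top-left pixel offset of the tile in the slide, and that tiles are RGB `uint8` arrays. The function names and the `step` parameter are hypothetical.

```python
import re
from typing import Dict, Tuple

import numpy as np

# Assumed naming scheme: "<slide>_(x,y)", e.g. "1031866_(1031,38155)".
TILE_NAME = re.compile(r"^(?P<slide>.+)_\((?P<x>\d+),(?P<y>\d+)\)$")


def parse_tile_name(name: str) -> Tuple[str, int, int]:
    """Split a tile name into (slide id, x, y)."""
    match = TILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected tile name: {name!r}")
    return match["slide"], int(match["x"]), int(match["y"])


def reconstruct_from_tiles(tiles: Dict[str, np.ndarray], step: int = 1) -> np.ndarray:
    """Paste tiles onto a white canvas at their parsed offsets.

    `step` subsamples tiles and coordinates by the same factor so that
    the canvas stays small enough to hold in memory.
    """
    placed = []
    for name, tile in tiles.items():
        _, x, y = parse_tile_name(name)
        placed.append((x // step, y // step, tile[::step, ::step]))

    width = max(x + t.shape[1] for x, _, t in placed)
    height = max(y + t.shape[0] for _, y, t in placed)
    canvas = np.full((height, width, 3), 255, dtype=np.uint8)

    # Rejected tiles are simply absent, so they stay white on the canvas.
    for x, y, t in placed:
        canvas[y:y + t.shape[0], x:x + t.shape[1]] = t
    return canvas
```

With coordinates as large as the ones in this thread, a full-resolution canvas would take several gigabytes, so a `step` well above 1 is probably needed for a quick visual check.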

Checklist:

[x] Check whether running the old pipeline twice outputs different/arbitrary results
[ ] Deposit reconstruction func to wanshi-utils
[x] Find which part of the code is going wrong; for now, anything could be going wrong

Ultimate-Storm commented 1 year ago

Reason for this issue: in the old preprocessing pipeline, more tiles get rejected because of the brightness threshold. This is followed by the Canny rejection, which only rejects a small part of the remaining tiles. In the e2e preprocessing, only the Canny rejection (edge < 2) takes action. Examples of extra rejected tiles from the old pipeline:

1031866_(1031,38155) 1031866_(1546,30936) 1031866_(1546,39186) 1031866_(2062,37123) 1031866_(2062,39186) 1031866_(5156,35577) 1031866_(5156,39701) 1031866_(5671,28874) 1031866_(7734,22171) 1031866_(13921,1031) 1031866_(14437,1031) 1031866_(25264,4640) 1031866_(26296,0)
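The difference between the two rejection chains can be sketched as below. This is an illustration under assumptions, not either pipeline's actual code: only the `edge < 2` rule comes from this thread. The `BRIGHTNESS_MAX` value, the direction of the brightness test (too bright means background), the `TileStats` fields, and the function names are all hypothetical. The per-tile statistics are assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from typing import Optional

EDGE_MIN = 2.0          # from this thread: tiles with edge < 2 are rejected
BRIGHTNESS_MAX = 220.0  # hypothetical value; the real threshold is not stated here


@dataclass(frozen=True)
class TileStats:
    name: str
    mean_brightness: float  # assumed: grayscale mean in 0..255
    edge_score: float       # assumed: Canny-based edge measure, same units as EDGE_MIN


def reject_reason_old(tile: TileStats) -> Optional[str]:
    """Old pipeline (as described): brightness check first, then Canny."""
    if tile.mean_brightness > BRIGHTNESS_MAX:
        return "brightness"
    if tile.edge_score < EDGE_MIN:
        return "canny"
    return None


def reject_reason_e2e(tile: TileStats) -> Optional[str]:
    """e2e pipeline (as described): Canny check only."""
    if tile.edge_score < EDGE_MIN:
        return "canny"
    return None
```

Under these assumptions, any tile the e2e chain rejects is also rejected by the old chain, while a bright tile with enough edges is dropped only by the old one. That matches the extra rejections listed above.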

Canny slide from e2e:

Reconstructed Canny slide on tiles (raw) from the old preprocessing pipeline:

Ultimate-Storm commented 1 year ago

Result: the old preprocessing pipeline rejects 20% more tiles than e2e, so e2e ends up with 20% more feature vectors for each patient. This results in better performance in model training and prediction with e2e.
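A comparison like this can be reproduced from the tile names alone. The sketch below assumes both pipelines name tiles identically (for example `1031866_(1031,38155)`), which may not hold without some renaming. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def compare_tile_sets(
    old_tiles: Iterable[str], e2e_tiles: Iterable[str]
) -> Tuple[List[str], List[str], float]:
    """Return tiles kept only by e2e, tiles kept only by the old pipeline,
    and the relative surplus of e2e tiles over the old pipeline."""
    old, e2e = set(old_tiles), set(e2e_tiles)
    only_e2e = sorted(e2e - old)
    only_old = sorted(old - e2e)
    # Relative surplus: (e2e count - old count) / old count.
    surplus = (len(e2e) - len(old)) / len(old) if old else float("nan")
    return only_e2e, only_old, surplus
```

Running this per patient gives the list of tiles to inspect visually, plus a surplus figure that can be compared against the 20% reported above.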
