choosehappy / QuickAnnotator

An open-source digital pathology based rapid image annotation tool
BSD 3-Clause Clear License

Difficulty distinguishing artifact from objects of interest #29

Open AuritaP opened 2 years ago

AuritaP commented 2 years ago

I have scanned WSIs using a Hamamatsu NanoZoomer S60 at 40x, and the images have been divided into 1000x1000 tiles using OpenSlide. The tissue is mouse small intestine prepared using the Swiss roll technique (basically, the intestine is rolled up like a cinnamon roll), then sectioned and stained with either A) PAS + hematoxylin, B) AB + far red, or C) PAS + AB + hematoxylin. I am using Quick Annotator to see if I can train the model to identify Paneth and goblet cells, so I can count them and see which mice have more of them.

The model seemed to have a harder time identifying the cells with the PAS staining, since the stain color is similar to the background. So I have been concentrating on the AB staining, and it is a tad better, but we have staining artifacts that the model is picking up as positive. I have tried repeatedly marking those artifacts as negative, and even as unknown, but it keeps seeing them as positive.

I have included two sample images, one PAS and one AB, with artifacts indicated in black and the areas that I actually want in red. Any ideas? I have not played much at all with the slides that have both stains, thinking that the two colors may confuse the model more, but I may be wrong about that.

[Attached images: AB+FR D-54584 SR5 - 2022-03-07 14 23 42_0_20000_27000 - Marked2; PAS D-54584 SR5 - 2022-03-07 17 59 17_0_9000_19000 Marked2]

choosehappy commented 2 years ago

Thanks for reaching out

Did you upload these at 40x to Quick Annotator? What do the associated masks look like?

Are you trying to segment both types of objects at the same time? E.g., both this image and this:

[image]

Given how different they are, you will likely want/need to train two different models.

Just looking at this, I think you may need to downsize the images a bit so that there will be sufficient context available for the DL classifier to learn to highlight your objects of interest.

Quick Annotator uses a patch size of 256 for training, and as such you'll need to fit enough context into each patch for a discriminative signal to be present; we wrote a blog post on the topic here:

http://www.andrewjanowczyk.com/how-to-select-the-correct-magnification-and-patch-size-for-digital-pathology-projects/

Looking at your images, I would likely try something like 20x to start.

Also, the color variability is pretty large between these two examples; have you considered applying some stain normalization to improve the classifier performance?

I can recommend this toolbox: https://github.com/Peter554/StainTools
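For reference, a minimal StainTools sketch of what this could look like. The file names are placeholders, "macenko" is one of the methods the toolbox offers, and note that StainTools assumes a two-stain (typically H&E) model, so results on PAS/AB slides should be sanity-checked:

```python
# pip install staintools (also requires the spams package)
import staintools

# Placeholder file names: a reference tile whose staining you like,
# and a tile you want mapped onto that reference.
target = staintools.read_image("reference_tile.png")
source = staintools.read_image("tile_to_normalize.png")

# Standardize brightness first, as the StainTools docs recommend.
target = staintools.LuminosityStandardizer.standardize(target)
source = staintools.LuminosityStandardizer.standardize(source)

# Fit a Macenko normalizer to the reference and transform the other tile.
normalizer = staintools.StainNormalizer(method="macenko")
normalizer.fit(target)
normalized = normalizer.transform(source)
```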

AuritaP commented 2 years ago

Thanks for looking into this. The 1000x1000 images were taken straight from OpenSlide using its default choices, so I assume they are at 40x, since that is how they were scanned. And yes, I am trying to quantify both of those structures, so it makes sense to develop two different models.

I will try to follow the steps that you outlined in that blog post and see how I can downsize the images to 20x without having to re-scan them. Not sure I follow the comment about stain variability; they are two completely different stains, I mean each slide is stained with a different stain. But if you are referring to stain variability within each slide, I will look at the link that you recommend.

I suspected that the model did not have enough info with the 1000x1000 patches and I was about to try patches of 2000x2000, but given your comment above that will not be enough, and I really need to downsize the images to 20x; I just have to figure out how to do that without having to re-scan them. There are a heck of a lot of slides, and we would have to pay again for them to be re-scanned.

AuritaP commented 2 years ago

One more thing: I just figured out how to extract tiles at different magnifications using OpenSlide; it's so damn simple, no idea how I did not figure that out before. Next I need to figure out how to tell OpenSlide to give me only a section of the slide; I have three identical sections per slide and I only need to measure one. I did not care before, because I was just trying to figure out whether this was going to work or not.
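For reference, a minimal OpenSlide sketch of both steps: reading a tile at a lower-magnification level, and restricting extraction to one region of the slide. The file name and coordinates are placeholders:

```python
import openslide

slide = openslide.OpenSlide("D-54584_SR5.ndpi")  # placeholder file name

# Each level is a precomputed downsample of level 0 (the scan magnification).
print(slide.level_dimensions)   # e.g., ((w0, h0), (w0/2, h0/2), ...)
print(slide.level_downsamples)  # e.g., (1.0, 2.0, 4.0, ...)

# Pick the level closest to a 2x downsample (40x scan -> ~20x).
level = slide.get_best_level_for_downsample(2.0)

# read_region takes the top-left corner in *level 0* coordinates, the level,
# and the tile size at that level -- so you can pull tiles from just the one
# section you care about by limiting the coordinates you iterate over.
tile = slide.read_region((20000, 27000), level, (1000, 1000)).convert("RGB")
tile.save("tile_20x.png")
```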

choosehappy commented 2 years ago

A number of comments in hopes of helping you along on your path : )

> I will try to follow the steps that you outlined in that blog post and see how I can downsize the images to 20x without having to re-scan them.

You don't have to overthink this: you can actually take your 40x images and simply resize them by a factor of 0.5, and you will end up with a near equivalent of 20x. Any image processing toolbox or image tool (e.g., Photoshop, IrfanView, etc.) will be able to do this for you without a problem.
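For example, a minimal Pillow sketch of the 40x-to-20x resize (file names are placeholders):

```python
from PIL import Image

img = Image.open("tile_40x.png")  # placeholder: a 1000x1000 tile scanned at 40x
half = img.resize((img.width // 2, img.height // 2), Image.LANCZOS)
half.save("tile_20x.png")         # ~20x equivalent, now 500x500 pixels
```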

> Not sure I follow the comment about stain variability, they are two completely different stains

The way DL works is that you will get much better results, faster, if you use one model per stain type. Having multiple stains mixed together can cause a lot of confusion, since the same objects can have wildly different appearances. It may work, but it is likely harder and requires more effort to stand up. The best practice is really to treat each stain separately, like you would see here, for example:

https://pubmed.ncbi.nlm.nih.gov/32835732/

> I mean each slide is stained with a different stain. But if you are referring to stain variability within each slide, I will look at the link that you recommend.

I'm more worried about variability between slides than within slides; the intra-slide variability is probably okay.

> I suspected that the model did not have enough info with the 1000x1000 patches and I was about to try patches of 2000x2000, but given your comment above that will not be enough, and I really need to downsize the images to 20x; I just have to figure out how to do that without having to re-scan them. There are a heck of a lot of slides, and we would have to pay again for them to be re-scanned.

No need to rescan when going down in magnification : ) Note that the way DL works, there are locality constraints: typical CNN models cannot "see" the entire image at once, and instead make decisions based on local information, in the case of this tool within a 256 x 256 pixel window, so anything outside that window is not "visible" to the model when making a decision. As a result, downsampling the image fits more contextual information into the 256 x 256 window, and thus allows for better performance (discussed extensively in the blog post I mentioned).
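To make the context arithmetic concrete, a small sketch assuming roughly 0.25 µm/px at 40x (check your scanner's actual value, e.g., the openslide.mpp-x property in the slide metadata):

```python
# A 256 px patch covers patch_px * microns-per-pixel of tissue; halving the
# magnification doubles the physical context visible to the model.
patch_px = 256
for mag, mpp in [("40x", 0.25), ("20x", 0.5), ("10x", 1.0)]:  # assumed mpp values
    print(f"{mag}: a {patch_px} px patch covers ~{patch_px * mpp:.0f} um of tissue")
# 40x: ~64 um, 20x: ~128 um, 10x: ~256 um
```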