DIAGNijmegen / pathology-hooknet

MIT License

Question about Pixel-based Sampling Strategy #3

Closed wuusn closed 3 years ago

wuusn commented 3 years ago

hi,

Thanks for your previous support. I was impressed by your work; it is the first time I have seen such a neat and well-arranged project.

However, I didn't find any implementation of your pixel-based sampling strategy, which automatically balances the number of pixels per class during training.

I tried to implement it myself, but I couldn't figure out how to give classes with a lower accumulated pixel count a higher chance of being sampled.

Do you have any suggestions?

Thanks

martvanrijthoven commented 3 years ago

Hi Yuxin Wu,

You are welcome and thank you very much for your kind words :).

It is true that the implementation of the pixel-based sampling strategy is not available in this repository. The batch generator, including the pixel-based sampling, is not open source yet. I have to discuss with my supervisors whether I can make it open source, and under what kind of license. Furthermore, I need to work on the documentation before I can release it. Sorry for the inconvenience.

However, I have made a notebook for you, in which I share the pixel-based sampling class that I use for the sampling strategy. The notebook also shows a small example of how to use it. I hope you find the code useful. You can find the notebook here:

https://github.com/DIAGNijmegen/pathology-hooknet/blob/master/notebooks/examples/pixelbasedsampling.ipynb
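For readers who cannot open the notebook, the core idea can be sketched as follows. This is a minimal illustration with made-up names (`PixelLabelController` here is my own hypothetical class, not necessarily the one in the notebook): labels with fewer accumulated pixels get a proportionally higher chance of being sampled next.

```python
import numpy as np

class PixelLabelController:
    """Sketch of inverse-frequency label sampling: the fewer pixels of a
    label have been seen so far, the more likely that label is drawn."""

    def __init__(self, labels):
        self._labels = list(labels)
        # start each count at 1 to avoid division by zero before any update
        self._pixel_counts = {label: 1 for label in self._labels}

    def __next__(self):
        counts = np.array(
            [self._pixel_counts[label] for label in self._labels], dtype=float
        )
        # inverse-frequency weights: fewer accumulated pixels -> higher chance
        weights = 1.0 / counts
        return np.random.choice(self._labels, p=weights / weights.sum())

    def update(self, ground_truth_patch):
        # accumulate the number of pixels per label in the sampled ground truth
        labels, counts = np.unique(ground_truth_patch, return_counts=True)
        for label, count in zip(labels, counts):
            if label in self._pixel_counts:
                self._pixel_counts[label] += int(count)

# usage: after label 1 dominates the accumulated pixels,
# label 2 becomes the more likely draw
controller = PixelLabelController(labels=[1, 2])
controller.update(np.full((8, 8), 1))
label = next(controller)
```

In a training loop, `next(controller)` would pick the label for the next patch and `controller.update(...)` would feed back the ground truth that was actually extracted, so the sampling self-balances over time.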

wuusn commented 3 years ago

hi martvanrijthoven,

Thank you so much for the example. I have learnt a lot from it.

I am curious about this line in your code:

    # update the pixel label controller with ground truth values of the sampled label
    pixellabelcontroller.update(example_ground_truth[label])

Here, you fetch one patch based on one label.

In my experience, one patch may contain different classes, especially when it has a large field of view (i.e., a large μm/px spacing). I wonder if, in your real implementation, you labeled each patch by the class with the largest number of pixels in that patch, and then fetched a random patch from a labeled patch pool based on the label you got from next(pixellabelcontroller)?

like:

label = next(pixellabelcontroller)
patch = batchgenerator.get_random_patch_from_label(label)
pixellabelcontroller.update(patch)

def get_random_patch_from_label(label):
    # pick a random patch from the pool of patches with this label
    patches = all_patches[label]
    index = np.random.randint(0, len(patches))
    return patches[index]

From figures 6 & 7 in the paper, I noticed that most of the illustrated ground-truth patches contain only one class, except the one in the second row of figure 7. Is this a natural characteristic of your annotated data?

I hope you can explain these to me.

Thanks

martvanrijthoven commented 3 years ago

Hi Yuxin Wu,

In the example of the notebook, I created patches with only a single label in them. This is indeed unrealistic; I did that just for the sake of the example. For training HookNet, I used patches that contained multiple labels. Due to the sparse annotations (i.e., not every pixel was annotated), high-resolution patches often contained pixels from only a single label. Hence the patches in the figures of the paper also mostly contain a single label. Patches with a large field of view (e.g., 8.0 μm/px) often contained multiple labels.

However, label sampling was not based on patches. Instead, label sampling was based on annotations. The annotations were saved as polygons with the shapely library. When a new label was sampled, an annotation with the same label was sampled. Thereafter, a point was sampled within that annotation. That point was used as the center point of a patch, which we extracted from the WSI.
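The "sample a point within an annotation" step can be sketched with shapely, for example via rejection sampling on the polygon's bounding box. This is a toy illustration under my own assumptions, not the actual batch generator code:

```python
import random
from shapely.geometry import Point, Polygon

def sample_point_in_annotation(polygon, rng=random):
    """Rejection-sample a point uniformly within a polygon: draw from the
    bounding box until the candidate falls inside the annotation."""
    min_x, min_y, max_x, max_y = polygon.bounds
    while True:
        candidate = Point(rng.uniform(min_x, max_x), rng.uniform(min_y, max_y))
        if polygon.contains(candidate):
            return candidate

# toy example: a square annotation; the sampled point would then be used
# as the center of the patch extracted from the WSI
annotation = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
center = sample_point_in_annotation(annotation)
```

Rejection sampling is simple and uniform, though for very thin or concave annotations a triangulation-based sampler would reject fewer candidates.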

I hope that answers your question, otherwise please let me know.

wuusn commented 3 years ago

hi martvanrijthoven,

I got your point. Thanks for your comprehensive explanations.

I am really interested in the way you arrange your data. However, I am still confused about some details, as I could not find them in either the code or the paper (if I missed them, please point me to them).

Since you mentioned sparse annotations: can a patch contain unlabeled areas that actually belong to one of your classes of interest? If so, did you validate only the areas within the ground truth? Judging from the figures, all the model results are shown only within the ground-truth regions.

Thanks

martvanrijthoven commented 3 years ago

Hi Yuxin Wu,

In the paper, the 'Materials' section explains how we collected and created the data, but I guess you have already read that part and are still confused.

To answer your question: there could be unlabeled areas in a patch that belong to one of the classes of interest. However, the only way to be sure is to ask an expert/pathologist. There is no way to quantitatively validate the unlabeled areas, and indeed we only validated within the ground-truth regions. However, qualitative analysis of the unlabeled areas is possible, and we did that as well. A couple of examples are shown in the last figure of the paper, which shows results outside of the ground-truth regions for entire WSIs.

Please let me know if anything is still unclear.

wuusn commented 3 years ago

hi martvanrijthoven,

Thanks for the thorough explanations; they really helped me understand your paper and the idea better.

I will watch your repo and will be glad to see your updates. :)

Thanks again, I will close this issue.