Open Awj2021 opened 2 months ago
Hey there,
thanks a lot for this feedback. Regarding your questions:
Thank you for your kind reply. Beyond filtering out low-quality images, could we also use a detection method to crop out portraits and other irrelevant content from the images? Besides the images, some of the texts are also incomprehensible or irrelevant.
Since only around 100K images remain after filtering, which is just about 15% of the original dataset size (768K), it may be better to keep the histopathology part of each image, e.g., via detection & cropping.
Yes, I absolutely think you are right. It makes a lot of sense to use segmentation + post-processing or detection methods to crop the actual pathology image regions, on top of our initial filtering approach.
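As a minimal sketch of the crop idea: instead of a learned detector, a crude white-background heuristic can already isolate the tissue region in many slide screenshots. The function name, the `bg_thresh` value, and the toy image below are all illustrative assumptions, not part of the actual pipeline.

```python
import numpy as np

def crop_tissue_region(img: np.ndarray, bg_thresh: int = 230) -> np.ndarray:
    """Crop an RGB image to the bounding box of non-background pixels.

    A simple stand-in for a learned detector: pixels whose channels are
    all above `bg_thresh` are treated as white background / UI chrome.
    """
    mask = (img < bg_thresh).any(axis=2)   # True where a pixel looks "tissue-like"
    if not mask.any():
        return img                          # nothing to crop, return unchanged
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy example: a 10x10 white frame containing a 4x4 gray "tissue" patch.
img = np.full((10, 10, 3), 255, dtype=np.uint8)
img[3:7, 2:6] = 120
print(crop_tissue_region(img).shape)  # → (4, 4, 3)
```

A real pipeline would of course replace this heuristic with a trained detector, but even this baseline removes portrait borders and slide chrome when the background is close to white.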
I think the question of whether text and image are aligned should be reflected, at least in principle, by filtering on the CONCH scores. A suitable threshold still has to be defined, of course. Text quality might interfere with this score; it could be filtered beforehand, given that a suitable classifier is available.
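The threshold filtering could be sketched as follows, assuming the CONCH image-text similarity scores have already been computed offline. The tuples, the score values, and the 0.3 threshold are hypothetical placeholders, not values from the actual pipeline.

```python
# Sketch: keep only image-text pairs whose precomputed alignment score
# clears a threshold. `pairs` holds hypothetical (image_id, caption, score)
# tuples; scores and threshold are illustrative only.
pairs = [
    ("img_001", "H&E stain of liver tissue", 0.62),
    ("img_002", "subscribe to my channel!", 0.08),
    ("img_003", "lymph node metastasis", 0.41),
]

THRESHOLD = 0.3  # placeholder; to be tuned on a held-out validation set
kept = [(img_id, caption) for img_id, caption, score in pairs if score >= THRESHOLD]
print(len(kept))  # → 2
```

The point of the sketch is only the shape of the filter; the real work is choosing the threshold, which text quality can confound as noted above.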
BTW: Where does your estimate of 768K images come from? To the best of my understanding, the QUILT1M dataset consists of 1M descriptions, linked to some 650K images.
Best regards,
Marc
Thank you for pointing out my mistake. Sorry, I was confused about it: the 768K refers to the number of image-text pairs, as the official website shows.
No worries - I was also confused that the "1M" dataset only contains around 650K images. ;-) Cheers.
Hi, this is amazing work for dealing with the bad images in such a large-scale dataset! I have some questions.
I would really appreciate it if you could give some tips on my questions.
Best.