facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0

Performance on custom medical data #123

Open MariaKokshaikina opened 2 years ago

MariaKokshaikina commented 2 years ago

Hello, thank you for this great repository. I used it to test DINO's performance in detecting pneumothorax in chest X-rays and segmenting tissues in histopathological images. Together with a colleague, I tried different hyperparameters and configurations; however, in all our experiments the activation maps showed no signs of recognising the disease.

We published the results of our experiments in this blog post: https://medium.com/@mllabucu/transformer-based-self-supervised-learning-for-medical-images-41395d069829

The tested changes include:

We wonder whether the poor performance stems solely from the nature of our dataset (described in the blog post), or whether there is anything we could have missed when training DINO.
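For reference, the maps discussed here can be obtained from the CLS token's last-layer self-attention, as in the repo's visualization script. A minimal sketch of the reshaping step, using a dummy attention tensor in place of `model.get_last_selfattention(img)` and assuming ViT-S/8 shapes (6 heads, 8×8 patches, 224×224 input):

```python
import torch

# Assumed shapes for a DINO ViT-S/8 on a 224x224 input:
# 6 heads, a 28x28 grid of 8x8 patches, plus one CLS token.
num_heads, grid, patch = 6, 28, 8
tokens = grid * grid + 1

# Dummy last-layer attention tensor (batch, heads, tokens, tokens);
# in the DINO repo this would come from model.get_last_selfattention(img).
attn = torch.rand(1, num_heads, tokens, tokens).softmax(dim=-1)

# Keep only the CLS token's attention to the patch tokens and
# reshape each head into a (grid, grid) spatial map.
cls_attn = attn[0, :, 0, 1:]                    # (heads, grid*grid)
maps = cls_attn.reshape(num_heads, grid, grid)  # (heads, grid, grid)

# Upsample to pixel resolution for overlay on the input image.
maps = torch.nn.functional.interpolate(
    maps.unsqueeze(0), scale_factor=patch, mode="nearest"
)[0]                                            # (heads, 224, 224)
print(maps.shape)
```

Each head yields its own map, and in medical images it is worth inspecting all heads separately rather than only their mean, since individual heads can attend to very different structures.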

woctezuma commented 2 years ago

Maybe relevant:

In this work we conduct large-scale pre-training on large source datasets of either natural (ImageNet-21k/1k) or medical chest X-Ray images and compare full and few-shot transfer using different target datasets from both natural and medical imaging domains. Our observations provide evidence that while pre-training and transfer on closely related datasets do show clear benefit of increasing model and data size during pre-training, such benefits are not clearly visible when source and target datasets are further apart. These observations hold across both full and few-shot transfer and indicate that scaling laws pointing to improvement of generalization and transfer with increasing model and data size are incomplete and should be revised by taking into account the type and proximity of the source and target data, to correctly predict the effect of model and data scale during pre-training on transfer.

from:

Cherti, Mehdi, and Jenia Jitsev. "Effect of large-scale pre-training on full and few-shot transfer learning for natural and medical images." arXiv preprint arXiv:2106.00116 (2021). (code)

psteinb commented 2 years ago

Hi, great to see this post being circulated. :+1:

I also ran experiments with DINO. In my case, I trained on materials science images from a synchrotron light source.

Besides obstacles in getting the code to run and establishing stable training (I am not sure why open PRs from the community are not being merged), I also made observations that run counter to one of the paper's claims.

Quoting from the paper's abstract:

first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets.

While the claim may be true in a narrow sense from my point of view (the aforementioned semantic information might be encoded somewhere in the trained network), I am starting to doubt that it has any practical use or relevance for the trustworthiness of the predictions.
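One way to check whether that information is at least linearly accessible, rather than merely "somewhere in the net", is a linear probe on frozen patch features: if a linear classifier can recover per-patch labels, the claim has some practical bite. A minimal synthetic sketch, with random vectors standing in for frozen DINO ViT-S patch embeddings (384-d) and toy foreground/background labels; none of these names come from the repo:

```python
import torch

torch.manual_seed(0)

# Synthetic stand-ins for frozen DINO patch embeddings (384-d for ViT-S)
# and per-patch foreground/background labels. The toy labels are made
# linearly decodable on purpose, so the probe should succeed here.
n_patches, dim = 2000, 384
feats = torch.randn(n_patches, dim)
labels = (feats[:, 0] > 0).long()

# Linear probe: a single linear layer trained with cross-entropy,
# while the "backbone" features stay frozen.
probe = torch.nn.Linear(dim, 2)
opt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(feats), labels)
    loss.backward()
    opt.step()

acc = (probe(feats).argmax(dim=1) == labels).float().mean().item()
print(f"probe accuracy: {acc:.2f}")
```

On real data, a probe accuracy near chance would support the doubt above: the segmentation information may exist in the weights without being usable in any straightforward way.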

I'd appreciate more reports about what people do find using this approach or feedback on the statements above. Maybe it's the lack of inductive bias in my images? Or the size of the structures? :shrug: