YanCote / IFT6268-simclr

Project for IFT6268

Explore and propose open-source medical imaging datasets relevant for SSL. #4

Closed marued closed 3 years ago

marued commented 3 years ago

Dataset exploration and research results will be discussed in this issue.

When listing a dataset, please include its pros and cons, the number of images, and the type of images.

sgaut023 commented 3 years ago

I have sent an email to Joseph Cohen. I'm asking for large medical datasets.

sgaut023 commented 3 years ago

List of different medical datasets on GitHub: https://github.com/sfikas/medical-imaging-datasets

Suggested by the TA: https://stanfordmlgroup.github.io/competitions/chexpert/

Suggested by Joseph Cohen:
1- https://github.com/mlmed/torchxrayvision
2- https://academictorrents.com/collection/medical

I'm currently looking at the suggestions.

sgaut023 commented 3 years ago

Potential Dataset 1: PatchCamelyon benchmark
Link: https://academictorrents.com/details/1561a180b11d4b746273b5ce46772ad36f1229b6
Size: 327,680 images
Color Space: RGB
Resolution: 96 x 96 pixels
Downstream task: Binary classification (presence or absence of metastatic tissue)

sgaut023 commented 3 years ago

Potential Dataset 2: NIH Chest X-ray dataset
Link: https://academictorrents.com/details/e615d3aebce373f1dc8bd9d11064da55bdadede0
Size: 112,120 frontal-view X-ray images of 30,805 unique patients
Color Space: Grayscale
Resolution: 1024 x 1024 pixels
Downstream task: Classification of common thoracic pathologies, including Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass and Hernia.
Benchmark: https://arxiv.org/abs/1705.02315

Please note that each image can have multiple labels. I wonder whether this multi-label classification task is too complicated.
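For reference, multi-label targets are usually handled with one independent sigmoid output per pathology and a binary cross-entropy loss, which is not much harder than the single-label case. Below is a minimal sketch in plain Python. It assumes the labels of one image arrive as a single string separated by `|`; the names `PATHOLOGIES`, `encode_labels` and `multilabel_bce` are hypothetical helpers for illustration, not part of any library or of this project's code.

```python
import math

# The 14 pathology names listed for this dataset in the comment above.
PATHOLOGIES = [
    "Atelectasis", "Consolidation", "Infiltration", "Pneumothorax",
    "Edema", "Emphysema", "Fibrosis", "Effusion", "Pneumonia",
    "Pleural_thickening", "Cardiomegaly", "Nodule", "Mass", "Hernia",
]


def encode_labels(label_string, sep="|"):
    """Turn a string such as "Effusion|Mass" into a 14-element multi-hot vector.

    Assumption: labels are joined by `sep`. Names that are not in
    PATHOLOGIES (for example a "no finding" marker) are ignored, so such
    an image is encoded as all zeros.
    """
    present = {name.strip().lower() for name in label_string.split(sep)}
    return [1.0 if p.lower() in present else 0.0 for p in PATHOLOGIES]


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent sigmoid outputs."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


if __name__ == "__main__":
    target = encode_labels("Effusion|Mass")
    print(target)
    # With all logits at zero every output predicts 0.5.
    print(multilabel_bce([0.0] * len(PATHOLOGIES), target))
```

Each pathology is treated as its own binary problem here, so an image with several findings simply has several ones in its target vector.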

sgaut023 commented 3 years ago

Potential Dataset 3: ISIC2018: Skin Lesion Analysis Towards Melanoma Detection
Link: https://academictorrents.com/details/1e3811b66f1129a2b86b7c291316db8583dbc94f
Size: 13,000 dermoscopic images
Color Space: RGB
Resolution: 600 x 450 pixels
Downstream task: Classification of melanoma

YanCote commented 3 years ago

Liver ultrasound specific (the link seems to be the same as https://www.kaggle.com/shanecandoit/dataset-of-bmode-fatty-liver-ultrasound-images):

https://academictorrents.com/details/27772adef6f563a1ecc0ae19a528b956e6c803ce

https://competitions.codalab.org/competitions/15595

http://omar.alkadi.net/797-2/

https://www.cancerimagingarchive.net/

http://www.aylward.org/notes/open-access-medical-image-repositories

https://chaos.grand-challenge.org/Data/

https://wiki.cancerimagingarchive.net/display/Public/TCGA-LIHC

sgaut023 commented 3 years ago

We have closed this task because we decided to use the X-ray dataset.