BIMCV-CSUSP / BIMCV-COVID-19

Valencia Region Image Bank (BIMCV) that combines data from the PadChest dataset with future datasets based on COVID-19 pathology to provide the open scientific community with data of clinical-scientific value that helps early detection of COVID-19
MIT License

Dataset "usability" for AI #29

Open stbnps opened 4 years ago

stbnps commented 4 years ago

I performed the following experiment

Achieving the following results

Specificity:

Sensitivity:
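For readers unfamiliar with the two metrics named above, here is a minimal sketch of how they are usually computed from binary labels. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions, not part of the original experiment, whose figures are not reproduced here.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: share of actual negatives that were correctly rejected.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Noisy labels affect both numbers: a mislabeled positive counts as a false negative or false positive regardless of how good the network is.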

The issue

The network seems to perform very well on dataset [3], where each image was manually reviewed by radiologists [4]. However, it performs significantly worse on dataset [1], where most labels were extracted using NLP and the images were not reviewed (which even led to the inclusion of completely white or completely black images [5]).

Do you think the quality of the images and annotations may be a limiting factor for the performance of the network?

References

[1] http://ceib.bioinfo.cipf.es/covid19/resized_padchest_neumo.tar.gz
[2] https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia
[3] https://www.kaggle.com/c/rsna-pneumonia-detection-challenge
[4] https://www.kaggle.com/c/rsna-pneumonia-detection-challenge/overview/acknowledgements
[5] https://github.com/BIMCV-CSUSP/BIMCV-COVID-19/tree/master/padchest-covid#iti---proposal-for-datasets

rahools commented 4 years ago

Images that seem to be white or black have data in them. Just normalize the pixel values to [0, 1], multiply by 255, and plot or save the result.
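A minimal sketch of the rescaling described in this comment, assuming the image has already been loaded into a NumPy array (loading is left out, and `to_uint8` is a hypothetical helper name). It only helps when the raw values occupy a narrow slice of a wider range, for example a 16-bit image viewed as if it were 8-bit.

```python
import numpy as np


def to_uint8(image):
    """Min-max normalize an image to [0, 1], then scale it to 8-bit [0, 255]."""
    img = np.asarray(image, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A truly constant image carries no contrast to recover.
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo)  # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


# Synthetic 16-bit values packed into a narrow band (illustrative only).
raw = np.array([[1000, 1010], [1020, 1040]], dtype=np.uint16)
out = to_uint8(raw)
```

After rescaling, the smallest raw value maps to 0 and the largest to 255, so any structure hidden in the narrow band becomes visible.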

samils7 commented 4 years ago

> Images that seem to be white or black have data in them. Just normalize[0 - 1], multiply it by 255, and plot it or save it.

This comment is the answer to Q1 in: BIMCV-COVID19+/FAQ.md

stbnps commented 4 years ago

@rahools That's not true. Take a look at image 216840111366964013590140476722013038132133659_02-059-019.png.

You can see a white line. Because that line is visible, the image is already scaled to the displayable range, so normalizing it again will not reveal any hidden content.
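One way to check this kind of image programmatically is sketched below, under stated assumptions: the image is a NumPy array, the helper names `dominant_fraction` and `looks_blank` are hypothetical, and the 0.95 threshold is an arbitrary choice rather than a value from the dataset authors. The synthetic array stands in for the image described above (all black except one white line).

```python
import numpy as np


def dominant_fraction(image):
    """Fraction of pixels that share the single most common intensity value."""
    _, counts = np.unique(np.asarray(image), return_counts=True)
    return counts.max() / counts.sum()


def looks_blank(image, threshold=0.95):
    """Flag images where nearly every pixel has the same value (assumed threshold)."""
    return dominant_fraction(image) >= threshold


# Synthetic stand-in: a black 100x100 image with a single white row.
img = np.zeros((100, 100), dtype=np.uint8)
img[50, :] = 255

already_full_range = img.min() == 0 and img.max() == 255  # min-max scaling is a no-op
blank = looks_blank(img)
```

Here the values already span 0 to 255, so min-max normalization would leave the image unchanged, while the dominant-value check still flags it as essentially empty. A filter like this could be used to exclude such files before training.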

@samils7 That FAQ is for BIMCV-COVID19+, not for padchest-covid.

rahools commented 4 years ago

My bad. I successfully applied normalization to BIMCV-COVID19+, so I assumed it would carry over to the padchest dataset too. Thanks for the insight, @stbnps.