Lightning-Universe / lightning-Covid19

Classification for covid-19 chest X-ray images using Lightning
https://pytorchlightning.github.io/lightning-Covid19
MIT License

Possible data leakage? #11

Open oplatek opened 4 years ago

oplatek commented 4 years ago

Possible data leakage? In the original dataset, there are several images from the same patient (see, for example, patient number 2). Should we take this into account when splitting the data?

Originally posted by @shpotes in https://github.com/PyTorchLightning/lightning-Covid19/pull/10

Borda commented 4 years ago

See https://github.com/PyTorchLightning/lightning-Covid19/pull/10#discussion_r394055265. We will replace this dataloader at a later stage, but for now we can stay with it.

oplatek commented 4 years ago

The problem is that images from a single patient can end up in more than one of the train, validation, and test sets.
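
One way to avoid this is to split by patient rather than by image, so that all images of a given patient land in exactly one subset. Below is a minimal sketch of that idea, not the repository's actual dataloader. The function name `split_by_patient`, the `(patient_id, image_path)` record layout, and the fraction parameters are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_by_patient(records, val_frac=0.2, test_frac=0.2, seed=0):
    """Split records so that each patient's images fall in exactly one subset.

    `records` is a list of (patient_id, image_path) pairs. This layout is a
    hypothetical one, not the dataset's real metadata format.
    Fractions are applied to the number of patients, not the number of images.
    """
    # Group image paths by patient.
    by_patient = defaultdict(list)
    for patient_id, image_path in records:
        by_patient[patient_id].append(image_path)

    # Shuffle patient ids deterministically, then cut the list into subsets.
    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)

    n_test = int(len(patient_ids) * test_frac)
    n_val = int(len(patient_ids) * val_frac)
    groups = {
        "test": patient_ids[:n_test],
        "val": patient_ids[n_test:n_test + n_val],
        "train": patient_ids[n_test + n_val:],
    }

    # Expand each subset of patients back into (patient_id, image_path) pairs.
    return {
        name: [(pid, path) for pid in ids for path in by_patient[pid]]
        for name, ids in groups.items()
    }


if __name__ == "__main__":
    # Toy data: patient 2 has three images, every other patient has one.
    toy = [
        (pid, f"img_{pid}_{i}.png")
        for pid in range(10)
        for i in range(3 if pid == 2 else 1)
    ]
    for name, subset in split_by_patient(toy).items():
        print(name, sorted({pid for pid, _ in subset}))
```

Because the fractions are applied to patients, the per-subset image counts can drift from the nominal ratios when some patients have many more images than others. If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` with the patient id as the group label serves the same purpose.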