Lightning-Universe / lightning-Covid19

Classification for covid-19 chest X-ray images using Lightning
https://pytorchlightning.github.io/lightning-Covid19
MIT License

data augmentation #6

Open: Borda opened this issue 4 years ago

Borda commented 4 years ago

Add reasonable image augmentation

edgarriba commented 4 years ago

Cool, we have all of this in kornia.

edgarriba commented 4 years ago

/cc @ducha-aiki @shijianjian @anguelos

shpotes commented 4 years ago

Horizontal flip looks like a suitable augmentation. I'm not completely sure whether vertical flip/rotation introduces interesting priors, since X-rays are usually similarly oriented.

edgarriba commented 4 years ago

Rotation limited to small angles, I guess yes.

Borda commented 4 years ago

> Horizontal flip looks like a suitable augmentation. I'm not completely sure whether vertical flip/rotation introduces interesting priors, since X-rays are usually similarly oriented.

Unless there is an assumption that the object looks different when flipped vertically... but this could just be an extra training parameter, right?
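A minimal sketch of the flip policy discussed above, assuming 2-D grayscale NumPy arrays: horizontal flip is enabled by default, and vertical flip is exposed as an extra training hyperparameter that defaults to off. The function name `random_flips` and its probability parameters are hypothetical, not part of the repository.

```python
import numpy as np


def random_flips(img: np.ndarray, rng: np.random.Generator,
                 hflip_p: float = 0.5, vflip_p: float = 0.0) -> np.ndarray:
    """Randomly flip a 2-D (H, W) image.

    hflip_p: probability of a left-right flip (considered safe in this thread).
    vflip_p: probability of an up-down flip; defaults to 0.0 because chest
             X-rays are usually similarly oriented, but kept as a separate
             hyperparameter so it can be switched on and compared.
    """
    if rng.random() < hflip_p:
        img = img[:, ::-1]  # mirror columns (left-right)
    if rng.random() < vflip_p:
        img = img[::-1, :]  # mirror rows (up-down)
    # Materialise any flipped view as a contiguous array for downstream code.
    return np.ascontiguousarray(img)
```

Keeping `vflip_p` as its own argument lets the question of whether vertical flips help be settled empirically, by comparing runs with it on and off.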

bluesky314 commented 4 years ago

Where is the data in the first place? There is no link in the README.

ducha-aiki commented 4 years ago

The data link is here: https://github.com/PyTorchLightning/lightning-Covid19/issues/2

anguelos commented 4 years ago

In the Chester paper, Figure 3 shows that for pneumonia specifically, augmentation might even hurt. If I am reading the plots right, the first column (undistorted test set) seems the most important. Modest rotation, scaling, and translation seem to be the best augmentation: 15°, 10%, and 10% respectively.
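The ranges read off the Chester figure (±15° rotation, 10% scale, 10% translation) can be sketched as a random affine warp. This is a NumPy-only illustration with nearest-neighbour sampling and constant fill, assuming 2-D grayscale arrays. Interpreting "10% scale" as a zoom factor in [0.9, 1.1] and "10% translation" as a shift of up to 10% of the image height/width are assumptions, and `random_affine` / `affine_nearest` are hypothetical helpers.

```python
import math

import numpy as np


def affine_nearest(img, theta, scale, tx, ty, fill=0.0):
    """Rotate by `theta` (radians) and scale about the image centre, then shift by (tx, ty) pixels.

    Uses inverse mapping: each output pixel looks up its source pixel
    (nearest neighbour); sources falling outside the image get `fill`.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rows, cols = np.mgrid[0:h, 0:w].astype(np.float64)
    # Undo the translation and move the origin to the image centre.
    y = rows - cy - ty
    x = cols - cx - tx
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Undo rotation (apply R(-theta)) and scaling, then move the origin back.
    src_x = (cos_t * x + sin_t * y) / scale + cx
    src_y = (-sin_t * x + cos_t * y) / scale + cy
    src_r = np.rint(src_y).astype(int)
    src_c = np.rint(src_x).astype(int)
    valid = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.full_like(img, fill)
    out[valid] = img[src_r[valid], src_c[valid]]
    return out


def random_affine(img, rng, max_deg=15.0, max_scale=0.10, max_shift=0.10, fill=0.0):
    """Sample rotation, scale and shift uniformly within the given bounds and apply them."""
    h, w = img.shape
    theta = math.radians(rng.uniform(-max_deg, max_deg))
    scale = rng.uniform(1.0 - max_scale, 1.0 + max_scale)
    ty = rng.uniform(-max_shift, max_shift) * h  # shift as a fraction of height
    tx = rng.uniform(-max_shift, max_shift) * w  # shift as a fraction of width
    return affine_nearest(img, theta, scale, tx, ty, fill)
```

In the actual pipeline, a library transform such as kornia's `RandomAffine` (for example `RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1))`) would be the more natural choice. The hand-written version only makes the geometry explicit.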

shijianjian commented 4 years ago

Generally, I think it should be all right as long as the augmentation methods do not change the label. For instance, ElasticTransform is probably a dangerous move. It would be best if we could invite a chest CT expert for more guidance.

If I understand this right, this project aims to distinguish Covid-19 from other pneumonia pathologies such as SARS. Thus, we also need more support on understanding the pathology, so that preprocessing and augmentation emphasize the most correlated features. From a clinical perspective, I think it would also help to know how CT experts make their decisions.

edgarriba commented 4 years ago

Just noticed that the images come in a range of roughly ±1000.

Borda commented 4 years ago

It is quite common for medical images, as they can also be stored as TIFF with some offset :]

edgarriba commented 4 years ago

Gotcha. And do we want that for training? https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py#L47-L51

I think the dataset generator could be improved.

Borda commented 4 years ago

I think we should scale them anyway with the mean and SD to roughly the (-1, 1) interval.
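Two ways to bring the roughly ±1000 pixel values into a small range, sketched with NumPy: a fixed linear rescale, assuming the values span [-1024, 1024] (an assumption based on the observed ~±1000 range), and the mean/SD standardization suggested above, using statistics computed over the training images. Standardization centres the data at 0 with unit spread rather than strictly bounding it to (-1, 1). All function names here are hypothetical.

```python
import numpy as np


def dataset_stats(images):
    """Pixel mean and standard deviation over a list of images (loads everything into memory)."""
    flat = np.concatenate([np.asarray(im, dtype=np.float64).ravel() for im in images])
    return float(flat.mean()), float(flat.std())


def standardize(img, mean, std, eps=1e-8):
    """Zero-mean, unit-variance scaling; most values then fall within a few units of 0."""
    return (np.asarray(img, dtype=np.float32) - mean) / (std + eps)


def rescale_range(img, lo=-1024.0, hi=1024.0):
    """Linearly map [lo, hi] onto [-1, 1]; the ±1024 default is an assumption."""
    img = np.asarray(img, dtype=np.float32)
    return 2.0 * (img - lo) / (hi - lo) - 1.0
```

The statistics should come from the training split only, so that validation and test images are scaled with the same fixed numbers.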

edgarriba commented 4 years ago

@Borda sure. Apparently images in this dataset are in PNG, JPG and JPEG, so my guess is that there is no need to apply an initial conversion. Please also check my comment here: https://github.com/PyTorchLightning/lightning-Covid19/pull/18#discussion_r399652347
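Since the dataset mixes PNG, JPG and JPEG files, one option is to collect all three extensions directly instead of converting them first. A minimal sketch using only `pathlib`; the case-insensitive suffix match and the `list_images` name are assumptions, not taken from the repository.

```python
from pathlib import Path

# Extensions mentioned in the thread; matched case-insensitively (assumption).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(root):
    """Return sorted paths of all PNG/JPG/JPEG files under `root`, searching recursively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Each path can then be decoded the same way regardless of format, for example with Pillow's `Image.open(path).convert("L")` to get a grayscale image.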

[three attached images]

Not sure what would be best. My guess is that it would be best to analyse the whole image and create some kind of attention mechanism so as not to miss any part.
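One way to read the "attention over the whole image" idea is softmax attention pooling over the spatial feature map of a CNN run on the full image: every location gets a score, and the image descriptor is the score-weighted average, so no region is cropped away. A NumPy sketch with a fixed scoring vector `w` (in a real model it would be learned); `attention_pool` and its shapes are hypothetical and not taken from the repository.

```python
import numpy as np


def attention_pool(features, w):
    """Softmax spatial attention pooling.

    features: (C, H, W) feature map computed on the whole image.
    w: (C,) scoring vector (learned in a real model, fixed here).
    Returns the pooled (C,) descriptor and the (H, W) attention map.
    """
    c, h, width = features.shape
    flat = features.reshape(c, h * width)  # (C, N): one column per spatial location
    scores = w @ flat                      # (N,): one score per location
    scores = scores - scores.max()         # shift for numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum()               # softmax over all locations, sums to 1
    pooled = flat @ attn                   # (C,): attention-weighted average
    return pooled, attn.reshape(h, width)
```

With all scores equal this reduces to global average pooling; a learned `w` lets the model weight suspicious regions more heavily while still seeing every part of the image.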