
Lightning-Universe / lightning-Covid19

Classification for covid-19 chest X-ray images using Lightning

https://pytorchlightning.github.io/lightning-Covid19

MIT License


data augmentation #6

Open · Borda opened this issue 4 years ago

Borda commented 4 years ago

Add reasonable image augmentations, such as:

horizontal/vertical flip
rotation
zoom
etc.
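A minimal numpy sketch of the flip and zoom augmentations listed above, assuming single-channel (H, W) image arrays. `random_flip` and `random_zoom` are hypothetical helpers, not code from this repository or from kornia. Vertical flipping defaults to off, and zoom only zooms in (crop, then resize back). Rotation needs interpolation and is left out here.

```python
import numpy as np


def random_flip(img: np.ndarray, rng: np.random.Generator,
                hflip_prob: float = 0.5, vflip_prob: float = 0.0) -> np.ndarray:
    """Randomly mirror a 2-D image left-right and/or up-down."""
    if rng.random() < hflip_prob:
        img = img[:, ::-1]  # horizontal flip: reverse the column order
    if rng.random() < vflip_prob:
        img = img[::-1, :]  # vertical flip: reverse the row order
    return np.ascontiguousarray(img)


def random_zoom(img: np.ndarray, rng: np.random.Generator,
                max_zoom: float = 0.1) -> np.ndarray:
    """Zoom in by up to max_zoom: crop a centred window, resize back (nearest neighbour)."""
    h, w = img.shape
    factor = 1.0 + rng.uniform(0.0, max_zoom)
    ch, cw = max(1, int(round(h / factor))), max(1, int(round(w / factor)))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = img[top:top + ch, left:left + cw]
    # Map every output row/column back to a row/column of the smaller crop
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows[:, None], cols[None, :]]


rng = np.random.default_rng(0)
image = np.arange(16, dtype=np.float32).reshape(4, 4)
augmented = random_zoom(random_flip(image, rng), rng)
print(augmented.shape)  # (4, 4)
```

Libraries such as kornia ship these as ready-made, differentiable transforms; the sketch only shows what each one does to the pixel grid.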

edgarriba commented 4 years ago

cool, we have all this in kornia

edgarriba commented 4 years ago

/cc @ducha-aiki @shijianjian @anguelos

shpotes commented 4 years ago

Horizontal flip looks like a suitable augmentation. I'm not completely sure whether vertical flips or rotations introduce useful priors, since X-rays are usually similarly oriented.

edgarriba commented 4 years ago

Rotation limited to small angles, yes, I guess.

Borda commented 4 years ago

> Horizontal flip looks like a suitable augmentation, I'm not completely sure if vertical flip/rotation introduces interesting priors as X-rays are usually similarly oriented

Unless we assume that the object looks different when flipped vertically... but this could just be an extra training parameter, right?
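One way to make flipping "just an extra training parameter" is a command-line flag. This is a minimal sketch with hypothetical flag names (`--hflip-prob`, `--vflip-prob`), not options the project is known to define; vertical flips default to off, reflecting the concern that X-rays share a common orientation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical training CLI exposing flip augmentations as hyper-parameters."""
    parser = argparse.ArgumentParser(description="X-ray classifier training (sketch)")
    # Horizontal flips are enabled by default, as suggested in this thread
    parser.add_argument("--hflip-prob", type=float, default=0.5,
                        help="probability of a left-right flip")
    # Vertical flips are disabled by default; enable them explicitly to experiment
    parser.add_argument("--vflip-prob", type=float, default=0.0,
                        help="probability of an up-down flip (0 disables it)")
    return parser


# Passing an explicit list keeps the example independent of sys.argv
args = build_parser().parse_args(["--vflip-prob", "0.5"])
print(args.hflip_prob, args.vflip_prob)  # 0.5 0.5
```

Keeping the default at 0 means the assumption "X-rays are similarly oriented" holds unless someone deliberately opts in.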

bluesky314 commented 4 years ago

Where is the data in the first place? There is no link in the README.

ducha-aiki commented 4 years ago

The data link is here: https://github.com/PyTorchLightning/lightning-Covid19/issues/2

anguelos commented 4 years ago

In Figure 3 of the Chester paper, we can see that for pneumonia specifically, augmentation might even hurt performance. If I am reading the plots right, the first column (undistorted test set) is the most important one. It seems that modest rotation, scaling and translation is the best augmentation: 15°, 10% and 10%, respectively.
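A minimal numpy sketch of the augmentation range quoted above (rotation up to ±15°, scale and translation up to ±10% of the image size), assuming single-channel (H, W) arrays, nearest-neighbour sampling, and that "10%" means a symmetric ±10% range. `random_affine` is a hypothetical helper, not code from the project or the paper; in practice kornia's or torchvision's `RandomAffine` (with its `degrees`, `translate` and `scale` arguments) does the same with proper interpolation.

```python
import numpy as np


def random_affine(img: np.ndarray, rng: np.random.Generator,
                  max_deg: float = 15.0, max_scale: float = 0.10,
                  max_shift: float = 0.10, fill: float = 0.0) -> np.ndarray:
    """Randomly rotate, scale and translate a 2-D image about its centre."""
    h, w = img.shape
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    scale = rng.uniform(1.0 - max_scale, 1.0 + max_scale)
    ty = rng.uniform(-max_shift, max_shift) * h  # vertical shift in pixels
    tx = rng.uniform(-max_shift, max_shift) * w  # horizontal shift in pixels
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0

    # Output pixel grid, expressed relative to the centre with the shift undone
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float64)
    y = yy - cy - ty
    x = xx - cx - tx

    # Inverse rotation and scaling: find the source pixel of each output pixel
    cos, sin = np.cos(theta), np.sin(theta)
    src_c = np.rint((cos * x + sin * y) / scale + cx).astype(int)
    src_r = np.rint((-sin * x + cos * y) / scale + cy).astype(int)

    # Pixels that map outside the source image keep the fill value
    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    out = np.full_like(img, fill)
    out[inside] = img[src_r[inside], src_c[inside]]
    return out


rng = np.random.default_rng(42)
xray = np.random.default_rng(0).random((64, 64)).astype(np.float32)
augmented = random_affine(xray, rng)
```

Sampling backwards from the output grid (inverse mapping) guarantees every output pixel gets exactly one value, which avoids the holes a forward mapping would leave.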

shijianjian commented 4 years ago

Generally, I think it should be alright as long as the augmentation does not change the label. For instance, ElasticTransform is probably a dangerous move. It would be best if we could invite a chest CT expert for more guidance.

If I understand this right, this project aims to distinguish Covid-19 from other pneumonia pathologies such as SARS. Thus, we also need more support in understanding the pathology, so that preprocessing and augmentation emphasize the features most correlated with the diagnosis. From a clinical perspective, I think it would also help to know how CT experts make their decisions.

edgarriba commented 4 years ago

Just noticed that the image values range between roughly -1000 and +1000.

Borda commented 4 years ago

It is quite common for medical images, as they can also be stored as TIFF with some offset :]

edgarriba commented 4 years ago

Gotcha. And do we want that for training? https://github.com/mlmed/torchxrayvision/blob/master/torchxrayvision/datasets.py#L47-L51

I think the dataset generator can be improved somehow
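The roughly ±1000 range is consistent with a linear rescale of 8-bit pixel values onto [-1024, 1024]. A minimal sketch, assuming that is what the linked torchxrayvision lines do (verify against the source); `to_xrv_range` is a hypothetical name, not torchxrayvision's API.

```python
import numpy as np


def to_xrv_range(img: np.ndarray, maxval: float = 255.0) -> np.ndarray:
    """Linearly map pixel values from [0, maxval] to [-1024, 1024]."""
    img = img.astype(np.float32) / maxval  # now in [0, 1]
    return (2.0 * img - 1.0) * 1024.0      # now in [-1024, 1024]


pixels = np.array([0, 255], dtype=np.uint8)
print(to_xrv_range(pixels))  # [-1024.  1024.]
```

Dividing the result by 1024 would bring it back to [-1, 1] if a smaller range is preferred for training.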

Borda commented 4 years ago

I think we should scale them anyway with the mean and standard deviation to roughly the (-1, 1) interval.
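Standardizing with a dataset-wide mean and standard deviation, as suggested above, can be sketched as follows; `dataset_stats` and `standardize` are hypothetical helpers. Standardization yields zero mean and unit variance, so most values cluster near zero but are not strictly bounded to (-1, 1); a min-max rescale would be needed for a hard bound. The statistics should be computed on the training split only.

```python
import numpy as np


def dataset_stats(images):
    """One mean and standard deviation over every pixel of every image."""
    total, total_sq, count = 0.0, 0.0, 0
    for img in images:  # images may differ in size, so accumulate sums
        x = img.astype(np.float64)
        total += x.sum()
        total_sq += np.square(x).sum()
        count += x.size
    mean = total / count
    # Var = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error
    std = np.sqrt(max(total_sq / count - mean ** 2, 0.0))
    return mean, std


def standardize(img: np.ndarray, mean: float, std: float,
                eps: float = 1e-8) -> np.ndarray:
    """Shift to zero mean and scale to unit variance."""
    return (img.astype(np.float32) - mean) / (std + eps)


train_images = [np.array([[0.0, 2.0]]), np.array([[4.0, 6.0]])]
mean, std = dataset_stats(train_images)
normalized = [standardize(img, mean, std) for img in train_images]
```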

edgarriba commented 4 years ago

@Borda sure. Apparently the images in this dataset are in PNG, JPG and JPEG, so my guess is that there is no need to apply an initial conversion. Please also check my comment here: https://github.com/PyTorchLightning/lightning-Covid19/pull/18#discussion_r399652347

Not sure what would be best. My guess is that it would be best to analyse the whole image and add some kind of attention so as not to miss any part.
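Since the dataset mixes PNG, JPG and JPEG files, a loader can accept all three extensions directly instead of converting them first. A minimal standard-library sketch; `list_images` is a hypothetical helper, and the actual decoding could then be done with an image library such as Pillow (`Image.open(path).convert("L")` for grayscale).

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(root) -> list:
    """Return every PNG/JPEG file under root, matching suffixes case-insensitively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Lower-casing the suffix catches files such as `scan.JPG`, which a plain `glob("*.jpg")` would miss on case-sensitive file systems.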