Closed by romainVala 4 years ago
All contributions are welcome!
One of the main issues related to this is that the random parameters generated by the RandomTransform are stored in the sample. I thought this was nice for traceability. Adding these probabilistic compositions is nice, but if two samples don't share all their keys, it won't be possible to collate them into a batch with the default PyTorch DataLoader collate_fn. This is why I add the random parameters to the sample even if the transform is not applied, using the do_it variable, as shown here:
https://github.com/fepegar/torchio/blob/224dbed948ef86328f8a9c739cdc1813efc41f6c/torchio/transforms/augmentation/intensity/random_bias_field.py#L47-L55
One solution is to force the user to define a custom collate function as in this example: https://github.com/fepegar/torchio/blob/224dbed948ef86328f8a9c739cdc1813efc41f6c/examples/example_heteromodal.py#L57-L64 but this seems like unnecessary work and slightly moves away from standard PyTorch practice.
In order to keep the default collate function, the best solution I can think of is to add the random parameters only if a flag has been enabled for it. In that case, samples may have different keys, and the user will need to use a custom collate function. By default, these entries won't be added, and the samples will be easy to compose into a batch, even if they have followed different paths through the transforms pipeline generated by these composing operations.
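A minimal sketch of such a custom collate function, assuming plain dict samples and using Python lists as stand-ins for tensors; the key names below are illustrative, not torchio's actual keys:

```python
# Sketch of a custom collate_fn that keeps only the keys shared by every
# sample, so samples that followed different augmentation paths can still
# be batched. Pure-Python stand-in: lists play the role of tensors.

def collate_shared_keys(samples):
    """Batch a list of dict samples, dropping keys missing from any sample."""
    shared = set(samples[0]).intersection(*samples[1:])
    return {key: [sample[key] for sample in samples] for key in shared}

samples = [
    {'image': [1, 2], 'bias_coefficients': [0.1, 0.2]},  # transform applied
    {'image': [3, 4]},                                   # transform skipped
]
batch = collate_shared_keys(samples)
# Only 'image' survives; 'bias_coefficients' would break default collation.
```

This would be passed as `collate_fn` to the DataLoader; the trade-off is that the per-sample random parameters are silently dropped from the batch.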
Yes, I discovered this problem while playing with https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L98
The problem is not related to the composition; it happens with every transform that has a probability of being applied or not. My way to handle it is to always define all the parameters you want to add to the dictionary, with default values (the only difficulty is that they must have the same type and the same shape), and the parameter sample[image_name]['random_bias_do_augmentation'] tells you whether the transform has been applied or not.
For instance, here are the default values of the parameters I want to keep track of: https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L93
They are defined a few lines later, at https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L114, if the transformation is applied.
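The strategy described above (always write the same keys with fixed-type, fixed-shape defaults, plus a flag saying whether the augmentation actually ran) could be sketched like this; the function name, key names, and default values here are hypothetical, not the actual torchio code:

```python
import random

def record_bias_params(sample, apply_probability=0.5, n_coefficients=3):
    """Always add the same keys, so the default collate function works."""
    do_it = random.random() < apply_probability
    # Default values: same type and shape whether or not we augment.
    coefficients = [0.0] * n_coefficients
    if do_it:
        coefficients = [random.uniform(-0.5, 0.5) for _ in range(n_coefficients)]
        # ...the bias field would be applied to the image here...
    sample['random_bias_do_augmentation'] = do_it
    sample['random_bias_coefficients'] = coefficients
    return sample

sample = record_bias_params({'image': [1, 2, 3]})
```

Because every sample carries identical keys with identically shaped values, a batch can mix "applied" and "not applied" samples and still collate with the default function.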
Yes, that's the same strategy I showed in my comment. But I think that polluting all samples with irrelevant information adds unnecessary complexity to the dictionaries. It's like adding an entry with "the parameters that would have been used but never were".
These are two separate things: 1) the choice of the parameters to add to the dictionaries; 2) the fact that they should be added with default values even if the transform is not applied.
2 is necessary to have a mixed batch of "applied and not applied" transforms, and 1 is indeed very dependent on the task in mind.
Personally I like it to be as complete as possible, to be able to check what has been performed
What happens if you have a list of 10 transforms and you compose them with OneOf? Do you still need to go through the other 9 and add some parameters to the sample?
Sure! I see your point: 9 of the parameter dicts are empty... but in the end you want to retrieve them (in order to add conditions based on those parameters). To do so you need a complete dictionary. So yes,
it may become hard to handle if parameters with the same name can have variable shapes.
Another difficult question is about the exact structure to store them in. For instance, the order of the composition matters: how do you retrieve it?
For instance, the order of the composition matters
You're right, this is also an issue.
Something I've also thought of is to store the transforms information in the sample as plain text, adding a line every time a transform is applied. But this wouldn't be very readable, and the text would need to be parsed. Also, currently, some parameters are saved on a per-image basis and others (spatial transforms) are saved for the whole sample.
The text approach would be like a "history" of operations applied to the sample, a bit like the autograd history in PyTorch tensors.
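A toy version of that history idea, appending one line of text per applied transform so the order of composition is preserved; the names and format are illustrative, not the eventual torchio implementation:

```python
def record(sample, transform_name, params):
    """Append a readable one-line entry for each applied transform."""
    line = f'{transform_name} {params}'
    sample.setdefault('history', []).append(line)
    return sample

sample = {'image': [0, 1]}
sample = record(sample, 'RandomNoise', {'std': 0.1})
sample = record(sample, 'RandomElasticDeformation', {'num_control_points': 7})
# sample['history'] now lists the applied transforms in order.
```

As noted above, the downside is that recovering structured parameters from such lines requires parsing them back.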
The last couple of commits will probably make retrieving random transform parameters difficult. I'm planning to create a Sample class that can be automatically collated by the DataLoader and stores the random parameters information.
@romainVala You can retrieve the parameters for random transforms like this: https://github.com/fepegar/torchio/blob/6741adf39b969c8cf7de39b033fda9c2be1c93c0/examples/example_motion.py#L17-L31
Yes, why not... What happens when you are within a DataLoader? Where is the batch dimension?
Now transforms return instances of Subject, which inherits from dict. The random parameters are now stored in the history attribute of Subject. The batch is collated with the default collate function in the data loader. I'm not sure I understand your question.
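A minimal stand-in for the design described here, a dict subclass carrying a history attribute; the real Subject class has much more machinery, and the note at the end hints at the collation problem discussed below:

```python
class Subject(dict):
    """Dict subclass that also carries a history of applied transforms."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.history = []

subject = Subject(image=[1, 2, 3])
subject.history.append(('RandomNoise', {'std': 0.05}))

# Note: converting back to a plain dict (as default collation effectively
# does) drops the history attribute, since it lives outside the dict entries.
plain = dict(subject)
```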
Ok, I understand better if I can play with it, so I tried the following:
from torchio import ImagesDataset
from torchio.transforms import Compose, RandomNoise, RandomElasticDeformation

t = Compose([RandomNoise(), RandomElasticDeformation()])
dataset = ImagesDataset(suj, transform=t)
sample = dataset[0]
Then, as you write, I can get the parameters of each transform, since the history now has the same length as the number of composed transforms. Great!
But now if I do
dl = DataLoader(dataset, batch_size=4)
sample_batch = next(iter(dl))
but I cannot retrieve the history information: type(sample) is torchio.data.subject.Subject, but type(sample_batch) is dict.
Maybe I'm missing something?
You're right. I suppose that if you want to trace the random parameters, you'll need to collate the batch yourself:
import torch
import torchio
from torch.utils.data import DataLoader

dl = DataLoader(
    dataset,
    batch_size=4,
    collate_fn=lambda x: x,  # this creates a list of Subjects
)
samples = next(iter(dl))
inputs = torch.cat([sample['image'][torchio.DATA] for sample in samples])
histories = [sample.history for sample in samples]
Ok I see, thanks for the code example, I now better understand what collate_fn does.
Well, I preferred the previous version, where the extra parameters were added to the subject dict. In that case the batch concatenation was done automatically (for both the input data and the parameters). I do not understand what was wrong with it,
but I can deal with the new version and do the torch.cat myself...
The problem with that was related to this issue; I mention it in the comments here. It adds unnecessary complexity to the code, the order in which transforms have been applied is lost, etc. I think the amount of work needed to trace the parameters (torch.cat) is tiny compared to the trouble that including them in the dictionaries would cause the library.
I can add an option to save the parameters as strings in the dictionary, but the user would need to parse them somehow.
ok thanks
It may be a good feature to add if we can compose augmentations in a similar way to
https://albumentations.readthedocs.io/en/latest/probabilities.html#oneof-block
since this makes it very modular.
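For reference, an albumentations-style OneOf block could be sketched with plain callables like this; the function name and the toy transforms are illustrative, and the stand-in does not depend on either library:

```python
import random

def one_of(transforms_with_probs, seed=None):
    """Pick exactly one transform according to normalized probabilities."""
    rng = random.Random(seed)
    transforms, probs = zip(*transforms_with_probs)
    total = sum(probs)
    chosen = rng.choices(transforms, weights=[p / total for p in probs])[0]
    return chosen

# Toy "transforms" standing in for augmentations.
add_one = lambda x: x + 1
double = lambda x: x * 2

transform = one_of([(add_one, 0.75), (double, 0.25)], seed=0)
result = transform(10)
```

torchio's own OneOf transform takes a similar mapping of transforms to probabilities, which fits naturally with Compose for building modular pipelines.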