Closed by romainVala 4 years ago
All contributions are welcome!
One of the main issues related to this is that the random parameters generated by the RandomTransform are stored in the sample. I thought this was nice for traceability. Adding these probabilistic compositions is nice, but if two samples don't share all their keys, it won't be possible to collate them into a batch with the default PyTorch DataLoader collate_fn. This is why I add the random parameters to the sample even if the transform is not applied, using the do_it variable, as shown here:
https://github.com/fepegar/torchio/blob/224dbed948ef86328f8a9c739cdc1813efc41f6c/torchio/transforms/augmentation/intensity/random_bias_field.py#L47-L55
One solution is to force the user to define a custom collate function as in this example: https://github.com/fepegar/torchio/blob/224dbed948ef86328f8a9c739cdc1813efc41f6c/examples/example_heteromodal.py#L57-L64 but this seems like unnecessary work and slightly moves away from standard PyTorch practice.
In order to keep the default collate function, the best solution I can think of is to add the random parameters only if a flag has been enabled for it. In that case, samples may have different keys, and the user will need to use a custom collate function. By default, these entries won't be added, and the samples will be easy to compose into a batch, even if they have followed different paths through the transforms pipeline generated by these composing operations.
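A minimal sketch of such a custom collate function, assuming plain dict samples and using Python lists as stand-ins for tensors; the key names below are illustrative, not torchio's actual keys:

```python
# Sketch of a custom collate_fn that keeps only the keys shared by every
# sample, so samples that followed different augmentation paths can still
# be batched. Pure-Python stand-in: lists play the role of tensors.

def collate_shared_keys(samples):
    """Batch a list of dict samples, dropping keys missing from any sample."""
    shared = set(samples[0]).intersection(*samples[1:])
    return {key: [sample[key] for sample in samples] for key in shared}

samples = [
    {'image': [1, 2], 'bias_coefficients': [0.1, 0.2]},  # transform applied
    {'image': [3, 4]},                                   # transform skipped
]
batch = collate_shared_keys(samples)
# Only 'image' survives; 'bias_coefficients' would break default collation.
```

This would be passed as `collate_fn` to the DataLoader; the trade-off is that the per-sample random parameters are silently dropped from the batch.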
Yes, I discovered this problem while playing with https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L98
The problem is not related to the composition; it happens with every transform that has a probability of being applied or not. My way to handle it is to always define all the parameters you want to add to the dictionary, with default values (the only difficulty is that they must have the same type and the same shape), and the parameter sample[image_name]['random_bias_do_augmentation'] tells you whether the transform has been applied or not.
For instance, here are the default values of the parameters I want to keep track of: https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L93
They are defined a few lines later, at https://github.com/romainVala/torchio/blob/038540ab94e6ea6d75fb288e8c6a2a641ed5396a/torchio/transforms/augmentation/intensity/random_motion_from_time_course.py#L114, if the transformation is applied.
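The strategy described above (always write the same keys with fixed-type, fixed-shape defaults, plus a flag saying whether the augmentation actually ran) could be sketched like this; the function name, key names, and default values here are hypothetical, not the actual torchio code:

```python
import random

def record_bias_params(sample, apply_probability=0.5, n_coefficients=3):
    """Always add the same keys, so the default collate function works."""
    do_it = random.random() < apply_probability
    # Default values: same type and shape whether or not we augment.
    coefficients = [0.0] * n_coefficients
    if do_it:
        coefficients = [random.uniform(-0.5, 0.5) for _ in range(n_coefficients)]
        # ...the bias field would be applied to the image here...
    sample['random_bias_do_augmentation'] = do_it
    sample['random_bias_coefficients'] = coefficients
    return sample

sample = record_bias_params({'image': [1, 2, 3]})
```

Because every sample carries identical keys with identically shaped values, a batch can mix "applied" and "not applied" samples and still collate with the default function.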
Yes, that's the same strategy I showed in my comment. But I think that polluting all samples with irrelevant information adds unnecessary complexity to the dictionaries. It's like adding an entry with "the parameters that would have been used but never were".
These are two separate things: 1) the choice of the parameters to add to the dictionaries; 2) the fact that they should be added with default values even if the transform is not applied.
2 is necessary to have a mixed batch of "applied and not applied" transforms, and 1 is indeed very dependent on the task in mind.
Personally I like it to be as complete as possible, to be able to check what has been performed
What happens if you have a list of 10 transforms and you compose them with OneOf? Do you still need to go through the other 9 and add some parameters to the sample?
Sure! I see your point: 9 of the parameter dicts are empty... but in the end you want to retrieve them (in order to add conditions based on those parameters). To do so you need a complete dictionary. So yes,
it may become hard to handle if parameters with the same name can have variable shapes.
Another difficult question is about the exact structure to store them in. For instance, the order of the composition matters: how do you retrieve it?
For instance, the order of the composition matters
You're right, this is also an issue.
Something I've also thought of is to store the transforms information in the sample as plain text, adding a line every time a transform is applied. But this wouldn't be very readable, and the text would need to be parsed. Also, currently, some parameters are saved on a per-image basis and others (spatial transforms) are saved for the whole sample.
The text approach would be like a "history" of operations applied to the sample, a bit like the autograd history in PyTorch tensors.
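A toy version of that history idea, appending one line of text per applied transform so the order of composition is preserved; the names and format are illustrative, not the eventual torchio implementation:

```python
def record(sample, transform_name, params):
    """Append a readable one-line entry for each applied transform."""
    line = f'{transform_name} {params}'
    sample.setdefault('history', []).append(line)
    return sample

sample = {'image': [0, 1]}
sample = record(sample, 'RandomNoise', {'std': 0.1})
sample = record(sample, 'RandomElasticDeformation', {'num_control_points': 7})
# sample['history'] now lists the applied transforms in order.
```

As noted above, the downside is that recovering structured parameters from such lines requires parsing them back.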
The last couple of commits will probably make retrieving random transform parameters difficult. I'm planning to create a Sample class that can be automatically collated by the DataLoader and stores the random parameters information.
@romainVala You can retrieve the parameters for random transforms like this: https://github.com/fepegar/torchio/blob/6741adf39b969c8cf7de39b033fda9c2be1c93c0/examples/example_motion.py#L17-L31
Yes, why not... What happens when you are within a DataLoader? Where is the batch dimension?
Now transforms return instances of Subject, which inherits from dict. The random parameters are now stored in the history attribute of Subject. The batch is collated with the default collate function in the data loader. I'm not sure I understand your question.
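A minimal stand-in for the design described here, a dict subclass carrying a history attribute; the real Subject class has much more machinery, and the note at the end hints at the collation problem discussed below:

```python
class Subject(dict):
    """Dict subclass that also carries a history of applied transforms."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.history = []

subject = Subject(image=[1, 2, 3])
subject.history.append(('RandomNoise', {'std': 0.05}))

# Note: converting back to a plain dict (as default collation effectively
# does) drops the history attribute, since it lives outside the dict entries.
plain = dict(subject)
```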
Ok, I understand better if I can play with it, so I tried the following:
from torchio import ImagesDataset
from torchio.transforms import Compose, RandomNoise, RandomElasticDeformation

t = Compose([RandomNoise(), RandomElasticDeformation()])
dataset = ImagesDataset(suj, transform=t)
sample = dataset[0]
Then, as you write, I can get the parameters of each transform, since the history now has the same length as the number of composed transforms. Great!
But now if I do
dl = DataLoader(dataset, batch_size=4)
sample_batch = next(iter(dl))
but I cannot retrieve the history information: type(sample) is torchio.data.subject.Subject, but type(sample_batch) is dict.
Maybe I'm missing something?
You're right. I suppose that if you want to trace the random parameters, you'll need to collate the batch yourself:
import torch
import torchio
from torch.utils.data import DataLoader

dl = DataLoader(
    dataset,
    batch_size=4,
    collate_fn=lambda x: x,  # this creates a list of Subjects
)
samples = next(iter(dl))
inputs = torch.cat([sample['image'][torchio.DATA] for sample in samples])
histories = [sample.history for sample in samples]
Ok I see, thanks for the code example, I now better understand what collate_fn does.
Well, I preferred the previous version, where the extra parameters were added to the subject dict. In that case the batch concatenation was done automatically (for both the input data and the parameters). I do not understand what was wrong with it,
but I can deal with the new version and do the torch.cat myself...
The problem with that was related to this issue; I mention it in the comments here. It adds unnecessary complexity to the code, the order in which transforms have been applied is lost, etc. I think the amount of work needed to trace the parameters (torch.cat) is tiny compared to the trouble that including them in the dictionaries would cause the library.
I can add an option to save the parameters as strings in the dictionary, but the user would need to parse them somehow.
ok thanks
It may be a good feature to add if we can compose augmentations in a similar way to
https://albumentations.readthedocs.io/en/latest/probabilities.html#oneof-block
since this makes it very modular.
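For reference, an albumentations-style OneOf block could be sketched with plain callables like this; the function name and the toy transforms are illustrative, and the stand-in does not depend on either library:

```python
import random

def one_of(transforms_with_probs, seed=None):
    """Pick exactly one transform according to normalized probabilities."""
    rng = random.Random(seed)
    transforms, probs = zip(*transforms_with_probs)
    total = sum(probs)
    chosen = rng.choices(transforms, weights=[p / total for p in probs])[0]
    return chosen

# Toy "transforms" standing in for augmentations.
add_one = lambda x: x + 1
double = lambda x: x * 2

transform = one_of([(add_one, 0.75), (double, 0.25)], seed=0)
result = transform(10)
```

torchio's own OneOf transform takes a similar mapping of transforms to probabilities, which fits naturally with Compose for building modular pipelines.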