Hi @lillekemiker,
Finally figured out what the issue is. The difference is that a default transform is applied to datamodule.test_dataloader() when it is consumed through the trainer.test method (for classification it is a standard normalization: https://github.com/PyTorchLightning/lightning-flash/blob/da684414f09cac8a65d412814491124343c8b416/flash/image/classification/transforms.py#L60), but when datamodule.test_dataloader() is called outside the trainer object no such transform is applied.
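For illustration, a minimal workaround sketch, assuming model and datamodule are built as in the multilabel docs example, that batches are dicts keyed by "input", and that the linked default transform is an ImageNet-style normalization (all of these are assumptions, not confirmed by this thread):

```python
import torch
import torchvision.transforms as T

# ImageNet statistics; the linked default_transforms applies a normalization
# like this, but treat the exact values and placement as an assumption.
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

model.eval()
with torch.no_grad():
    for batch in datamodule.test_dataloader():
        x = normalize(batch["input"].float())  # "input" key is an assumption
        preds = torch.sigmoid(model(x))        # multilabel logits -> probabilities
```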
@tchaton is this the expected behaviour?
Thank you!!
This has been driving me crazy.
Don't know if it is expected behavior, but considering that transforms are defined at the datamodule level, I definitely expected them to be applied. How would I even turn them on outside the trainer?
Hi @SkafteNicki @lillekemiker - this is definitely something we could do better. The challenge is that some of our transforms are applied in the dataloader and some are applied in the model. So at runtime we inject the transforms into the dataloader / model. There are a few options:
Interested to hear your thoughts :smiley:
What is the argument against having the dataloader handle all transforms without magic runtime injections?
@lillekemiker because we support transforms on device. So if the user provides per_sample_transform_on_device or per_batch_transform_on_device, those have to be injected into the model rather than the dataloader.
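To make the two hook families concrete, here is a rough sketch of a transform dict keyed by those hook names; the exact keys and whether a datamodule accepts a dict like this are assumptions based on the Flash preprocess hooks, not something stated in this thread:

```python
import torch
import torchvision.transforms as T
import kornia.augmentation as K

transform = {
    # applied per sample in the dataloader workers, on CPU
    "post_tensor_transform": T.Resize(196),
    # applied per batch inside the model step, after the batch is on the GPU,
    # which is why it must be injected into the model rather than the dataloader
    "per_batch_transform_on_device": K.Normalize(
        mean=torch.tensor([0.485, 0.456, 0.406]),
        std=torch.tensor([0.229, 0.224, 0.225]),
    ),
}
```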
I can see the logic behind this design decision, but I don't think I agree with it. As an abstraction, the dataloader should handle transforms. Technically, the dataloader often runs multiple workers per GPU, and because you don't want several workers using the same GPU, you moved the GPU transforms into the model's realm. Or at least I assume that is the reasoning; am I missing anything? As a design decision, I think it is more important to keep the contact surface between the dataloader and the model clean. Couldn't GPU transforms be applied in the same thread as the model, before the batch is handed off to the model, rather than after? A GPU lock is another option, but that might come with a performance penalty.
As such, transforms could live in the model, too, and simply be part of the model. I don't see anything conceptually wrong with that.
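A minimal sketch of that alternative (the wrapper class and names are hypothetical, not Flash code): the on-device transform becomes an explicit submodule, so it runs on the same device and in the same thread as the forward pass.

```python
import torch
from torch import nn

class TransformedModel(nn.Module):
    """Wrap a backbone so the GPU-side transform is part of the model itself."""

    def __init__(self, backbone: nn.Module, gpu_transform: nn.Module):
        super().__init__()
        self.backbone = backbone
        self.gpu_transform = gpu_transform

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The batch has already been moved to the model's device, so the
        # transform runs right before the forward pass, not in dataloader workers.
        return self.backbone(self.gpu_transform(x))
```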
One major issue with the current way of doing it is that, depending on whether or not Kornia is installed, the tensor normalization may happen in per_batch_transform_on_device or in post_tensor_transform, and so it may or may not have been applied in the manual case without a Trainer.
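Schematically, the placement looks something like the following; this is a simplified paraphrase of the behaviour described above, not the actual Flash source:

```python
import torch
import torchvision.transforms as T

try:
    import kornia.augmentation as K
    kornia_available = True
except ImportError:
    kornia_available = False

mean, std = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

if kornia_available:
    # normalization runs per batch, on the GPU, and only after Trainer injection
    transforms = {"per_batch_transform_on_device": K.Normalize(torch.tensor(mean), torch.tensor(std))}
else:
    # normalization runs per sample, on CPU, inside the dataloader itself
    transforms = {"post_tensor_transform": T.Normalize(mean, std)}
```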
IMO, if this were bare-bones Lightning, then I would expect a batch to look the same regardless of whether it is inside the Lightning Trainer or outside, because Lightning is "just reorganised PyTorch code". Whether that should also hold in Flash I am not completely sure; it depends entirely on Flash's design philosophy. Since it sits at a higher abstraction than Lightning, I am fine with this not being supported. @ethanwharris is it possible to extract the data pipeline such that it would be possible to do something like model(pipeline(batch))?
I may also be a bit unclear about Flash's design philosophy. Either way, though, if the dataloader is exposed to the end user (i.e. me), then I would expect it to either always apply all transforms or never apply any transforms, and to be consistent in its behavior. If the dataloader is an internal and I'm never supposed to see it, then I guess I shouldn't really care :)
As for the design philosophy, my personal vote would be that flash should be highly modular and made in a way making it easy to dismantle and replace parts of it with custom code. That way it provides a quick and easy baseline model with very little coding, but more importantly, once you need to move beyond the baseline model, you don't have to start over and rewrite the whole thing in lightning yourself. You just replace the parts that need replacing. If that is not what you guys are going for, though, that is also completely fair. Blackbox, off-the-shelf deep learning solutions definitely have their place too.
@SkafteNicki that could be possible. I think the issue with what we have now is that we inject the transforms into the correct places inside the trainer. What we could definitely do is expose a method like:
dataloader = datamodule.train_dataloader()
model, dataloader = datamodule.inject_transforms(model, dataloader)
If we documented that as the recommended way to use the flash datamodules without a lightning Trainer, I guess that would address some of the issues here? @tchaton Interested to hear your thoughts
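If something along those lines were exposed (to be clear, inject_transforms is only a proposal in this thread, not an existing Flash API), manual evaluation might then look like this hypothetical sketch, with model and datamodule assumed to be built as in the docs example:

```python
import torch

dataloader = datamodule.test_dataloader()
# hypothetical API from the proposal above
model, dataloader = datamodule.inject_transforms(model, dataloader)

model.eval()
with torch.no_grad():
    for batch in dataloader:
        preds = model(batch["input"])  # "input" key is an assumption
```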
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🐛 Bug
When training a multilabel image classifier as described in the docs (original link: https://lightning-flash.readthedocs.io/en/latest/reference/multi_label_classification.html), I get different F1 metrics for the test set depending on how I run the evaluation:
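The reporter's exact script and numbers are not included here. As an illustration only, a hedged sketch of the two evaluation paths being compared, assuming model and datamodule are constructed as in the linked docs example and that batches are dicts keyed by "input"/"target" (the metric class and all names below are assumptions):

```python
import torch
from flash import Trainer
from torchmetrics.classification import MultilabelF1Score

# Path 1: through the Trainer -- Flash injects the default transforms.
Trainer().test(model, datamodule=datamodule)

# Path 2: a manual loop over the raw test dataloader -- nothing is injected,
# which is the discrepancy discussed in the comments above.
num_labels = 4  # placeholder: number of labels in the dataset
f1 = MultilabelF1Score(num_labels=num_labels)
model.eval()
with torch.no_grad():
    for batch in datamodule.test_dataloader():
        x, y = batch["input"], batch["target"]  # key names assumed
        f1.update(torch.sigmoid(model(x)), y)
print(f1.compute())
```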
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The two F1 metrics should be identical
Environment
How installed (conda, pip, source): pip

Additional context
None