aleju / imgaug

Image augmentation for machine learning experiments.
http://imgaug.readthedocs.io
MIT License

Style Transfer Transformations #327

Open luxedo opened 5 years ago

luxedo commented 5 years ago

It has been pointed out that CNNs are biased towards texture, and that applying style transfer to the training dataset improves performance of the models.

Any plans for adding style transfer as a transformation?

aleju commented 5 years ago

I'm aware of that paper and have thought about adding style transfer to the library. I can't give a timeline for that though. It is going to be pretty hard as it would have to work with tensorflow (including Keras) and pytorch -- possibly more frameworks. It would also have to work with very different image sizes. Most likely there would also be issues when combining this with multiprocessing and I'm not sure yet how to prevent them.

In the meantime, there is now Canny edge detection in the library -- though I haven't finished reading the paper yet, so I don't know if the authors would say that Canny edge detection is a good alternative to style transfer.

luxedo commented 5 years ago

I see, it seems hard to support several frameworks.

What if we use OpenCV? There's a tutorial here, and it's already a dependency of this repo, I could take a look at it.
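For reference, a minimal sketch of what the OpenCV route could look like, assuming one of the pretrained Torch style models from the OpenCV tutorial (the model path here is a placeholder, not a file that ships with OpenCV or imgaug). The pre/post-processing follows the mean-subtraction convention those models expect:

```python
import numpy as np

# Mean BGR values used by the common pretrained Torch style-transfer models
# (an assumption based on the OpenCV tutorial; other models may differ).
MEAN_BGR = (103.939, 116.779, 123.680)

def preprocess(image):
    """HWC uint8 BGR image -> 1x3xHxW float32 blob with mean subtracted."""
    blob = image.astype(np.float32) - np.array(MEAN_BGR, dtype=np.float32)
    blob = blob.transpose(2, 0, 1)[np.newaxis, ...]  # HWC -> NCHW
    return blob

def postprocess(output):
    """NCHW network output -> HWC uint8 image with mean added back."""
    out = output[0].transpose(1, 2, 0) + np.array(MEAN_BGR, dtype=np.float32)
    return np.clip(out, 0, 255).astype(np.uint8)

def stylize(image, model_path):
    """Run one image through a pretrained style network (requires OpenCV
    and a downloaded .t7 model file -- both assumptions, not bundled)."""
    import cv2  # deferred so pre/postprocess work without OpenCV installed
    net = cv2.dnn.readNetFromTorch(model_path)
    net.setInput(preprocess(image))
    return postprocess(net.forward())
```

Since the whole thing runs through `cv2.dnn`, it would stay CPU-based by default, matching the "slow but safe" option discussed below.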

aleju commented 5 years ago

When adding style transfer you have the option of applying the style transfer operation via (a) the GPU or (b) the CPU.

(a) Has the advantage of being fast. It has the disadvantage that the GPU might run out of memory, as it also has to handle the network training. So the code would have to be adapted so that the user can choose which GPU to use on multi-GPU systems. The adaptations should also cover the case that in the future more operations might support the GPU, and then one would not want the data to be copied back and forth from/to the GPU; it should instead stay on the GPU until it is needed on the CPU, and vice versa. Another disadvantage is that the framework used for the GPU-based style transfer might interfere with the framework used for training the network. E.g. last time I used pytorch and tensorflow on the same GPU, that led to CUDA errors. A third disadvantage is that it makes things harder to test, as e.g. travis doesn't have GPU support.

(b) Using the CPU is the exact inverse. It is not expected to interfere with the training framework (no matter whether OpenCV, tensorflow or pytorch is used), is much less likely to cause memory issues and is testable. However, it is also going to be much slower than a GPU solution.

So using OpenCV (a CPU-based solution) wouldn't cause many issues, except for being slow (the post says 0.3s on their hardware, which would probably make it the slowest augmentation in the library). However, it also only has the advantage over pytorch and tensorflow of already being a dependency, which isn't a very significant one. It also comes at the price of decreased flexibility, as the specific method used for style transfer can only be changed when OpenCV changes it (and they seem to use a rather old one). Then again, OpenCV tends to have a more stable interface than pytorch and tensorflow...
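One common way to soften the speed hit of such a slow augmenter is to apply it to only a random fraction of each batch, which is what imgaug's `Sometimes` wrapper does for other augmenters. A minimal CPU-side sketch of that idea (the helper name `apply_sometimes` is hypothetical, not part of the library):

```python
import numpy as np

def apply_sometimes(images, op, p=0.5, rng=None):
    """Apply an expensive per-image op (e.g. style transfer) to a random
    subset of a batch, leaving the remaining images untouched.

    `images` is a list of HWC uint8 arrays; `op` is any callable mapping
    one image to another; `p` is the per-image probability of applying it.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for img in images:
        if rng.random() < p:
            out.append(op(img))
        else:
            out.append(img)
    return out
```

With e.g. `p=0.25` and a 0.3s per-image cost, the expected overhead drops to ~0.075s per image, at the price of only part of the batch being stylized.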

Well, I might end up using the OpenCV method, just because it looks fairly easy.

Neltherion commented 3 years ago

Any updates on this issue?

Having Style Transfer augmentations would surely make it a lot easier to train models that are less biased towards textures...

Thanks