airctic / icevision

An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come
https://airctic.github.io/icevision/
Apache License 2.0
845 stars 149 forks

Albumentations Adapter Fails When Record Has Multiple `ImageRecordComponents` #850

rsomani95 closed 3 years ago

rsomani95 commented 3 years ago

The albumentations adapter seems to try to grab images other than `record.img`. I'm not sure what causes this, but here's a reproducible example:

from icevision.all import *
import requests, io

# Create template record
rec = BaseRecord(
    [
        FilepathRecordComponent(),
        InstancesLabelsRecordComponent(),
        BBoxesRecordComponent(),
        ClassificationLabelsRecordComponent(task=Task("color_saturation")),
        ImageRecordComponent(task=Task("color_saturation")),
    ]
)

# Download Test Img
tmp_fpath = "/tmp/a-wrinkle-in-time-filmgrab.jpg"
req = requests.get("https://film-grab.com/wp-content/uploads/photo-gallery/wrinkle046.jpg?bwg=1551280654", stream=True)
img = Image.open(io.BytesIO(req.content))
img.save(tmp_fpath)

# Setup
rec.set_filepath(tmp_fpath)
rec = rec.load()

# Transform
tfm = tfms.A.Adapter(
    [
        tfms.A.Resize(224, 224),
        tfms.A.Normalize(),
    ]
)
tfm(rec)

This gives the following error:

Error Trace

```python
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
      5     ]
      6 )
----> 7 tfm(rec)

~/miniconda2/envs/det/lib/python3.7/site-packages/icevision/tfms/transform.py in __call__(self, record)
      9         # TODO: this assumes record is already loaded and copied
     10         # which is generally true
---> 11         return self.apply(record)
     12 
     13     @abstractmethod

~/miniconda2/envs/det/lib/python3.7/site-packages/icevision/tfms/albumentations/albumentations_adapter.py in apply(self, record)
    266         tfms = self.create_tfms()
    267         # apply transform
--> 268         self._albu_out = tfms(**self._albu_in)
    269 
    270         # store additional info (might be used by components on `collect`)

~/miniconda2/envs/det/lib/python3.7/site-packages/albumentations/core/composition.py in __call__(self, force_apply, *args, **data)
    164         if args:
    165             raise KeyError("You have to pass data to augmentations as named arguments, for example: aug(image=image)")
--> 166         self._check_args(**data)
    167         assert isinstance(force_apply, (bool, int)), "force_apply must have bool or int type"
    168         need_to_run = force_apply or random.random() < self.p

~/miniconda2/envs/det/lib/python3.7/site-packages/albumentations/core/composition.py in _check_args(self, **kwargs)
    219             if internal_data_name in checked_single:
    220                 if not isinstance(data, np.ndarray):
--> 221                     raise TypeError("{} must be numpy array type".format(data_name))
    222             if internal_data_name in checked_multi:
    223                 if data:

TypeError: image must be numpy array type
```

On inspecting in the debugger, `data` is `None`. I'm unable to figure out where this is being passed to the adapter from.

However, if you comment out the `ImageRecordComponent(task=Task("color_saturation"))` line, the error no longer occurs.

The use case for having multiple `ImageRecordComponent`s is when you'd like to apply several different transforms to the same image and feed the results to different parts of the model. Multi-task training and contrastive learning come to mind.

rsomani95 commented 3 years ago

@lgvaz and I discussed this on a call. This is expected behavior and not technically a bug: `tfms.A.Adapter` goes over every component in the record and calls `setup_transform` on it. This fails in the case above because the `ImageRecordComponent`'s image is `None` for some of the records.
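
The failure mode can be illustrated without icevision. The following is a toy model, not icevision's actual code: `ToyImageComponent`, `ToyAdapter`, and the shared `albu_in` dict are hypothetical stand-ins. It assumes that every image component writes its image under the same `"image"` key during `setup_transform`, so a component whose image was never set overwrites the loaded one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToyImageComponent:
    """Hypothetical stand-in for an image-holding record component."""
    name: str
    img: Optional[List[List[int]]] = None  # nested lists stand in for an array

    def setup_transform(self, adapter: "ToyAdapter") -> None:
        # Assumption: every image component writes to the same "image" key.
        adapter.albu_in["image"] = self.img


@dataclass
class ToyAdapter:
    """Hypothetical adapter that visits every component of a record."""
    albu_in: Dict[str, Any] = field(default_factory=dict)

    def apply(self, components: List[ToyImageComponent]) -> List[List[int]]:
        self.albu_in = {}
        for component in components:
            component.setup_transform(self)
        image = self.albu_in.get("image")
        if not isinstance(image, list):
            # Same message as in the trace; a list stands in for a numpy array.
            raise TypeError("image must be numpy array type")
        return image


loaded = ToyImageComponent("common", img=[[0, 1], [2, 3]])
empty = ToyImageComponent("color_saturation")  # image never set

adapter = ToyAdapter()
print(adapter.apply([loaded]))  # only the loaded component: succeeds

try:
    adapter.apply([loaded, empty])  # the empty component overwrites "image"
except TypeError as err:
    print(f"TypeError: {err}")
```

In this model, removing the second component makes the error disappear, which matches what was observed when the extra `ImageRecordComponent` line was commented out.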

What would be really nice is to have a functional form where you can just pass in the record and specify which parts of it you'd like to transform. A feature for a later date perhaps.
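
One possible shape for such a functional form, sketched without icevision: the caller names the components to transform and everything else is left untouched. `transform_selected`, the dict-based record, and the component names are all hypothetical; nothing here is an existing icevision API.

```python
from typing import Any, Callable, Dict, Iterable

Record = Dict[str, Any]  # hypothetical: component name -> image (or None)


def transform_selected(
    record: Record,
    transform: Callable[[Any], Any],
    only: Iterable[str],
) -> Record:
    """Return a copy of `record` with `transform` applied to the named components."""
    selected = set(only)
    missing = selected - set(record)
    if missing:
        raise KeyError(f"unknown components: {sorted(missing)}")
    out = dict(record)  # shallow copy; the input record is not modified
    for name in selected:
        if record[name] is None:
            raise ValueError(f"component {name!r} has no image loaded")
        out[name] = transform(record[name])
    return out


def flip_horizontal(img):
    """Toy transform: reverse each row of a nested-list image."""
    return [row[::-1] for row in img]


record = {"common": [[0, 1], [2, 3]], "color_saturation": None}
result = transform_selected(record, flip_horizontal, only=["common"])
print(result)
```

Because the selection is explicit, a component with no image is never touched unless the caller asks for it, and in that case the error names the offending component.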