HumanSignal / label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format
https://labelstud.io
Apache License 2.0
19.21k stars, 2.39k forks

Adding ability to augment data #1118

Open: courseprojects opened this issue 3 years ago

courseprojects commented 3 years ago

Data augmentation is a technique to increase the amount and diversity of data without actually collecting new data. This comes in handy when labelled data are not easy to come by, and it can help increase the robustness of machine learning models. Currently, there isn't a way to augment data in Label Studio.

As an example, I have images for detection tasks. After drawing bounding boxes and labelling them, it would be nice to have a way to augment the images inside the "Data Manager" page of Label Studio so that I can have all my image data in one place and also compare the augmented result against the original images (for sanity checks). For images, there are different types of augmentations I have in mind:

Currently, my workflow is to download the labelled images and then run my own script to augment them. But having augmentation built into Label Studio would be valuable, and if we could define different transformation pipelines and permute/assemble them into different image operators, that would be awesome.
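The download-then-script workflow can be sketched for one common detection augmentation, a horizontal flip. This is a minimal sketch, assuming images are plain lists of pixel rows and that boxes follow the percentage-based `x`/`y`/`width`/`height` convention of Label Studio rectangle results (0 to 100, relative to the image size). The function names are hypothetical and not part of any Label Studio API. The point it illustrates is that geometric augmentations must move the boxes along with the pixels.

```python
from typing import Any, Dict, List, Tuple

# Assumption: a grayscale image stored as a list of pixel rows.
Image = List[List[int]]
# Assumption: a box uses percentage coordinates (0-100), as in Label Studio
# rectangle results, plus any extra keys such as labels.
Box = Dict[str, Any]


def hflip_image(image: Image) -> Image:
    """Mirror every pixel row left to right (returns a new image)."""
    return [list(reversed(row)) for row in image]


def hflip_box(box: Box) -> Box:
    """Mirror one box; y, width and height are unchanged by a horizontal flip."""
    flipped = dict(box)
    # The old right edge (x + width) becomes the new left edge after mirroring.
    flipped["x"] = 100.0 - box["x"] - box["width"]
    return flipped


def augment_hflip(image: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    """Return a flipped copy of the image together with its adjusted boxes."""
    return hflip_image(image), [hflip_box(b) for b in boxes]
```

For example, a box at `x=10` with `width=30` has its right edge at 40%, which becomes the new left edge at 60% after the flip.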

Granted, augmentation might not be as straightforward for other types of data, e.g. audio. So maybe this feature could start out available only for a certain subset of labelling tasks. Also, it might not be scalable for the team to customize transformations to satisfy every customer. So an easier route might be built-in "hooks" inside the Label Studio backend that users can use to define their own transformations. The Label Studio frontend could then just provide a way for users to choose and apply those augmentation steps, and store the applied transforms as metadata for each image.
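The "hooks" idea could look roughly like a registry of named, user-defined transforms plus a pipeline runner that records what was applied. This is a hypothetical sketch: `register`, `TRANSFORMS` and `apply_pipeline` are illustrative names, not existing Label Studio backend APIs, and the transforms here only touch pixels (a real detection pipeline would also have to update box coordinates).

```python
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[int]]
Transform = Callable[..., Image]

# Hypothetical hook registry; Label Studio does not provide this.
TRANSFORMS: Dict[str, Transform] = {}


def register(name: str) -> Callable[[Transform], Transform]:
    """Decorator that exposes a user-defined transform under a name."""
    def wrap(fn: Transform) -> Transform:
        TRANSFORMS[name] = fn
        return fn
    return wrap


@register("hflip")
def hflip(image: Image) -> Image:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in image]


@register("crop")
def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    # Keep a height x width window whose top-left corner is (top, left).
    return [row[left:left + width] for row in image[top:top + height]]


def apply_pipeline(image: Image, steps: List[Dict[str, Any]]) -> Tuple[Image, Dict[str, Any]]:
    """Apply the named steps in order; return the result and metadata
    recording exactly which transforms and parameters were used."""
    applied = []
    for step in steps:
        params = dict(step.get("params", {}))
        image = TRANSFORMS[step["name"]](image, **params)
        applied.append({"name": step["name"], "params": params})
    return image, {"transforms": applied}
```

Because the returned metadata is just the ordered list of step names and parameters, it could be stored alongside each task and replayed later to reproduce the augmented image from the original.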

twsl commented 3 years ago

There is no benefit to applying these augmentations before labeling: they do not change the labels, but they do increase the amount of work required to label the complete dataset. You should instead augment the data in your dataloader after annotation is complete.
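Augmenting "in your dataloader" means generating random variants on the fly each time the exported data is read, rather than storing extra copies for annotators. A minimal sketch, assuming exported samples are `(image, boxes)` pairs with 8-bit grayscale pixels; the function names are hypothetical. It uses a random brightness shift, a photometric change that leaves box coordinates valid, so the annotations pass through untouched.

```python
import random
from typing import Any, Dict, Iterator, List, Tuple

Image = List[List[int]]
Sample = Tuple[Image, List[Dict[str, Any]]]


def jitter_brightness(image: Image, delta: int) -> Image:
    """Shift every 8-bit pixel by delta, clamped to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def augmented_stream(samples: List[Sample], max_delta: int = 30, seed: int = 0) -> Iterator[Sample]:
    """Yield each exported (image, boxes) pair with a fresh random brightness
    shift. Photometric changes like this do not move objects, so the boxes
    are passed through as-is and nothing is stored twice."""
    rng = random.Random(seed)
    for image, boxes in samples:
        delta = rng.randint(-max_delta, max_delta)
        yield jitter_brightness(image, delta), boxes
```

Iterating the stream once per epoch gives the model a different variant each time, while the labelled dataset itself stays at its original size.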

tomouellette commented 1 year ago

I slightly disagree with you on this, @twsl. I agree that augmentation in the traditional machine learning sense isn't beneficial here. However, color, contrast, saturation, or hue augmentation aimed at making labeling easier would be beneficial. For example, in certain imaging setups, e.g. fluorescence microscopy, contrast is limited, and actually inverting the colors makes labeling easier.

As such, I think adding the ability to modify image color directly in Label Studio would be beneficial, although this might deserve a separate issue altogether.
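The display-side adjustments suggested here are simple per-pixel maps. A minimal sketch, assuming 8-bit grayscale images held as lists of rows: `invert` maps each pixel p to 255 - p, and `stretch_contrast` linearly rescales the observed intensity range to the full 0 to 255 range. Both names are hypothetical, and the assumption is that such adjustments would only change what the annotator sees, not the stored image or its annotations.

```python
from typing import List

Image = List[List[int]]  # assumption: 8-bit grayscale pixels, 0-255


def invert(image: Image) -> Image:
    """Map each pixel p to 255 - p, so bright signal on a dark background
    becomes dark signal on a light background."""
    return [[255 - p for p in row] for row in image]


def stretch_contrast(image: Image) -> Image:
    """Linearly rescale so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: nothing to stretch, return a copy
        return [row[:] for row in image]
    scale = 255 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in image]
```

Applying `stretch_contrast` before `invert` is one possible order for dim microscopy images: the faint range is first spread out, then flipped for easier viewing.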

twsl commented 1 year ago

@tomouellette I agree with you; that's why I created issues https://github.com/heartexlabs/label-studio/issues/1425 and https://github.com/heartexlabs/label-studio-frontend/issues/299 and PRs https://github.com/heartexlabs/label-studio-frontend/pull/301 and https://github.com/heartexlabs/label-studio-frontend/pull/328 more than a year ago, because I had similar requirements. But getting this merged has been a hassle.

tomouellette commented 1 year ago

@twsl Great to hear you're on it! Even a simple addition of a manual slider enabling the transformation of the color spectra/intensity would be great.
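A manual intensity slider could, for instance, control a gamma curve. A minimal sketch, assuming the slider position has already been mapped to a positive `gamma` value and that images are 8-bit grayscale lists of rows; in Label Studio itself such a control would live in the TypeScript frontend, so this Python only illustrates the pixel math, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]


def gamma_lut(gamma: float) -> List[int]:
    """Precompute the 256-entry lookup table for one slider position.
    gamma > 1 brightens mid-tones, gamma < 1 darkens them; 0 and 255 stay fixed."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(255 * (i / 255) ** (1 / gamma)) for i in range(256)]


def apply_lut(image: Image, lut: List[int]) -> Image:
    """Replace every pixel value with its table entry."""
    return [[lut[p] for p in row] for row in image]
```

Precomputing the table means each slider move costs one 256-entry build plus one lookup per pixel, rather than a power computation per pixel.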

twsl commented 1 year ago

@tomouellette I wouldn't call it that. I closed my PR because I couldn't get it merged and lost the motivation to keep it updated. Feel free to take the code and start a new attempt.