flow2ml / Flow2ML

An Open Source Library to make Machine Learning process much Simpler
MIT License

Data Augmentation #55

Closed yvkrishna closed 3 years ago

yvkrishna commented 3 years ago

Data augmentation plays a key role in any ML project, so we need to implement data augmentation techniques such as

  1. Flipping
  2. Rotation
  3. Shearing
  4. Cropping
  5. Zoom in, Zoom out
  6. Scaling

using a Python dictionary, where the key denotes the data augmentation type and the value denotes how much of that technique should be applied.

For example: augmentation_techniques = { "rotation": 15, "zoom_range": 0.5, "shear_range": 0.5, ... }
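A minimal sketch of how such a dictionary could drive the augmentation step. Everything here is an assumption made for illustration and not Flow2ML's actual API: the function names, the `flip` and `crop` keys, the nested-list image representation, and the choice to apply each technique separately to the original image.

```python
from typing import Callable, Dict, List

Image = List[List[int]]  # assumption: grayscale image as rows of pixel values


def flip(image: Image, mode: str) -> Image:
    """Flip the image horizontally or vertically."""
    if mode == "horizontal":
        return [row[::-1] for row in image]
    if mode == "vertical":
        return [row[:] for row in image[::-1]]
    raise ValueError(f"unknown flip mode: {mode!r}")


def crop(image: Image, size: int) -> Image:
    """Keep the top-left size x size region of the image."""
    return [row[:size] for row in image[:size]]


# Hypothetical registry: technique name -> function(image, amount).
TECHNIQUES: Dict[str, Callable[[Image, object], Image]] = {
    "flip": flip,
    "crop": crop,
}


def augment(image, augmentation_techniques):
    """Apply each requested technique to the original image, one at a time.

    Returns a dict mapping technique name -> augmented image.
    """
    results = {}
    for name, amount in augmentation_techniques.items():
        if name not in TECHNIQUES:
            raise KeyError(f"unsupported augmentation technique: {name!r}")
        results[name] = TECHNIQUES[name](image, amount)
    return results


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
augmented = augment(image, {"flip": "horizontal", "crop": 2})
```

Keeping the techniques in a registry means a new technique only needs a function and one dictionary entry, which fits the plan of one sub-issue per technique.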

yvkrishna commented 3 years ago

To simplify the work, I am splitting each technique into its own level 1 issue. Please use this thread to discuss anything related to improving the techniques.

If you want to contribute, please pick up the sub-issue with the matching topic name.

rubyruins commented 3 years ago

Hi @yvkrishna, I was going to start working on this after the unit tests issue. What should I do now? Could I work on these sub-issues in parallel?

yvkrishna commented 3 years ago

Yes, you can work on them and send individual PRs.

rubyruins commented 3 years ago

Thanks, I will start working on it. I have some questions, though. The data augmentation methods will pick up the data from dataset_dir/data_dir, right? Where should I store the augmented images? For instance, the filters stored the filtered images in the folder classname/filtername.

Also, given multiple augmentation options, it should apply only one technique at a time, right? For example, given the following options:

augmentation_techniques = { "rotation": 15, "zoom_range": 0.5, "shear_range": 0.5, ... }

We will create folders such as class1/rotatedImages, class1/zoomedImages and so on, right? And each of these techniques is applied individually to the image?
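The layout being asked about could be sketched as follows. The folder names `rotatedImages` and `zoomedImages` come from this thread; `shearedImages`, the helper name `output_paths`, and the file name are hypothetical and only follow the same pattern.

```python
from pathlib import PurePosixPath

# Folder name per technique. "rotatedImages" and "zoomedImages" come from the
# thread; "shearedImages" is a hypothetical name following the same pattern.
OUTPUT_FOLDERS = {
    "rotation": "rotatedImages",
    "zoom_range": "zoomedImages",
    "shear_range": "shearedImages",
}


def output_paths(class_name, file_name, augmentation_techniques):
    """Return one output path per technique: <class>/<techniqueFolder>/<file>."""
    return {
        technique: PurePosixPath(class_name) / OUTPUT_FOLDERS[technique] / file_name
        for technique in augmentation_techniques
    }


paths = output_paths(
    "class1", "img_001.png", {"rotation": 15, "zoom_range": 0.5, "shear_range": 0.5}
)
for technique, path in paths.items():
    print(technique, "->", path)
```

Each technique gets its own folder because each one is applied to the original image on its own, not chained with the others.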

chebroluharika commented 3 years ago

@rubyruins, please go ahead and store the images folder-wise, like class1/zoomedImages etc. However, this folder structure creates a memory exhaustion issue when the dataset is huge. Going forward, we should store the images in an Amazon S3 bucket or Azure Blob Storage and provide the link here during model building. If you are fine with it, you can work on that once this augmentation implementation is done.

rubyruins commented 3 years ago

@chebroluharika, okay, thanks for clarifying!

chebroluharika commented 3 years ago

@rubyruins, when can we expect the whole task to be completed?

rubyruins commented 3 years ago

I will finish it by Friday this week (May 7). There are 5 sub-parts, so it will take me a while. Hope that's okay.

chebroluharika commented 3 years ago

@rubyruins, since the code will be very similar to your previous PR (the flip operation), I think these 5 PRs won't take much time. Anyway, please try to complete them as quickly as possible.

rubyruins commented 3 years ago

Yes, sure. I will try to do it as soon as I can.

yvkrishna commented 3 years ago

@rubyruins, @chebroluharika Just wanted to raise another technique: we can also change the brightness of an image as part of image augmentation.

Log Transformation: $s = c \log(1 + r)$, where $r$ is the input pixel value, $s$ is the output pixel value, and $c$ is a scaling constant. We can use the np.log() method to change the brightness of the image. The log transformation is generally used to expand a narrow range of pixel values into a wider range.

Power-Law Transformation: $s = c\,r^{\gamma}$. The power-law transformation is commonly used to match intensity values to the non-linear response of certain devices. It can also be used to change the brightness of the image.
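Both transformations can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation that was merged: the function names, the nested-list image, the 0 to 255 intensity range, and the choice of $c$ (so that the maximum intensity maps to itself) are all assumed. The thread suggests np.log(); the sketch uses `math.log` only to stay dependency-free.

```python
import math


def log_transform(image, max_value=255):
    """Apply s = c * log(1 + r), with c chosen so max_value maps to max_value."""
    c = max_value / math.log(1 + max_value)
    return [[round(c * math.log(1 + r)) for r in row] for row in image]


def power_law_transform(image, gamma, max_value=255):
    """Apply s = c * r**gamma on intensities normalised to [0, 1], with c = max_value."""
    return [
        [round(max_value * (r / max_value) ** gamma) for r in row]
        for row in image
    ]


image = [[0, 64], [128, 255]]
brighter_log = log_transform(image)
brighter_gamma = power_law_transform(image, gamma=0.5)  # gamma < 1 brightens
darker_gamma = power_law_transform(image, gamma=2.0)    # gamma > 1 darkens
```

With these choices of $c$, black and white stay fixed and only the mid-tones move, so the output remains a valid 8-bit image.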

yvkrishna commented 3 years ago

@rubyruins, if you are done with the other methods, I think you can try this one as well. It takes 2-3 lines of code at most.

rubyruins commented 3 years ago

@yvkrishna, sure, I could add this too.

chebroluharika commented 3 years ago

As brightness and contrast will be taken care of by #88, closing this.