aleju / imgaug

Image augmentation for machine learning experiments.
MIT License

Performance of only-keypoints augmentations #635

Open AmitMY opened 4 years ago

AmitMY commented 4 years ago

Following up: #621

I want to run data augmentation on poses alone (I have an interesting scenario, imo), and I want it to be fast. Right now, augmenting a batch takes upwards of 30 seconds, while the entire training loop takes 1 second.

I believe that augmenting keypoints is an easier task than augmenting images, and it might just be a matter of bad data representation.

I'm loading an image (500x500 = 250,000 pixels) and a list of keypoints (10,564 points, about 4.2% of the number of pixels).

Now, for example, if we run different augmentations on these two data sources, we get a huge time disparity: images are much faster, even though they contain more data (time in seconds):

| Augmentation | Images | Keypoints |
| --- | --- | --- |
| Horizontal Flip | 0.0010446 | 0.028449 |
| Affine Transform | 0.0035259 | 0.0351857 |
| Perspective Transform | 0.0047839 | 0.1963689 |

However, if we instead represent the keypoints as a NumPy array of shape [N, 2], operations on it are much faster!

import numpy as np

# timeit(label, fn): timing helper, assumed to be defined in the Colab notebook
points_np = np.random.rand(10564, 2)

# Flip
timeit("keypoints", lambda: np.array([1, 1]) - points_np) # 0.00012 seconds

# Transformation
transformation_matrix = np.array([[1, 0], [0, 1]])
timeit("keypoints", lambda: np.dot(points_np, transformation_matrix)) # 0.00011 seconds

A change of representation here would make a huge difference in run time: at least two orders of magnitude for both the flip and the matrix transformation.
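For anyone who wants to check the array-based timings without the notebook, here is a minimal, self-contained sketch. It assumes keypoints normalized to [0, 1) and uses random data instead of keypoints.txt; the function names `flip_horizontal` and `apply_linear` are made up for illustration and are not part of imgaug.

```python
import timeit

import numpy as np

# Random stand-in for the 10,564 keypoints, normalized to [0, 1).
rng = np.random.default_rng(0)
points = rng.random((10564, 2))


def flip_horizontal(xy):
    """Mirror normalized x coordinates; y is left unchanged."""
    flipped = xy.copy()
    flipped[:, 0] = 1.0 - flipped[:, 0]
    return flipped


def apply_linear(xy, matrix):
    """Apply a 2x2 linear map to every row of an [N, 2] array."""
    return xy @ matrix.T


identity = np.eye(2)

for name, fn in [
    ("flip", lambda: flip_horizontal(points)),
    ("linear", lambda: apply_linear(points, identity)),
]:
    # Average over 100 calls to smooth out timer noise.
    seconds = timeit.timeit(fn, number=100) / 100
    print(f"{name}: {seconds:.6f} s per call")
```

Absolute timings will differ from machine to machine, so only the ratio to the object-based path is meaningful.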

To reproduce everything, here is a Colab!

The keypoints.txt file is here if you want to run it yourself!

aleju commented 4 years ago

Hm, are you sure you have 10k points in your input? How do you call your augmentation routines? According to the computed performance numbers (see ), the library is able to process around 700k keypoints per second with Fliplr(p=1.0) on my machine, and that hardware is by now quite outdated. Your numbers seem to be at around 4.6k/sec.

Keypoint augmentation is admittedly quite a lot slower than it could be, as each keypoint is currently represented as an object instead of using a single NumPy array for all of them, but 4.6k/sec still seems quite slow. Unless there is a major error in the way the performance values are computed, I guess there is something wrong in your call or system configuration.

AmitMY commented 4 years ago

Thanks for looking into this!

This sequence of keypoints (10k) has roughly 100 keypoints per frame, for 100 frames. I want to augment an entire video at once, as that is probably the most correct way to do it (the augmentation is applied the same way to every frame).
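To make the "whole video at once" idea concrete, here is a rough sketch under some assumptions: the keypoints are stored as an array of shape [frames, points, 2], and the augmentation is a single affine map shared by all frames. The name `augment_video` and the example rotation and offset are hypothetical.

```python
import numpy as np

# Hypothetical video: 100 frames, 100 keypoints per frame, (x, y) each.
rng = np.random.default_rng(0)
video = rng.random((100, 100, 2))


def augment_video(frames, matrix, offset):
    """Apply one affine map (matrix, offset) to every keypoint of every frame."""
    flat = frames.reshape(-1, 2)          # [frames * points, 2]
    moved = flat @ matrix.T + offset      # same transform for all points
    return moved.reshape(frames.shape)    # back to [frames, points, 2]


angle = np.deg2rad(10.0)
rotation = np.array([
    [np.cos(angle), -np.sin(angle)],
    [np.sin(angle), np.cos(angle)],
])
augmented = augment_video(video, rotation, offset=np.array([0.05, 0.0]))
print(augmented.shape)  # (100, 100, 2)
```

Because every frame goes through the same matrix, the result for a single frame is identical to augmenting that frame on its own.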

You can see from my code (in the Google Colab) that augmenting keypoints is way too slow, and that doesn't even account for the creation of the "Keypoint" objects (I added a speed test to the Colab).

Creating the Keypoint objects (for 10,000 items) takes 0.02498 seconds on average, which means that even without any augmentation, this step alone caps throughput at about 40 calls per second (1 / 0.02498).
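The wrapping overhead can be measured in isolation. The sketch below uses a hypothetical `Point` class as a stand-in for a per-keypoint object (it is not imgaug's Keypoint class) and times the round trip from an [N, 2] array to objects and back, with no augmentation in between.

```python
import timeit

import numpy as np


class Point:
    """Hypothetical stand-in for a per-keypoint object (not imgaug's class)."""

    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


rng = np.random.default_rng(0)
xy = rng.random((10_000, 2))


def to_objects(array):
    """Wrap every row in its own Python object."""
    return [Point(x, y) for x, y in array]


def to_array(points):
    """Collect the objects back into one [N, 2] array."""
    return np.array([(p.x, p.y) for p in points])


points = to_objects(xy)
wrap = timeit.timeit(lambda: to_objects(xy), number=20) / 20
unwrap = timeit.timeit(lambda: to_array(points), number=20) / 20
print(f"wrap: {wrap:.5f} s, unwrap: {unwrap:.5f} s")
print(f"round trips per second: {1.0 / (wrap + unwrap):.1f}")
```

The exact figures depend on the machine, but the conversion cost grows linearly with the number of keypoints and is paid on every batch.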

I highly recommend going to the Colab, uploading the "keypoints.txt" file, and choosing "Runtime -> Run All" to really see how slow it is.

Unless I have something wrong, as you said, it is what it is... The Colab replicates all of the arguments here.

AmitMY commented 4 years ago

Real-life use case: here are two methods: augment2d, which performs rotation, scaling, and shearing, and augment2d_imgaug, which can do anything imgaug supports.

The main difference is that augment2d_imgaug deconstructs the keypoints into an array of Keypoint objects and then reassembles it, while augment2d works on the original array.
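The two methods themselves are not reproduced in this text. As an illustration only, an array-based function in the spirit of augment2d might look like the sketch below. The name `augment2d_sketch`, its parameters, and the choice to transform about the origin are all assumptions, not the code from the thread.

```python
import numpy as np


def augment2d_sketch(xy, rotation_deg=0.0, scale=1.0, shear=0.0):
    """Rotate, scale, and shear an [N, 2] keypoint array about the origin.

    Hypothetical illustration; the parameter names are assumptions.
    """
    theta = np.deg2rad(rotation_deg)
    rotate = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)],
    ])
    shear_x = np.array([[1.0, shear], [0.0, 1.0]])  # x += shear * y
    matrix = scale * (rotate @ shear_x)             # shear first, then rotate
    return xy @ matrix.T


points = np.array([[1.0, 0.0], [0.0, 1.0]])
print(augment2d_sketch(points, rotation_deg=90.0))
```

All three operations collapse into a single 2x2 matrix, so the cost is one matrix product regardless of how many operations are combined.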

Here is a performance test: