JuliaImages / ImageTracking.jl

Other
26 stars 8 forks source link

Fix & improve Lucas-Kanade optical flow #34

Closed pxl-th closed 3 years ago

pxl-th commented 3 years ago

Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.

Bugs:

Upon investigation I've noticed two major bugs:

  1. Incorrect pyramid level in the determine_pyramid_coordinates function. The indexing starts at 1, but when computing pyramid point, it should start at 0, so the first level does not change the coordinates, since it is the original level.
  2. In the beginning of the optical flow there is the conversion of points and initial displacements from (row, col) to (x, y) format. However, all subsequent functions operate in the (row, col) format. Simple way to check this is to visualize the gradient windows for the points.

Changes:

Do tell me if you don't agree with some of the changes.

TODOs:

Visual comparison:

I've used the same code for visualization as in the example docs.

Starting with the simple case, where the figure moves only horizontally.

Current This PR
lk-old-1 lk-new-1

Car moves forward. Two adjacent frames in a video.

Current This PR
lk-old-3 lk-new-3

Basketball example. Motion is mainly in the hands and the body. Static background.

Current This PR
lk-old-4 lk-new-4

Cup. Horizontal movement.

Current This PR
lk-old-2 lk-new-2
zygmuntszpak commented 3 years ago

Thanks for working on this. Looks very promising! I'll need a couple of days to reacquaint myself with the code and algorithm. There is definitely a problem with the current implementation because I can see in the old code that we have Ix, Iy = imgradients instead of Iy, Ix = imgradients which is one of the several things you have fixed!

zygmuntszpak commented 3 years ago

Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.

Bugs:

Upon investigation I've noticed two major bugs:

1. Incorrect pyramid level in the `determine_pyramid_coordinates` function.
   The indexing starts at `1`, but when computing pyramid point, it should start at `0`, so the first level does not change the coordinates, since it is the original level.

2. In the beginning of the optical flow there is the conversion of points and initial displacements from `(row, col)` to `(x, y)` format.
   However, all subsequent functions operate in the `(row, col)` format.
   Simple way to check this is to visualize the gradient windows for the points.

Thanks for addressing these colossal bugs.

Fixes:

Do tell me if you don't agree with some of the changes.

* Use the correct index for pyramid level.

* Drop the conversion from `(row, col)` to `(x, y)` and operate only in `(row, col)` format.

* Set default value for the `eigenvalue_threshold` parameter to `1e-4` as in [OpenCV](https://docs.opencv.org/4.5.2/dc/d6b/group__video__track.html#ga473e4b886d0bcc6b65831eb88ed93323).

* Remove `in_image` function, since `lies_in` is enough.

* Convert `while` loops to explicit expressions in `get_grid` function.

* Simplify displacement creation in tests.

* Use default parameters in tests for Lucas-Kanade algorithm.

* Replace tabs in certain places with spaces for consistency purposes.

* Make rows no more than 80 characters long & minor refactoring (this allows for split-screen editing upside_down_face).

I think overall these are some excellent changes and I am grateful for the time you put into making them.

If we are going to be changing the code style then I suggest we follow the BlueStyle which I believe already has a lot of traction in the Julia community. One of the key implications is that the 80 character limit would become a 92 character limit.

TODOs:

* [ ]  Add epsilon termination criteria as in [OpenCV LK](https://docs.opencv.org/4.5.2/d9/d5d/classcv_1_1TermCriteria.html#a56fecdc291ccaba8aad27d67ccf72c57a857609e73e7028e638d2ea649f3b45d5).

* [ ]  Add ability to pass precomputed pyramids and image gradients.
  This can be useful, when computing flow for the image with more than one other image.
  Since this way we would need to compute them only once.
  I think about either adding them to keyword parameters, or creating `LKPyramid` struct that will contain both pyramids and gradients. Creating struct will decrease number of parameters in the function, which would make things easier.

* [ ]  Add tests that fail the current LK version, but succeed on this PR.

* [ ]  Bring back deleted comments (this was done for simplification).

Those are great suggestions. However, I propose you keep the "ability to pass precomputed pyramids and image gradients." for a separate pull request.

One important issue that needs to be addressed in this PR is an apparent substantial increase in runtime. Using BenchmarkTools I found that the existing code takes approximately 1 second to complete on a pair of test images on my machine, whereas this pull-request takes 5 seconds for the same pair. I believe the issue is that your proposed version allocates memory within performance-critical loops. I think this can be traced down to you removing the use of the custom pinv2x2 function in favour of the built-in pinv function. Revisting this code, I think that we may be better-off using the svd2x2 function directly. This will allow us to construct the pseudo-inverse, but also give us the singular values which we can use to define min_eigenvalue (if I recall correctly, for real symmetric matrices with non-negative eigenvalues, the eigenvalues and singular values are the same).

Thanks again for reviving this implementation.

pxl-th commented 3 years ago

Hi! Thanks for the review. I've made several improvements and changes based on your comments.

One thing to note is that at the moment of your review I've already implemented version of the algorithm that operates on LKPyramid struct that holds pyramid for the image and its gradients. However, I've not exposed the functionality to the user, so from API standpoint it is still the same. Only underlying functions are utilizing LKPyramid struct. I'd appreciate if we can add these changes in this pull request. But we can expose the necessary API in a separate PR.

I've also used svd2x2 to compute minimum singular values, since they indeed coincide with the eigenvalues in this case.

Quick summary of changes from previous review

Benchmarking results (on 1000 initial keypoints)

Current: 1.555 s (866161 allocations: 260.38 MiB)

This PR:

Note that without epsilon termination criteria the time is slightly bigger. This can be because current version and this PR are tracking and invalidating different amount of points.

pxl-th commented 3 years ago

Benchmarking results with the latest changes

Parameters used:

Current: 1.555 s (866161 allocations: 260.38 MiB)

This PR:

johnnychen94 commented 3 years ago

5x-10x is a big improvement!

johnnychen94 commented 3 years ago

@pxl-th out of curiosity, there're some legacy pending PRs in this repo, are you interested in taking the role and finishing them?