Fix & improve Lucas-Kanade optical flow

pxl-th commented 3 years ago

Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.

Bugs:

Upon investigation I've noticed two major bugs:

Incorrect pyramid level in the determine_pyramid_coordinates function. The indexing starts at 1, but when computing pyramid point, it should start at 0, so the first level does not change the coordinates, since it is the original level.
In the beginning of the optical flow there is the conversion of points and initial displacements from (row, col) to (x, y) format. However, all subsequent functions operate in the (row, col) format. Simple way to check this is to visualize the gradient windows for the points.

Changes:

Do tell me if you don't agree with some of the changes.

Use the correct index for pyramid level.
Drop the conversion from (row, col) to (x, y) and operate only in (row, col) format.
Set default value for the eigenvalue_threshold parameter to 1e-4 as in OpenCV.
Remove in_image function, since lies_in is enough.
Convert while loops to explicit expressions in get_grid function.
Simplify displacement creation in tests.
Use default parameters in tests for Lucas-Kanade algorithm.
Refactoring. Replace tabs in certain places with spaces for consistency purposes. Make rows no more than 92 characters long & minor refactoring (this allows for split-screen editing 🙃).
Add epsilon termination criteria as in OpenCV LK.
Update LK tests to require bigger accuracy.
Remove is_truncated_window. Use get_grid to determine whether we need to recompute gradient or can use the previous one.
Remove flow vector, since it was used only to store pyramid layer contributions and was redundant.
Remove extrapolation and use only interpolation for the second pyramid layers. Since we never operate on the out-of-bounds regions.
Use svd2x2 to compute minimum eigenvalues instead of calling eigen.
Make optflow! function operate on the LKPyramid structure that holds image pyramid and gradients.

TODOs:

[ ] Add ability to pass precomputed pyramids and image gradients. This can be useful, when computing flow for the image with more than one other image. Since this way we would need to compute them only once. I think about either adding them to keyword parameters, or creating LKPyramid struct that will contain both pyramids and gradients. Creating struct will decrease number of parameters in the function, which would make things easier.

Visual comparison:

I've used the same code for visualization as in the example docs.

Starting with the simple case, where the figure moves only horizontally.

Current	This PR

Car moves forward. Two adjacent frames in a video.

Current	This PR

Basketball example. Motion is mainly in the hands and the body. Static background.

Current	This PR

Cup. Horizontal movement.

Current	This PR

zygmuntszpak commented 3 years ago

Thanks for working on this. Looks very promising! I'll need a couple of days to reacquaint myself with the code and algorithm. There is definitely a problem with the current implementation because I can see in the old code that we have Ix, Iy = imgradients instead of Iy, Ix = imgradients which is one of the several things you have fixed!

zygmuntszpak commented 3 years ago

Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.

Bugs:

Upon investigation I've noticed two major bugs:
1. Incorrect pyramid level in the `determine_pyramid_coordinates` function.
   The indexing starts at `1`, but when computing pyramid point, it should start at `0`, so the first level does not change the coordinates, since it is the original level.

2. In the beginning of the optical flow there is the conversion of points and initial displacements from `(row, col)` to `(x, y)` format.
   However, all subsequent functions operate in the `(row, col)` format.
   Simple way to check this is to visualize the gradient windows for the points.

Thanks for addressing these colossal bugs.

Fixes:

Do tell me if you don't agree with some of the changes.

* Use the correct index for pyramid level.

* Drop the conversion from `(row, col)` to `(x, y)` and operate only in `(row, col)` format.

* Set default value for the `eigenvalue_threshold` parameter to `1e-4` as in [OpenCV](https://docs.opencv.org/4.5.2/dc/d6b/group__video__track.html#ga473e4b886d0bcc6b65831eb88ed93323).

* Remove `in_image` function, since `lies_in` is enough.

* Convert `while` loops to explicit expressions in `get_grid` function.

* Simplify displacement creation in tests.

* Use default parameters in tests for Lucas-Kanade algorithm.

* Replace tabs in certain places with spaces for consistency purposes.

* Make rows no more than 80 characters long & minor refactoring (this allows for split-screen editing upside_down_face).

I think overall these are some excellent changes and I am grateful for the time you put into making them.

If we are going to be changing the code style then I suggest we follow the BlueStyle which I believe already has a lot of traction in the Julia community. One of the key implications is that the 80 character limit would become a 92 character limit.

TODOs:

* [ ]  Add epsilon termination criteria as in [OpenCV LK](https://docs.opencv.org/4.5.2/d9/d5d/classcv_1_1TermCriteria.html#a56fecdc291ccaba8aad27d67ccf72c57a857609e73e7028e638d2ea649f3b45d5).

* [ ]  Add ability to pass precomputed pyramids and image gradients.
  This can be useful, when computing flow for the image with more than one other image.
  Since this way we would need to compute them only once.
  I think about either adding them to keyword parameters, or creating `LKPyramid` struct that will contain both pyramids and gradients. Creating struct will decrease number of parameters in the function, which would make things easier.

* [ ]  Add tests that fail the current LK version, but succeed on this PR.

* [ ]  Bring back deleted comments (this was done for simplification).

Those are great suggestions. However, I propose you keep the "ability to pass precomputed pyramids and image gradients." for a separate pull request.

One important issue that needs to be addressed in this PR is an apparent substantial increase in runtime. Using BenchmarkTools I found that the existing code takes approximately 1 second to complete on a pair of test images on my machine, whereas this pull-request takes 5 seconds for the same pair. I believe the issue is that your proposed version allocates memory within performance-critical loops. I think this can be traced down to you removing the use of the custom pinv2x2 function in favour of the built-in pinv function. Revisting this code, I think that we may be better-off using the svd2x2 function directly. This will allow us to construct the pseudo-inverse, but also give us the singular values which we can use to define min_eigenvalue (if I recall correctly, for real symmetric matrices with non-negative eigenvalues, the eigenvalues and singular values are the same).

Thanks again for reviving this implementation.

pxl-th commented 3 years ago

Hi! Thanks for the review. I've made several improvements and changes based on your comments.

One thing to note is that at the moment of your review I've already implemented version of the algorithm that operates on LKPyramid struct that holds pyramid for the image and its gradients. However, I've not exposed the functionality to the user, so from API standpoint it is still the same. Only underlying functions are utilizing LKPyramid struct. I'd appreciate if we can add these changes in this pull request. But we can expose the necessary API in a separate PR.

I've also used svd2x2 to compute minimum singular values, since they indeed coincide with the eigenvalues in this case.

Quick summary of changes from previous review

Changed max line width to 92 symbols.
Remove is_truncated_window. Use get_grid to determine whether we need to recompute gradient or can use the previous one. Since is_truncated_window always required to recompute the gradients, but between iterations the grid can stay the same (even with the truncated window), leading to redundant recomputations.
Remove flow vector, since it was used only to store pyramid layer contributions and was redundant.
Remove extrapolation and use only interpolation for the second pyramid's layers. Since we never operate on the out-of-bounds regions.
Use svd2x2 to compute minimum eigenvalues instead of calling eigen (singular values conicide with eigenvalues in this case).
Add epsilon termination criteria. Add the necessary parameter to LucasKanade struct as well.
Update LK tests to require higher accuracy.

Benchmarking results (on 1000 initial keypoints)

Current: 1.555 s (866161 allocations: 260.38 MiB)

This PR:

With epsilon termination criteria (used by default): 762.053 ms (7079 allocations: 240.53 MiB)
Without epsilon termination criteria: 1.827 s (7091 allocations: 240.53 MiB)
With precomputed pyramids: 301.647 ms (10 allocations: 10.31 MiB)

Note that without epsilon termination criteria the time is slightly bigger. This can be because current version and this PR are tracking and invalidating different amount of points.

pxl-th commented 3 years ago

Benchmarking results with the latest changes

Parameters used:

initial number of keypoints = 1000
window size = 21
levels = 3
iterations = 30

Current: 1.555 s (866161 allocations: 260.38 MiB)

This PR:

With epsilon termination criteria: 352.698 ms (6839 allocations: 199.00 MiB)
Using precomputed pyramids: 131.568 ms (10 allocations: 10.31 MiB)

johnnychen94 commented 3 years ago

5x-10x is a big improvement!

johnnychen94 commented 3 years ago

@pxl-th out of curiosity, there're some legacy pending PRs in this repo, are you interested in taking the role and finishing them?

JuliaImages / ImageTracking.jl