Closed pxl-th closed 3 years ago
Thanks for working on this. Looks very promising! I'll need a couple of days to reacquaint myself with the code and algorithm. There is definitely a problem with the current implementation because I can see in the old code that we have Ix, Iy = imgradients
instead of Iy, Ix = imgradients
which is one of the several things you have fixed!
Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.
Bugs:
Upon investigation I've noticed two major bugs:
1. Incorrect pyramid level in the `determine_pyramid_coordinates` function. The indexing starts at `1`, but when computing pyramid point, it should start at `0`, so the first level does not change the coordinates, since it is the original level. 2. In the beginning of the optical flow there is the conversion of points and initial displacements from `(row, col)` to `(x, y)` format. However, all subsequent functions operate in the `(row, col)` format. Simple way to check this is to visualize the gradient windows for the points.
Thanks for addressing these colossal bugs.
Fixes:
Do tell me if you don't agree with some of the changes.
* Use the correct index for pyramid level. * Drop the conversion from `(row, col)` to `(x, y)` and operate only in `(row, col)` format. * Set default value for the `eigenvalue_threshold` parameter to `1e-4` as in [OpenCV](https://docs.opencv.org/4.5.2/dc/d6b/group__video__track.html#ga473e4b886d0bcc6b65831eb88ed93323). * Remove `in_image` function, since `lies_in` is enough. * Convert `while` loops to explicit expressions in `get_grid` function. * Simplify displacement creation in tests. * Use default parameters in tests for Lucas-Kanade algorithm. * Replace tabs in certain places with spaces for consistency purposes. * Make rows no more than 80 characters long & minor refactoring (this allows for split-screen editing upside_down_face).
I think overall these are some excellent changes and I am grateful for the time you put into making them.
If we are going to be changing the code style then I suggest we follow the BlueStyle which I believe already has a lot of traction in the Julia community. One of the key implications is that the 80 character limit would become a 92 character limit.
TODOs:
* [ ] Add epsilon termination criteria as in [OpenCV LK](https://docs.opencv.org/4.5.2/d9/d5d/classcv_1_1TermCriteria.html#a56fecdc291ccaba8aad27d67ccf72c57a857609e73e7028e638d2ea649f3b45d5). * [ ] Add ability to pass precomputed pyramids and image gradients. This can be useful, when computing flow for the image with more than one other image. Since this way we would need to compute them only once. I think about either adding them to keyword parameters, or creating `LKPyramid` struct that will contain both pyramids and gradients. Creating struct will decrease number of parameters in the function, which would make things easier. * [ ] Add tests that fail the current LK version, but succeed on this PR. * [ ] Bring back deleted comments (this was done for simplification).
Those are great suggestions. However, I propose you keep the "ability to pass precomputed pyramids and image gradients." for a separate pull request.
One important issue that needs to be addressed in this PR is an apparent substantial increase in runtime. Using BenchmarkTools I found that the existing code takes approximately 1 second to complete on a pair of test images on my machine, whereas this pull-request takes 5 seconds for the same pair. I believe the issue is that your proposed version allocates memory within performance-critical loops. I think this can be traced down to you removing the use of the custom pinv2x2 function in favour of the built-in pinv function. Revisting this code, I think that we may be better-off using the svd2x2 function directly. This will allow us to construct the pseudo-inverse, but also give us the singular values which we can use to define min_eigenvalue (if I recall correctly, for real symmetric matrices with non-negative eigenvalues, the eigenvalues and singular values are the same).
Thanks again for reviving this implementation.
Hi! Thanks for the review. I've made several improvements and changes based on your comments.
One thing to note is that at the moment of your review I've already implemented version of the algorithm that operates on LKPyramid
struct that holds pyramid for the image and its gradients.
However, I've not exposed the functionality to the user, so from API standpoint it is still the same.
Only underlying functions are utilizing LKPyramid
struct.
I'd appreciate if we can add these changes in this pull request. But we can expose the necessary API in a separate PR.
I've also used svd2x2
to compute minimum singular values, since they indeed coincide with the eigenvalues in this case.
is_truncated_window
.
Use get_grid
to determine whether we need to recompute gradient or can use the previous one.
Since is_truncated_window
always required to recompute the gradients, but between iterations the grid can stay the same (even with the truncated window), leading to redundant recomputations.flow
vector, since it was used only to store pyramid layer contributions and was redundant.svd2x2
to compute minimum eigenvalues instead of calling eigen
(singular values conicide with eigenvalues in this case).LucasKanade
struct as well.Current:
1.555 s (866161 allocations: 260.38 MiB)
This PR:
762.053 ms (7079 allocations: 240.53 MiB)
1.827 s (7091 allocations: 240.53 MiB)
301.647 ms (10 allocations: 10.31 MiB)
Note that without epsilon termination criteria the time is slightly bigger. This can be because current version and this PR are tracking and invalidating different amount of points.
Parameters used:
Current:
1.555 s (866161 allocations: 260.38 MiB)
This PR:
352.698 ms (6839 allocations: 199.00 MiB)
131.568 ms (10 allocations: 10.31 MiB)
5x-10x is a big improvement!
@pxl-th out of curiosity, there're some legacy pending PRs in this repo, are you interested in taking the role and finishing them?
Hi! I was trying to use LK optical flow and noticed that it didn't give the correct results. Either the flow was too big or its direction was incorrect. Additionally it was dropping a lot of points, marking them as bad.
Bugs:
Upon investigation I've noticed two major bugs:
determine_pyramid_coordinates
function. The indexing starts at1
, but when computing pyramid point, it should start at0
, so the first level does not change the coordinates, since it is the original level.(row, col)
to(x, y)
format. However, all subsequent functions operate in the(row, col)
format. Simple way to check this is to visualize the gradient windows for the points.Changes:
Do tell me if you don't agree with some of the changes.
(row, col)
to(x, y)
and operate only in(row, col)
format.eigenvalue_threshold
parameter to1e-4
as in OpenCV.in_image
function, sincelies_in
is enough.while
loops to explicit expressions inget_grid
function.is_truncated_window
. Useget_grid
to determine whether we need to recompute gradient or can use the previous one.flow
vector, since it was used only to store pyramid layer contributions and was redundant.eigen
.optflow!
function operate on theLKPyramid
structure that holds image pyramid and gradients.TODOs:
LKPyramid
struct that will contain both pyramids and gradients. Creating struct will decrease number of parameters in the function, which would make things easier.Visual comparison:
I've used the same code for visualization as in the example docs.
Starting with the simple case, where the figure moves only horizontally.
Car moves forward. Two adjacent frames in a video.
Basketball example. Motion is mainly in the hands and the body. Static background.
Cup. Horizontal movement.