Any Example Video In/Outputs?

Schizo commented 3 years ago

I wasn't able to tell at first glance whether your method keeps temporal coherence. Do you have an example video input/output?

NPN commented 3 years ago

Temporal coherence is currently implemented in the temporal_coherence function. I haven't posted examples yet because the project is very unfinished (e.g. output is black and white, vertical spatial coherence/piecewise seams are not implemented) and I have been focusing on GPU performance problems. I hope to post examples when the project is done, though that may be far off.

Anyhow, here is an example using a clip from Grundmann et al.'s dataset. The original video (88_2_orig.mp4) is 480px wide, and the carved video (88_2_carved.mp4) is 70% of the original width (144px removed). The top uses temporal coherence (TC) and the bottom does not. You can see that the bottom is far more jittery. The top is still somewhat jittery, which could be due to many things: poorly tuned saliency/coherence weights, greedy seams instead of DP, no vertical coherence, using luma instead of RGB distance, etc.

https://user-images.githubusercontent.com/1497826/109476069-5ec6a500-7a6e-11eb-986d-7474a480a36c.mp4

https://user-images.githubusercontent.com/1497826/109476001-4d7d9880-7a6e-11eb-9f87-0e7ef61f1746.mp4
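For anyone curious about the "greedy seams instead of DP" point above, here is a rough, pure-Python sketch (not code from the carve project; the function names and the toy energy map are made up for illustration). It shows how a greedy top-to-bottom walk can miss the optimal seam that dynamic programming finds:

```python
from typing import List

Grid = List[List[float]]


def neighbours(x: int, w: int) -> List[int]:
    """Columns x-1, x, x+1 that lie inside an image of width w."""
    return [c for c in (x - 1, x, x + 1) if 0 <= c < w]


def greedy_seam(energy: Grid) -> List[int]:
    """Start at the cheapest top pixel, then step to the cheapest neighbour below."""
    h, w = len(energy), len(energy[0])
    x = min(range(w), key=lambda c: energy[0][c])
    seam = [x]
    for y in range(1, h):
        x = min(neighbours(x, w), key=lambda c: energy[y][c])
        seam.append(x)
    return seam


def dp_seam(energy: Grid) -> List[int]:
    """Minimum-cost vertical seam via dynamic programming over cumulative costs."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][c] for c in neighbours(x, w))
    # Backtrack from the cheapest pixel in the bottom row.
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 2, -1, -1):
        x = min(neighbours(x, w), key=lambda c: cost[y][c])
        seam.append(x)
    return seam[::-1]


def seam_cost(energy: Grid, seam: List[int]) -> float:
    """Total energy of the pixels a seam removes."""
    return sum(energy[y][x] for y, x in enumerate(seam))


energy = [
    [0, 5, 5],
    [9, 9, 0],
    [9, 9, 0],
]
print(greedy_seam(energy), seam_cost(energy, greedy_seam(energy)))  # [0, 0, 0] 18
print(dp_seam(energy), seam_cost(energy, dp_seam(energy)))          # [1, 2, 2] 5
```

On this toy map the greedy walk commits to the cheap top-left pixel and then pays 18 in total, while DP finds the seam costing 5.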

Schizo commented 3 years ago

Thanks, that is very useful. Do you know how this compares to "Improved Seam Carving for Video Retargeting"? The reason I'm interested in this project is that I want to use it for carving static volumes. So far I have just implemented the naive 2D approach, treating every frame as another channel. I haven't yet had time to implement the actual paper, which is computationally intensive, but the paper you are referring to might actually be a good solution. However, I don't have a time-complexity constraint.

https://user-images.githubusercontent.com/38006/111059653-1afb7480-844c-11eb-8f56-d6c25a830a0a.mp4

https://user-images.githubusercontent.com/38006/111059666-39fa0680-844c-11eb-9ed0-46d8c20e4afa.mp4
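Here is a minimal sketch of the naive approach described above. It assumes that "every frame as another channel" means summing a per-frame energy map so that a single seam is shared by every frame (or every slice of a static volume). The gradient energy and all function names are illustrative assumptions, not Schizo's actual code:

```python
from typing import List

Frame = List[List[float]]


def gradient_energy(frame: Frame) -> Frame:
    """Horizontal gradient magnitude |I(x+1) - I(x-1)|, clamped at the borders."""
    w = len(frame[0])
    return [[abs(row[min(x + 1, w - 1)] - row[max(x - 1, 0)]) for x in range(w)]
            for row in frame]


def stacked_energy(frames: List[Frame]) -> Frame:
    """Treat each frame as one more channel by summing the energy maps."""
    maps = [gradient_energy(f) for f in frames]
    h, w = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) for x in range(w)] for y in range(h)]


def dp_seam(energy: Frame) -> List[int]:
    """Minimum-cost 8-connected vertical seam via dynamic programming."""
    h, w = len(energy), len(energy[0])

    def cols(x: int) -> List[int]:
        return [c for c in (x - 1, x, x + 1) if 0 <= c < w]

    cost = [row[:] for row in energy]
    for y in range(1, h):
        for x in range(w):
            cost[y][x] += min(cost[y - 1][c] for c in cols(x))
    x = min(range(w), key=lambda c: cost[h - 1][c])
    seam = [x]
    for y in range(h - 2, -1, -1):
        x = min(cols(x), key=lambda c: cost[y][c])
        seam.append(x)
    return seam[::-1]


def remove_seam(frame: Frame, seam: List[int]) -> Frame:
    """Drop pixel seam[y] from each row y."""
    return [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]


def carve_stack(frames: List[Frame], n: int) -> List[Frame]:
    """Remove n vertical seams, each one shared by all frames."""
    for _ in range(n):
        seam = dp_seam(stacked_energy(frames))
        frames = [remove_seam(f, seam) for f in frames]
    return frames


frame = [[0, 0, 0, 9]] * 3
print(carve_stack([frame, frame], 1)[0])  # [[0, 0, 9], [0, 0, 9], [0, 0, 9]]
```

Because the seam is shared, all frames stay aligned with each other, which matters for a static volume. The trade-off is that one seam has to suit every frame at once.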

NPN commented 3 years ago

Sorry for the late response. Are you using seam carving to shrink 3D models? That's interesting. I've only seen seam carving used on images/videos before, so could you explain in more detail what you're currently doing and what you want to achieve?

From what I can see in the videos, are you taking a 2D image of the model, seam carving it, and then projecting the squashed image back to 3D? But what if a seam is diagonal? It seems that the model would become lopsided. Are you only removing horizontal seams?

As for comparing my implementation (mainly based on "Discontinuous seam-carving for video retargeting", by Grundmann et al.) to "Improved Seam Carving for Video Retargeting", by Rubinstein et al., first note that the problem both papers are trying to solve is that you can't just seam carve each frame independently. This leads to a jittery result, similar to the lower half of my video above.

Rubinstein et al. solve this by imagining the video as a big cube with two image dimensions and one time dimension. Then, to remove one seam from each frame, you carve a surface through this cube:

[Figure 5 from Rubinstein et al.: The intersection of every X×T plane with the seam surface defines a spatiotemporal seam.]

The smoothness of the surface means that seams cannot jump from one place to another. If they did, the surface would be broken into pieces, which is not the case.
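To make "the surface cannot break into pieces" concrete, let seams[t][y] be the column removed from row y of frame t. A connected seam surface then means neighbouring entries differ by at most one pixel, both down a frame and across time. The check below is a hypothetical helper written for illustration under that one-pixel connectivity assumption:

```python
from typing import List


def is_connected_surface(seams: List[List[int]]) -> bool:
    """seams[t][y] is the column removed from row y of frame t.

    Returns True if the seams form one connected surface: the removed column
    moves by at most one pixel between adjacent rows and between adjacent frames.
    """
    for t, seam in enumerate(seams):
        # Spatial connectivity inside frame t.
        if any(abs(a - b) > 1 for a, b in zip(seam, seam[1:])):
            return False
        # Temporal connectivity between frame t and frame t + 1.
        if t + 1 < len(seams) and any(abs(a - b) > 1 for a, b in zip(seam, seams[t + 1])):
            return False
    return True


print(is_connected_surface([[2, 3, 3], [2, 2, 3], [1, 2, 2]]))  # True
print(is_connected_surface([[2, 3, 3], [5, 5, 5]]))             # False: the seam jumps between frames
```

Discontinuous seams like the ones Grundmann et al. allow can fail this check.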

The main drawback of this approach is that it is slow. Finding just one optimal seam requires running an algorithm over the entire video. Since "the computation time...is quadratic in the number of voxels" (4), this quickly becomes infeasible. (Indeed, Rubinstein et al. have to first find an approximate seam, and then refine it, for the algorithm to run in a reasonable amount of time.)
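To get a feel for what "quadratic in the number of voxels" means, here is the arithmetic for a hypothetical clip of 480×270 pixels lasting 10 seconds at 30 fps. These dimensions are assumptions chosen for illustration, and constant factors are ignored:

```python
def voxel_count(width: int, height: int, frames: int) -> int:
    """Number of voxels in a width x height x frames video cube."""
    return width * height * frames


n = voxel_count(480, 270, 300)        # 10 s at 30 fps
print(n)                              # 38880000
print(n ** 2)                         # 1511654400000000, roughly 1.5e15
# Doubling the clip length doubles the voxels but quadruples a quadratic cost.
print(voxel_count(480, 270, 600) ** 2 // n ** 2)  # 4
```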

Grundmann et al. take a different approach. Suppose we've found the optimal seam for frame 1 and we now want to find the optimal seam for frame 2. Rubinstein et al.'s approach would be to pick a seam which is very close to the seam in frame 1. After all, seams that don't jump around will prevent shaky video.

The key insight of Grundmann et al. is that you don't actually need to pick a seam close to frame 1's seam. You only need to pick a seam that makes it look like you picked such a seam. In other words:

Let A be frame 2 carved with frame 1's seam
Let B be frame 2 carved with some other seam S
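Here is a tiny, made-up example of A and B (the frame values and helper names are hypothetical). Because frame 2 contains two identical neighbouring pixels, a seam S that removes a different column in some rows still produces exactly the same image as frame 1's seam:

```python
from typing import List

Frame = List[List[int]]


def carve(frame: Frame, seam: List[int]) -> Frame:
    """Remove pixel seam[y] from each row y."""
    return [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]


def appearance_difference(a: Frame, b: Frame) -> int:
    """Sum of absolute pixel differences between two equally sized carved frames."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


frame2 = [[1, 5, 5, 9],
          [1, 5, 5, 9]]
seam_from_frame1 = [1, 1]

A = carve(frame2, seam_from_frame1)
B = carve(frame2, [2, 1])    # a geometrically different seam S
C = carve(frame2, [3, 3])    # a seam that really changes the picture

print(appearance_difference(A, B))  # 0
print(appearance_difference(A, C))  # 8
```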

If we can find a seam S such that S is a good seam (according to the saliency score) and B looks very similar to A, then we have found a seam that is both good and temporally stable. Rubinstein et al.'s method will always find a stable seam, but that seam may otherwise be bad. By allowing our seam to be discontinuous (i.e. to jump around), we can find the best seam in frame 2 while preserving stability.

Well, technically S is only a seam which produces a result that appears temporally stable. But, as Grundmann et al. show, that's often good enough.
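One way to use the A/B comparison when actually searching for S is to precompute a cost for every pixel i in a row of frame 2. That cost measures how much the row carved at i differs from the row carved at frame 1's seam column, and with prefix sums it takes O(width) per row. This is only a sketch of the idea under that assumption. The function names are hypothetical, and it picks each row independently, whereas a real implementation would also keep some spatial coherence between rows:

```python
from itertools import accumulate
from typing import List


def coherence_costs(row: List[float], prev_col: int) -> List[float]:
    """cost[i] = sum |B - A| for this row, where A removes prev_col and B removes i."""
    a = row[:prev_col] + row[prev_col + 1:]
    n = len(a)  # width after carving
    # Pixels left of the removed one keep their index: left[i] = sum_{j<i} |row[j] - a[j]|
    left = [0] + list(accumulate(abs(row[j] - a[j]) for j in range(n)))
    # Pixels right of it shift left by one: right[i] = sum_{j>=i} |row[j+1] - a[j]|
    shifted = [abs(row[j + 1] - a[j]) for j in range(n)]
    right = list(accumulate(reversed(shifted)))[::-1] + [0]
    return [lo + hi for lo, hi in zip(left, right)]


def pick_column(saliency: List[float], coherence: List[float], weight: float) -> int:
    """Choose the pixel with the lowest saliency + weight * coherence cost."""
    total = [s + weight * c for s, c in zip(saliency, coherence)]
    return min(range(len(total)), key=total.__getitem__)


row = [1, 5, 5, 9]
coh = coherence_costs(row, prev_col=1)
print(coh)                                          # [4, 0, 0, 4]
print(pick_column([0, 3, 1, 0], coh, weight=1.0))   # 2
```

Column 2 wins even though frame 1's seam removed column 1, because removing either of the two 5s gives the same row.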

The whole point of all this is that Grundmann et al.'s algorithm only requires looking at two frames at once. This is faster and allows you to perform streaming seam carving, since you don't need to process every frame at once.
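The streaming structure can be sketched as a generator that only remembers the previous frame's seam. Everything here (the gradient saliency, the weight, choosing each row independently) is a simplifying assumption for illustration, not how carve or the paper actually does it:

```python
from itertools import accumulate
from typing import Iterable, Iterator, List, Optional

Frame = List[List[float]]


def coherence_costs(row: List[float], prev_col: int) -> List[float]:
    """SAD between the row carved at each column and the row carved at prev_col."""
    a = row[:prev_col] + row[prev_col + 1:]
    left = [0] + list(accumulate(abs(row[j] - a[j]) for j in range(len(a))))
    shifted = [abs(row[j + 1] - a[j]) for j in range(len(a))]
    right = list(accumulate(reversed(shifted)))[::-1] + [0]
    return [p + q for p, q in zip(left, right)]


def find_seam(frame: Frame, prev_seam: Optional[List[int]], weight: float = 1.0) -> List[int]:
    """Per row, pick the column minimising saliency + weight * temporal coherence cost."""
    seam = []
    for y, row in enumerate(frame):
        w = len(row)
        saliency = [abs(row[min(x + 1, w - 1)] - row[max(x - 1, 0)]) for x in range(w)]
        coherence = coherence_costs(row, prev_seam[y]) if prev_seam is not None else [0] * w
        total = [s + weight * c for s, c in zip(saliency, coherence)]
        seam.append(min(range(w), key=total.__getitem__))
    return seam


def carve_stream(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Yield each frame with one seam removed, keeping only the previous seam in memory."""
    prev_seam: Optional[List[int]] = None
    for frame in frames:
        seam = find_seam(frame, prev_seam)
        yield [row[:x] + row[x + 1:] for row, x in zip(frame, seam)]
        prev_seam = seam


frames = ([[1, 5, 5, 9]] for _ in range(3))  # a lazy "video" of three identical frames
for carved in carve_stream(frames):
    print(carved)  # [[5, 5, 9]] each time
```

Memory use here does not grow with the length of the video, which is what makes streaming possible.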

Grundmann et al.'s paper covers more than this, but that's the main gist. To sum up, both papers do roughly the same thing, but Grundmann et al. use a much looser constraint (seams that appear stable) than Rubinstein et al. (seams that are geometrically stable). This results in a faster algorithm.

I don't know if you actually wanted such a long explanation, but I hope it was helpful anyway.

NPN / carve
