Aharoni-Lab / miniscope-io

Data formatting, reading, and writing from miniscopes
https://miniscope-io.readthedocs.io
GNU Affero General Public License v3.0
6 stars 2 forks source link

Variability in video output #32

Open t-sasatani opened 3 months ago

t-sasatani commented 3 months ago

The hashes generated by the hash_video method in miniscope_io.utils still seem variable depending on the environment. You can see this in the test runs based on the following commit:

The tests on GH Mac and Win failed, whereas tests on Ubuntu passed. It also passed in my local Windows environment, so it isn't completely OS-dependent. The hash output is variable, but there seem to be only two variations.

I couldn't figure out why and skipped this error for now by holding multiple reference hashes in the test.

sneakers-the-rat commented 3 months ago

Heh yes we have gone full circle - I experienced this with the sdcard class too and had a very similar solution.

Ultimately it just came down to the fact that the lossy compression seems intrinsically variable, I think at least some part of the algorithm is stochastic. So what I did is constrain the test to always use the GRAY fourcc code https://github.com/Aharoni-Lab/miniscope-io/blob/f11d9018e8bde5587664c0575bc51cec2a7e1892/tests/test_io.py#L125

We should just suck it up and just make an ffmpeg-based writer class, where we'll have way more control over video output than with opencv. We can basically borrow the classes from skvideo: https://github.com/scikit-video/scikit-video/blob/master/skvideo/io/ffmpeg.py

scipy is a pretty heavy dependency to take on, so we might want to just rip those classes off since theyre all we need and they are such a thin wrapper around ffmpeg

t-sasatani commented 3 months ago

I'm confused because the Y800 codec, the default in streamDaq, is also lossless. Also the hash_video() method you wrote looks like it shouldn't have writer-dependency as long as the contained video is lossless. So I wasn't sure if switching to something more explicit like ffmpeg would solve this, but it might be worth a try.

sneakers-the-rat commented 3 months ago

I dont know much about the Y800 codec, but it might have some nondeterministic component too. Since we dont have a lot of visibility into the format we'll probably always be guessing.

These end-to-end tests that test that a video stays unchanged are mostly a placeholder for better, more granular tests anyway - eg. Here we should just be pulling the frames themselves out and hashing those rather than going through a video encoding step, have steps for the intermediate steps too, and test video writing separately, but thats all in due time. The naturalistic binary file as test data is great for testing error handling, but we also should be generating and feeding through known video frames to test as well.

So re: this issue, maybe try it with GRAY, and we can use that as a stopgap until we get something better worked out. It should be possible to pass a fourcc code through the capture method to the writer

t-sasatani commented 3 months ago

Y800 just stores each pixel in each byte, so it should be deterministic. Actually, GREY seems to be a duplicate of Y800 so let's just leave it for now and put those more rigorous tests in the backlog. There is a chance GREY will generate a single hash in our environments but it feels better to keep the problem apparent and exposed at this point. https://fourcc.org/yuv.php#Y800

sneakers-the-rat commented 3 months ago

Haha omg exactly yes lets move our video output to something thats like a set of parameters instead of opaque four character aliases. Agreed on leaving this open