Tests for streamDaq / video export

t-sasatani commented 2 months ago

I drafted a simple test module for the streamDaq because it's starting to take shape. This is pretty preliminary and might be a bit early for reviewing, but I'd appreciate any feedback. Updates are listed in the following:

Add FPGA RAW data recording and importing mode
Add video write-out function (to avi) and a handle to turn on and off video saving/display
Added a primitive Pytest module and confirmed that it works locally (by running normally/intentionally breaking it)

Here is some stuff I'm wondering about and will think about later:

How to switch between normal and test operations nicely. I guess the current way is fine but the config file and model is getting a bit non-intuitive.
Asserting metadata. The log output has time in it, so I guess we want some other metadata export, also because we'll need metadata anyway. I can quickly assemble a CSV export for tests, but it would be better to have it compatible with previous Miniscopes if possible.

t-sasatani commented 2 months ago

The added pytest module goes through local tests but fails on GH actions, so I'll look into it later.

sneakers-the-rat commented 2 months ago

Oh this rocks. Ill do a review in a sec but I definitely know how to do this now that we have sample data and expected values up and running :)

sneakers-the-rat commented 2 months ago

since the PR comes from a fork and not from a branch I can't push to it afaik, but see here: https://github.com/Aharoni-Lab/miniscope-io/commit/615eb344ec6e27ec4b0e69dd191d70b56ba7090f

uses a mock instead of modifying class: this keeps the test consistent with in the wild behavior and also requires less special-casing.
rather than including the output file, we can instead hash the file created during tests and compare it against a known set of hashes - videos created on different platforms will be different from one another, and so this way we can keep the repo small and also handle those cases
I also split out the video writer stuff into its own method and removed from config - we don't necessarily want to configure everything from the yaml file, and that seems like a pretty good example: if we wanted to split up the model and have a separate Capture submodel that could hold those things, then that would probably be fine, but we also want to avoid these models turning into "junk drawers" where we just keep adding fields indefinitely. similarly to previous model refactors, other devices will probably want to have video output, and we should create a common set of tools for that instead of each having their own.
removed some zombie code as I came across it

sneakers-the-rat commented 2 months ago

we'll definitely want to redo these queues because they seem to be pretty slow - i would expect processing 118 frames to be more or less instantaneous but it takes ~15s (longer than the video), so that's def a bottleneck for real usage

t-sasatani commented 2 months ago

Thanks @sneakers-the-rat for the review and changes! I merged your branch and also added you to the fork in case. I'll look through the comments and address remaining issues within a few days.

sneakers-the-rat commented 2 months ago

there's a lil more that could be done but ya i think it's starting to take shape! time to start making some structural changes to start moving it towards the kind of user interface/usage patterns we want for an sdk like this :)

t-sasatani commented 2 months ago

I was curious why multiprocessing.set_start_method('fork') passes tests on Windows and noticed all tests are now on Linux. I updated the workflow but it might be an option to narrow down OSs for now.

t-sasatani commented 2 months ago

This might be something stupid, but I'm confused why conftest.py doesn't seem to patch okDev when running pytest locally (log below running with latest commit).
The Win/Mac tests continued for about 20 minutes until I halted manually. I don't know where they were stuck, but we might want to add timeouts for them.
I added a 10 s timeout to the streamDaq test to know where it's stuck, but I noticed this doesn't make sense now the DAQ is slower than that.

(miniscope-io-py3.11) PS C:\Users\Takuya\GitHub\miniscope-io> pytest .\tests\test_stream_daq.py -v --log-cli-level=DEBUG =============================================== test session starts =============================================== platform win32 -- Python 3.11.6, pytest-7.4.4, pluggy-1.5.0 -- C:\Users\Takuya\GitHub\miniscope-io.venv\Scripts\python.exe cachedir: .pytest_cache rootdir: C:\Users\Takuya\GitHub\miniscope-io configfile: pytest.ini plugins: anyio-4.4.0, timeout-2.3.1 collected 1 item

tests/test_stream_daq.py::test_video_output[stream_daq_test_200px.yml-stream_daq_test_fpga_raw_input_200px.bin-hash0] +++++++++++++++++++++++++++++++++++++++++++++++++++++ Timeout +++++++++++++++++++++++++++++++++++++++++++++++++++++
Process Process-2:
Traceback (most recent call last):
  File "C:\Users\Takuya\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 314, in _bootstrap
    self.run()
  File "C:\Users\Takuya\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 108, in run   
    self._target(*self._args, **self._kwargs)
  File "C:\Users\Takuya\GitHub\miniscope-io\miniscope_io\stream_daq.py", line 241, in _fpga_recv
    dev = okDev()
          ^^^^^^^
  File "C:\Users\Takuya\GitHub\miniscope-io\miniscope_io\devices\opalkelly.py", line 25, in __init__
    raise DeviceOpenError("Cannot open device: {}".format(serial_id))
miniscope_io.exceptions.DeviceOpenError: Cannot open device:

sneakers-the-rat commented 2 months ago

noticed all tests are now on Linux.

Ack good catch!!!

Yes monkeypatching something in a separate process is a little tricky and it depends on how the environment is created. I think I might need to add another method that hooks into the import machinery bc basically the behavior we want is "at test time, in all cases swap out okDev with the mock" even when a new process is spawned rather than forked

sneakers-the-rat commented 2 months ago

You're probably gonna hate me for this but I think we should fix the process termination in the capture method instead of adding a timeout in tests 🙈.

I think it'll be worth it - basically make a shared Event that controls the loop for each of the subroutines, and if any of them throws an exception they can flip the event and all exit cleanly.

More generally I think we may want to take those out of processes and just turn them back into a single linear loop for now, bc we dont know exactly what should be parallelized yet without careful profiling, as is two of the processes are bottlenecked and might as well be done in serial. That will probably help with complexity as we get this thing on its feet, and we can add multiprocessing back in once we know where we need perf

t-sasatani commented 2 months ago

Yes, I think we need Event for normal-ish operation anyway, and I think I'll have time to work on it this weekend, but I mentioned timeouts for something that kills the process from outside when even these get stuck.

I think we may want to take those out of processes and just turn them back into a single linear loop

We can think about this later, but for now, I'll just look into apparent causes because I don't think I can make time to experiment for a while.

If monkeypatching becomes complex with multiprocessing, I think we can also limit to fork and UNIX-based systems for now. (I honestly think we can even temporarily move forward by directly plugging in a mock instance, though I know you'll hate that.)

sneakers-the-rat commented 2 months ago

We can think about this later

Agreed. Lets make this pass, clean it up a bit, merge, and try that in a separate PR.

directly plugging in a mock instance

doing a

if os.environ.get("PYTEST_VERSION") is not None:
    dev = okDevMock()
else:
    dev = okDev()

would be an OK, temporary hack while we work out multiprocessing - the main thing is to avoid making it so the test conditions are substantially different than the runtime conditions, which that wouldnt really do, and secondarily to keep the code tidy by not having too much testing stuff in the package itself. I wouldnt want it to stick around too long bc we have other devices to implement and we will want a generalizable mocking pattern for testing those too, but if we cant figure out how to monkeypatch in a spawned process then I think thats a bit less fragile than trying to force an OS to use a poorly supported mp mode.

t-sasatani commented 2 months ago

I made some small updates because I had a minute. Now, this pytest module passes on my local Windows PC but not yet on GH (Linux: assert in stream DAQ test isn't passing; Win/Mac: many other failures, too.). I'll work on the rest after tomorrow.

Made the test code generate reference data hashes because that seems less fragile and also convenient. Please revert if we shouldn't do this.
Plugged in the Mock module directly. Also, updated the reference video because having the read length argument in the mock changes the output a bit.

Also, I will mark this ready for review because I guess it's ready for review.

sneakers-the-rat commented 2 months ago

Made the test code generate reference data hashes because that seems less fragile and also convenient. Please revert if we shouldn't do this.

The purpose of using the hash is to not have to keep a reference video in the repo, just the hashes - since the hash of the video changes based on platform (video codec differences and etc.) Having several hashes that the generated video can match is one way to accomodate that. To avoid that we would probably want to encode the video as raw which would also avoid compression inconsistencies.

sneakers-the-rat commented 2 months ago

hang on i've got this working by hashing the decoded video rather than the file, since the file will probably always have some variability both across platforms and runs.

sneakers-the-rat commented 2 months ago

bear with me lol we can get this working - we'll want this for later bc we'll eventually have a lot of end to end tests and we don't want to have like 100MB of videos in the repo

t-sasatani commented 2 months ago

Thanks, Jonny. I had no idea there could be ambiguity between runs. I thought having uncompressed videos would just approximately double the FPGA RAW data amount, so we could get away with that worst-case scenario, but I guess that wouldn't work if we're doing longer mock data generation and testing. Also, I'm curious how generating hashes for multi-platform and enough runs can be made easy.

Anyways I'll hold off for a while.

sneakers-the-rat commented 2 months ago

ok sorry for the big squash commit there from https://github.com/Aharoni-Lab/miniscope-io/tree/tests_hash_video - re-activating the windows tests (thank you!!!!) exposed a bunch of path errors that were sort of separate from anything we were doing here. fixed those.

So what i did is i made a thing to use the decoded video as the basis for the hash, rather than the encoded bitstream of the video file. that seems to have evened out one of the sources of inconsistency. but it also seems like divx and x264 are both nondeterministic in general, so we just can't use them when we do video equality tests. we can have separate tests for codec support if we need them.

sneakers-the-rat commented 2 months ago

i think technically this should be a minor version bump but we've been using patch for units of work this size... works for me, maybe in 0.2 we start incrementing minor versions ;)

t-sasatani commented 2 months ago

Thanks for all of the updates! I'm just skimming through the code, but it was nice to learn stuff like how you handled this ambiguity with the utils.hash_video method.

You know best about versioning, so whatever is good for you is good. The last few updates advanced the streamDAQ part pretty much, so it makes sense from my end to call this a 0.2.

t-sasatani commented 2 months ago

Also, I can test the operation with hardware by early next week (it's the first time mocking this HW, after all), but I guess we can go ahead and merge this before that. If we need some kind of fix, that can be a 0.2.1.

sneakers-the-rat commented 2 months ago

ok fixed some few errant bugs, resolve the last changes, i think i'm gonna merge and publish this!!!!!

Aharoni-Lab / miniscope-io

Tests for streamDaq / video export #26