Discussion: prefix design and matching

sneakers-the-rat commented 2 months ago

i think we've talked about this in a few issues and in the slack but i don't think we have a single issue for this - chattin about what we want for recognizing boundaries between frames.

detecting frames currently relies on a match between a prefix value:

Error characterization:

The current problem i know of is that there is some known rate of bit corruption that makes the prefix not a perfect match. but plz list any others. Knowing exactly what kind of corruption we need to be robust to would probably go a long way - eg. if it's just bit flips we could do a convolution, but if there are insertions/deletions it wouldn't work so well.
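To make the bit-flip vs. insertion distinction concrete, here is a minimal sketch (not anything in miniscope-io; `hamming32` is a hypothetical helper) that compares a received 32-bit word against the preamble by counting the set bits of their XOR. A couple of flipped bits give a small distance, while a single inserted bit shifts the whole word and makes the distance large.

```python
PREAMBLE = 0x12345678  # preamble value mentioned in this thread


def hamming32(a: int, b: int) -> int:
    """Number of differing bits between two 32-bit words (hypothetical helper)."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


# Two bit flips: the word is still recognizably close to the preamble.
flipped = PREAMBLE ^ 0b101

# One extra bit arriving ahead of the word: every bit lands one position late.
shifted = PREAMBLE >> 1

print(hamming32(PREAMBLE, flipped))  # 2
print(hamming32(PREAMBLE, shifted))  # much larger than any sensible tolerance
```

So a fixed-alignment XOR threshold handles flips, but insertions/deletions would need a search over bit offsets or an edit-distance style comparison.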

Constraints:

small: want to not spend all day on the preamble
specific: no false positives in the data
sensitive: tolerant to mild corruption
?

Strategies:

Fuzzy matching: sum of an XOR above some value? pull in a levenshtein?
Range guessing: currently we just search for any match in the string, but we could allow for even more fuzziness and make the search faster by seeding searches in the place we expect the next preamble to be in the bitstream (assuming 'packets' are uniform length?)
Repeating preamble: the preamble is currently 0x12345678, is there some preamble that is empirically worse or better than another, or is every preamble equally likely in the data? i also wonder if we can use a repeating preamble, eg. 10101... so that we can scale it up and down depending on the quality of the signal, as well as set a threshold in miniscope-io? so like "treat any sequence that ends with 4 copies of the preamble as a buffer, if we detect too many, increase that threshold, if we detect too few, decrease it"
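A rough sketch of how fuzzy matching and range guessing could combine, assuming the stream is byte-aligned and we already have a guess for where the next preamble should be. `find_preamble_near`, `window`, and `max_flips` are made-up names for illustration, not existing miniscope-io API.

```python
from typing import Optional

PREAMBLE = bytes.fromhex("12345678")  # preamble value mentioned in this thread


def bit_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def find_preamble_near(
    stream: bytes, expected: int, window: int = 8, max_flips: int = 2
) -> Optional[int]:
    """Search outward from `expected` and return the nearest fuzzy match."""
    n = len(PREAMBLE)
    for delta in range(window + 1):
        # Check the expected offset first, then one step further out each round.
        candidates = (expected,) if delta == 0 else (expected - delta, expected + delta)
        for pos in candidates:
            if 0 <= pos <= len(stream) - n:
                if bit_distance(stream[pos:pos + n], PREAMBLE) <= max_flips:
                    return pos
    return None


# A preamble with one flipped bit, two bytes away from where we guessed.
stream = bytes(20) + bytes.fromhex("12345679") + bytes(20)
print(find_preamble_near(stream, expected=22))  # 20
```

Because only a small window is searched, the tolerance can be looser than it could be for a scan over the whole stream without picking up as many false positives from the data.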

there is also probably some encoding magic like that manchester encoding dummy value thing that @t-sasatani just pulled in the SAMD framework that i am not aware of, so plz fill in the gaps here and i can edit them into this top comment until we come up with a plan

also @MarcelMB @phildong

t-sasatani commented 2 months ago

I think the error situation will change quite a bit with the dummy word thing, so we'd want to collect data with this update and see how much fuzziness we need. It also might be better to stay rigorous with the preamble first and start fuzzy with the headers, because the header data are relatively less coupled with firmware.

I need to leave my bench for today, but @MarcelMB and I will have to collect data anyway. So we can upload binaries with the updated thing while doing that.
We don't know how broken preambles look yet because they aren't extracted in the current code. We might want to add a debug feature for this.
Some preambles will be significantly worse than 0x12345678, but there shouldn't be significantly better ones.
If we want to do the repeat thing, we can start with repeating 0x12345678 as well. It's not really bandwidth efficient, but adding ten more 32-bit words in front of a buffer shouldn't really affect anything. If you want to make that shorter, I think [0, 255, 0, 255, ...] would be a decent choice because it wouldn't show up in sensor data and also wouldn't cause weird problems with clock recovery (it'll take pretty long to explain this; so let me do this some other time).
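A minimal sketch of the repeated-preamble idea, assuming a byte-aligned stream and exact matching of each copy. `find_buffer_starts` and `min_copies` are hypothetical names; the threshold is the knob that could be raised or lowered depending on signal quality.

```python
from typing import List

PREAMBLE = bytes.fromhex("12345678")  # preamble value mentioned in this thread


def find_buffer_starts(stream: bytes, min_copies: int = 4) -> List[int]:
    """Return offsets that directly follow a run of >= min_copies intact preambles."""
    n = len(PREAMBLE)
    starts = []
    pos = 0
    while pos <= len(stream) - n:
        # Count back-to-back intact copies starting at this offset.
        run = 0
        while stream[pos + run * n:pos + (run + 1) * n] == PREAMBLE:
            run += 1
        if run >= min_copies:
            starts.append(pos + run * n)
        # Skip past the run, or advance one byte if nothing matched here.
        pos += run * n if run else 1
    return starts


# Ten copies sent, the second one corrupted: the remaining run of 8 still triggers.
payload = b"\xaa" * 16
stream = bytes(5) + PREAMBLE + bytes.fromhex("12345679") + PREAMBLE * 8 + payload
print(find_buffer_starts(stream))  # [45]
```

One caveat of this simple version: a corrupted copy in the middle of a long run can split it into two runs that both pass the threshold, so a real implementation would probably want to merge detections that are closer together than one buffer.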

MarcelMB commented 2 months ago

I have some files from the most recent commit 443200d66fe4029bbe2df9fb353655075f2fbd3a

Drive link

Some buffers will be corrupted because I was updating the firmware via ATMEL ICE connected to the daughterboard multiple times while recording. But these could also be nice to test with, since we will do that anyway during in vivo tests to update the firmware for excitation light changes, ROI shifts, etc.

sneakers-the-rat commented 2 months ago

Side note - we definitely should make it so ya don't need to modify the firmware to set those values, but can set them via miniscope-io. I'll make a separate issue for that

MarcelMB commented 3 weeks ago

I am down to work on this and implement a combination of strategies as @sneakers-the-rat proposed. Let me know when you want to start and I will join.

we could easily repeat the preamble a few times.
add XOR or Hamming distance to give some tolerance when identifying the preamble
detect errors in a preamble with something like a simple checksum and move on to the next one
predict location of preamble with range guessing (should be easy since we have kind of fixed buffer sizes)
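As a sketch of the "predict the location, flag the bad ones, move on" part, the snippet below steps through a stream at a fixed buffer length and records the XOR of each expected preamble word against the real one. The buffer length, big-endian byte order, and the name `audit_preambles` are all assumptions for illustration, and it only works if bytes are never inserted or dropped.

```python
from typing import List, Tuple

PREAMBLE = 0x12345678  # preamble value mentioned in this thread
BUFFER_LEN = 16        # hypothetical: bytes per buffer, preamble included


def audit_preambles(
    stream: bytes, first: int = 0, buffer_len: int = BUFFER_LEN
) -> List[Tuple[int, int]]:
    """Return (offset, xor_mask) per expected preamble; a mask of 0 means intact."""
    report = []
    for pos in range(first, len(stream) - 3, buffer_len):
        # Assumes big-endian byte order on the wire.
        word = int.from_bytes(stream[pos:pos + 4], "big")
        report.append((pos, word ^ PREAMBLE))
    return report


good = PREAMBLE.to_bytes(4, "big") + bytes(12)
bad = (PREAMBLE ^ 0x100).to_bytes(4, "big") + bytes(12)
for offset, mask in audit_preambles(good + bad + good):
    print(offset, hex(mask))
```

The nonzero masks double as debug output: collecting them from real recordings would show which bits actually get corrupted.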

I would still like to work further on error correction for the whole preamble (and/or data) later on. But the preamble could be a start.

sneakers-the-rat commented 3 weeks ago

I think for this the first thing we should do is clean up the code a bit - split up each of the acquisition methods that bundle together pulling from a queue and running in a subprocess and etc. into separate pure functions, then it should be a lot easier to write and test functionality that only affects one stage (i.e. the initial identification of the start of a buffer from a continuous bytestream) without the others. as-is it's a little challenging to access that code in tests without mocking up the full streaming situation, so we ideally get to a place where we can just pass a bytestring and get back buffer(s).

This will require a bit more statefulness than a pure function because we want to take advantage of being able to 'go backwards' and remember positions in the recent past, but in either case we should split that out from the StreamDaq class because it's getting a bit overloaded
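Something like the following is the shape I have in mind, as a rough sketch only: a small object that holds leftover bytes between calls, so you can feed it chunks and get back whatever complete buffers it can find. `BufferSplitter` and `feed` are hypothetical names, not the current StreamDaq API, and it uses exact preamble matching to keep the example short.

```python
from typing import List


class BufferSplitter:
    """Hypothetical stateful splitter: feed raw bytes in, get complete buffers out."""

    def __init__(self, preamble: bytes = bytes.fromhex("12345678")) -> None:
        self.preamble = preamble
        self._pending = bytearray()  # bytes not yet assigned to a complete buffer

    def feed(self, chunk: bytes) -> List[bytes]:
        self._pending.extend(chunk)
        n = len(self.preamble)
        buffers = []
        start = self._pending.find(self.preamble)
        while start != -1:
            nxt = self._pending.find(self.preamble, start + n)
            if nxt == -1:
                break  # buffer not finished yet; wait for more data
            buffers.append(bytes(self._pending[start + n:nxt]))
            start = nxt
        if start > 0:
            # Drop everything before the last preamble that is still open.
            del self._pending[:start]
        elif start == -1 and n > 1:
            # No preamble at all: keep a short tail in case one straddles chunks.
            del self._pending[:-(n - 1)]
        return buffers


splitter = BufferSplitter()
pre = splitter.preamble
print(splitter.feed(b"\xff\xff" + pre + b"abc" + pre[:2]))  # []
print(splitter.feed(pre[2:] + b"defg" + pre))               # [b'abc', b'defg']
```

Since it takes plain bytes and has no queue or subprocess attached, it can be tested directly with a bytestring; fuzzy matching or position guessing could later replace the `find` calls without touching the rest of the pipeline. Note that payload data that happens to contain the preamble would be mis-split by this version.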

MarcelMB commented 3 weeks ago

split up each of the acquisition methods that bundle together pulling from a queue and running in a subprocess and etc. into separate pure functions

not sure I am understanding it all correctly. So you suggest cleaning up StreamDaq and putting certain processes into functions that live outside?

sneakers-the-rat commented 3 weeks ago

Yeah yeah, because we'll need to make that more flexible as we add more devices anyway, eg. The ephys stuff should be able to stream but it won't be reassembled into images. And currently the design of them being locked into being run as a separate process that pulls from a queue makes them pretty hard to reuse.

Shouldn't be too hard. I'll get on it

Aharoni-Lab / miniscope-io