NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Undocumented segments in Blackrock nev data cause downstream AssertionError #1449

Open daltondm opened 3 months ago

daltondm commented 3 months ago

Problem: I have encountered a problem related to undocumented segments in Blackrock NEV files. In blackrockrawio.py, the function __get_event_segment_ids identifies 2 undocumented segments within the NEV data, outputs a warning, and adds segment IDs to account for them.

blackrockrawio.py:1164: UserWarning: Detected 2 undocumented segments within nev data after timestamps [1209981 1256384]. (This would be line 1157 in the latest version)

Later on, in function __match_nsx_and_nev_segment_ids, the final assertion that NSx and NEV files have the same number of segments fails (because nb_possible_nev_segments = 6 and len(nonempty_nsx_segments) = 5).

AssertionError: Inconsistent ns6 and nev file. 6 segments present in .nev file, but 5 in ns6 file. (Line 1266 in latest version.)

I am wondering if this is a bug in the way that undocumented segments are handled, or whether this issue requires a separate workaround.

To Reproduce: I'm not sure of the simplest way to reproduce the error, as I am encountering it through the chain of neuroconv --> spikeinterface --> neo. However, I could share the files privately if that is helpful (about 2.5 GB in total for the .nev and .ns6 files).
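
For reference, a minimal sketch of how the same code path might be hit directly with neo, bypassing the neuroconv/spikeinterface chain; the file basename below is a placeholder for paired .nev/.ns6 files:

from neo.rawio import BlackrockRawIO

# Point at the recording basename (without extension); nsx_to_load=6 selects the .ns6 file
reader = BlackrockRawIO(filename='path/to/recording', nsx_to_load=6)
# Header parsing runs the NEV/NSx segment matching and should hit the AssertionError described above
reader.parse_header()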

Expected behaviour: I expected the segments in the NEV file to be automatically adjusted and mapped to the ns6 segments.

Environment:

zm711 commented 3 months ago

@daltondm,

Thanks for the report. The latest PyPI release is 0.13.0. We have stopped keeping the GitHub versions up to date, but the PyPI version is current. So the first thing to try is updating neo to 0.13.0 with:

pip install -U python-neo

If the code still doesn't work after updating to the most recent release, then we would need to see the new files. We only update the readers as people share new files. Typically, an error like this is there to let the user know that we haven't yet written code to handle that part of the file, due to a lack of test files for us to debug with.

Sound good?

daltondm commented 3 months ago

Thank you for the quick reply. I updated to version 0.13.0, although my version of the package is just neo rather than python-neo (not sure if that's an issue?): pip install -U neo

Still seeing the same issue. I'm happy to share the .nev and .ns6 files with you; what is the best way to do that?
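
For reference, a quick sanity check of which neo release is actually being imported (useful if an older install is shadowing the upgrade):

import neo

# Print the version string of whichever neo copy Python actually picks up
print(neo.__version__)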

zm711 commented 3 months ago

@samuelgarcia any interest in taking this on? Looks like you were the last to work on this. If you don't have the bandwidth, I could work on it, but I'm not super familiar with Blackrock.

@daltondm, I can share an email with you if Sam doesn't have the time to work on this. We will let you know soon where to send the files. Since they are big, it is usually best to share a Google Drive or Dropbox link.

zm711 commented 3 months ago

Hey @daltondm one quick question.

Are you just the computational person, and so don't have access to make a dummy file? We could do the same troubleshooting faster with much smaller files. But if you're processing others' data and don't have access to give us a smaller file, then we can troubleshoot with the bigger files.

daltondm commented 3 months ago

@zm711 I have access to the raw files, but I'm not sure how to make an appropriate dummy file. The section that matches NEV and ns6 segments uses variables that cover the entire recording time, including a segment ID variable with entries for each timestamp (also, I don't know how to go about altering and re-saving dummy versions of these file types).

zm711 commented 3 months ago

@daltondm, I meant it would be much easier if you could do a brand new recording with your setup (for 1-2 seconds) and then share that instead. It won't have real data in it, but hopefully it would have the same problem structures. It's just much faster to iterate on small files than on multi-gigabyte files. And if it is dummy or unimportant data, then you could just share a link so anyone on the team could pick it up.

daltondm commented 3 months ago

Ahh, I understand. Unfortunately for debugging (but fortunately for my team), this is not typical of our recordings, so a new recording would be unlikely to replicate the problem (I have 2 recordings with this issue and 1 with a similar but distinct issue, out of 100s of recordings).

zm711 commented 3 months ago

@daltondm You can use this email to share the data privately: mineurs-torrent0x[at]icloud.com.

daltondm commented 1 month ago

@zm711 I apologize for the delay. I have shared the data to the email provided above with a Google Drive link. Please let me know if it needs to be shared another way or if you need any more information. Thank you!

zm711 commented 1 month ago

Thanks @daltondm! No worries. I received it and will try to take a look tomorrow.

zm711 commented 1 month ago

I figured out the issue, but I will need a little more info to try to fix it. It seems like there was a pause in your recording that led to this:

k='Comments';

data=array([(26250083, 65535, 0, 1, 10008427, b'Data Loss, 1607515104 packet discrepancy.')],
      dtype=[('timestamp', '<u4'), ('packet_id', '<u2'), ('char_set', 'u1'), ('flag', 'u1'), ('color', '<u4'), ('comment', 'S92')])

ev_ids=array([0])

So packet loss. Currently the code is adding another segment for this loss. I'm trying to figure out if this is an edge case or something we need to fix. Why is a comment being added? Did you do this at the user level or did the software automatically do this?

Did you shut off the machine right after this error, such that there was not actually a segment of data after this comment? It seems like the error may be there to flag the fact that there was a pause/restart. We try to account for this, so the more information you can give us around this pause/restart, the better we can decide how to fix this.
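
To illustrate the kind of handling under discussion, here is a rough sketch (an assumption about a possible approach, not the current behaviour) of filtering out software-generated "Data Loss" comments before they are counted as extra segments, using the packet shown above:

import numpy as np

# Structured NEV comment packet, copied from the dump above
dtype = [('timestamp', '<u4'), ('packet_id', '<u2'), ('char_set', 'u1'),
         ('flag', 'u1'), ('color', '<u4'), ('comment', 'S92')]
comments = np.array(
    [(26250083, 65535, 0, 1, 10008427,
      b'Data Loss, 1607515104 packet discrepancy.')],
    dtype=dtype,
)

# Hypothetical rule: treat packet-loss comments as annotations rather than
# as evidence of an additional undocumented segment
is_data_loss = np.char.startswith(comments['comment'], b'Data Loss')
segment_relevant = comments[~is_data_loss]
print(len(segment_relevant))  # 0 -> this comment would no longer create a segment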

daltondm commented 1 month ago

I checked the notes from the person who ran this session, and they noted a machine failure, which could be explained by packet loss. They fixed the issue by restarting the machine and beginning a fresh recording. I'm not sure how quickly they were able to catch the issue and reset the machine, or whether it automatically ended the recording after the comment/error. My intuition is that the software automatically terminated the recording and there is no data afterward, but I'm not certain.

I know that the person would often use the comments feature in the software to make timestamped notes on the recording, but these were usually about the observed behavior or signal dropout from the wireless system. I suspect the comment here was automatically generated by the software, but again I'm not certain.

This is rare in our recordings and doesn't impact our data much because it happened early in the session. However, if it were to happen at the end of a full recording, it would be useful to be able to catch the packet-loss comment and adjust the number of data segments accordingly.

Thanks for the quick turnaround!