beacon-biosignals / EDF.jl

Read and write EDF files in Julia
MIT License

Method for Obtaining all Timestamps #69

Open stellarpower opened 1 year ago

stellarpower commented 1 year ago

I often need to wrangle data between e.g. EDF and multiple arbitrary, subtly not-entirely-compatible CSV formats (yay for me; hopefully people will eventually begin adopting HDF5 through NWB).

And I'm feeling lazy 'cause I have to do this so many times over. It would be nice to have a one-liner to extract the timestamps, be they relative to the recording start or in clock time, as an array that I can just concatenate onto the left-hand side of the samples array to get a table of each channel's samples alongside the time they were recorded.

I presume it's simply a case of taking the start date and the sample rate and extrapolating from there, but maybe there are some corner cases that should be handled as well.
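A minimal sketch of that extrapolation for a single contiguous recording (plain EDF or EDF+C), assuming one sample rate per signal and a start time already read from the header. The function names are hypothetical and not part of EDF.jl. Python's `datetime` resolves to microseconds, so clock times are rounded to the nearest microsecond.

```python
from datetime import datetime, timedelta

def relative_timestamps(n_samples, sample_rate_hz):
    """Seconds since the recording start: t_i = i / fs."""
    return [i / sample_rate_hz for i in range(n_samples)]

def clock_timestamps(start, n_samples, sample_rate_hz):
    """Wall-clock time of each sample, i.e. start + t_i.

    timedelta rounds each offset to the nearest microsecond."""
    return [start + timedelta(seconds=t)
            for t in relative_timestamps(n_samples, sample_rate_hz)]

start = datetime(2023, 1, 1, 12, 0, 0)
print(relative_timestamps(4, 256))         # [0.0, 0.00390625, 0.0078125, 0.01171875]
print(clock_timestamps(start, 2, 256)[1])  # 2023-01-01 12:00:00.003906
```

Each timestamp is computed as `i / fs` rather than by repeatedly adding `1 / fs`, so floating-point error does not accumulate over long recordings.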

Thanks

ararslan commented 1 year ago

I think this is the primary case you'd need to watch out for: https://edfplus.info/specs/edfplus.html#timekeeping. Otherwise what you described is how I'd do it.
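For EDF+ files, the linked timekeeping section is what changes the picture: the first annotation in each data record's first time-stamped annotation list (TAL) gives that record's onset relative to the file start, and in EDF+D files consecutive records need not be contiguous. A hedged sketch, assuming those onsets have already been parsed into seconds; `discontinuous_timestamps` is a hypothetical helper.

```python
def discontinuous_timestamps(record_onsets_s, samples_per_record, record_duration_s):
    """Relative timestamps (s) for one signal of an EDF+ file.

    record_onsets_s: onset of each data record relative to the file start,
    assumed already parsed from each record's first TAL.
    """
    period = record_duration_s / samples_per_record
    return [onset + j * period
            for onset in record_onsets_s
            for j in range(samples_per_record)]

# Two 1-second records, the second starting 4 s after the file start (a gap, as EDF+D allows):
print(discontinuous_timestamps([0.0, 4.0], samples_per_record=4, record_duration_s=1.0))
# [0.0, 0.25, 0.5, 0.75, 4.0, 4.25, 4.5, 4.75]
```

For EDF+C (or plain EDF) the onsets are simply `k * record_duration_s`, and this reduces to the contiguous case.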

Shameless company promotion, but you may be interested in Onda and OndaEDF.

stellarpower commented 1 year ago

Cool - I've got a relatively basic implementation for now, but I may come back to this. I can at least upload it if you want to work from there, not that there's much to it at all. I hit a bit of a stumbling block: from what I can see, Julia's DateTime is limited to an integer number of milliseconds, which is going to be an issue at rates like 220 or 256 Hz, where the sample period isn't a whole number of milliseconds, so I rolled my own.
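To make the millisecond problem concrete: if each sample period is rounded to whole milliseconds and then accumulated, the error grows with every sample. Keeping the offset exact (here with Python's `fractions.Fraction`) and rounding only once per timestamp keeps the error below half a millisecond. This is a Python illustration under those assumptions, not a description of the implementation mentioned above.

```python
from fractions import Fraction

def drift_after_one_second_ms(fs):
    """Error after fs samples (1 s of data) if each sample period is rounded
    to whole milliseconds and then accumulated."""
    step_ms = round(1000 / fs)                # period forced onto a whole-ms grid
    exact_total_ms = fs * Fraction(1000, fs)  # exactly 1000 ms
    return fs * step_ms - exact_total_ms

def timestamp_ms(i, fs):
    """Exact offset of sample i in ms; round only when converting for output."""
    return Fraction(1000 * i, fs)

for fs in (220, 250, 256):
    print(fs, float(drift_after_one_second_ms(fs)))
# 220 100.0
# 250 0.0
# 256 24.0
```

At 250 Hz the period is exactly 4 ms, so the whole-millisecond grid happens to be exact; at 220 and 256 Hz it is not.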

I saw Onda earlier (as a Spanish speaker, the name caught my attention if nothing else!), but it took me a little while to work out what it does from the README. Is the LPCM in essence just time series, then? I assumed it related to something tied to hardware/electronics and more technically specialised than I'm likely to have a use for.

I may well end up writing an all-in-one tool for biodata at some point - I'm sick of all these stupid scripts and incompatible standards. A toolbox for converting between all file types, streaming data live from a device or playing it back from file (using LSL and OSC, and possibly a custom, faster IPC mechanism for data on one machine), realtime and offline plotting, basic filters, denoising; you get the idea. I haven't read all about it, but I can well believe that NWB is better than EDF, which is much better than janky CSV; that only covers offline storage, though. If I go ahead with this, I'd need a pretty rigorous description for storing data from all formats in memory, and may need to integrate it with low-latency streaming etc. So I will probably take a look. It looks like it's mostly in Julia for now, but I'd be intrigued if a package implementing the spec in C++ appeared at some point.

palday commented 1 year ago

Is the LPCM in essence just time series then?

Essentially. LPCM stands for linear pulse code modulation. "Linear" means no fancy scaling with logarithms (e.g. dB) or square roots (e.g. RMS), and that the steps between quantization levels are uniform. "Pulse code modulation" means that the data are sampled regularly at uniform time intervals and that the value at each timepoint is stored. Regular sampling has many pros, but also a few cons (it requires a relatively good and stable clock, can be quite energy-intensive, and is actually the worst sampling regime for things like compressed sensing). Finally, it's the (im)pulses that are stored, i.e. the instantaneous value and not e.g. the difference from the previous value. This makes random access easier, but it also limits dynamic range and, depending on how the measurement itself is performed (rather than just stored), may also affect sensitivity to signal and equipment drift.
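EDF itself is a concrete example of the "linear" part: each signal header stores a physical minimum/maximum and a digital minimum/maximum, and the integer samples map linearly between them, so every quantization step is the same physical increment. A small sketch of that mapping; the 16-bit defaults and the ±3200 µV range are assumptions for illustration, since the real limits come from each signal's header.

```python
def dequantize(d, pmin, pmax, dmin=-32768, dmax=32767):
    """Digital level -> physical value (e.g. µV); linear, so every step is equal."""
    return pmin + (d - dmin) * (pmax - pmin) / (dmax - dmin)

def quantize(x, pmin, pmax, dmin=-32768, dmax=32767):
    """Physical value -> nearest digital level, clipped to the representable range."""
    step = (pmax - pmin) / (dmax - dmin)
    d = round((x - pmin) / step) + dmin
    return max(dmin, min(dmax, d))

# A hypothetical ±3200 µV channel stored as 16-bit integers:
step = 6400 / 65535  # ≈ 0.0977 µV per level
x = 12.34
err = abs(dequantize(quantize(x, -3200, 3200), -3200, 3200) - x)
print(err <= step / 2)  # True: round-trip error is at most half a step
```

Because each stored value is an absolute level rather than a difference from the previous sample, any sample can be decoded on its own, which is the random-access property mentioned above.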

I think Onda is actually a pretty good intermediate format for what you're describing. If you want to stay in a BIDS-compliant world, then the BrainVision format (at least the default "vectorized" storage; the "multiplexed" stuff is a bit different) actually stores the signal data in an Onda-compatible way; it's just the metadata that are stored differently.
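Purely to illustrate the two BrainVision data orientations mentioned above (this says nothing about Onda's own API): vectorized storage keeps all samples of channel 1, then all samples of channel 2, and so on, while multiplexed storage interleaves one value per channel at each time point. Converting between them is a simple reordering; the helper names are hypothetical.

```python
def multiplexed_to_vectorized(flat, n_channels):
    """Interleaved (t0c0, t0c1, ..., t1c0, ...) -> channel-major (c0t0, c0t1, ..., c1t0, ...)."""
    if len(flat) % n_channels:
        raise ValueError("length is not a multiple of the channel count")
    return [flat[i] for c in range(n_channels) for i in range(c, len(flat), n_channels)]

def vectorized_to_multiplexed(flat, n_channels):
    """Channel-major -> interleaved; the inverse of multiplexed_to_vectorized."""
    if len(flat) % n_channels:
        raise ValueError("length is not a multiple of the channel count")
    n_samples = len(flat) // n_channels
    return [flat[c * n_samples + t] for t in range(n_samples) for c in range(n_channels)]

mux = [1, 10, 2, 20, 3, 30]  # 2 channels x 3 time points, interleaved
vec = multiplexed_to_vectorized(mux, 2)
print(vec)                                       # [1, 2, 3, 10, 20, 30]
print(vectorized_to_multiplexed(vec, 2) == mux)  # True
```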

If you really want to do EDF in C/C++ -- you know about EDFLib, right?

stellarpower commented 1 year ago

Thanks for the explanation - though I'm still trying to understand a bit better what the package actually does.

So when I get the data off my board, each frame contains 32 doubles, each of which should directly be the voltage measured over the sample interval. AFAIK it's linear as you describe, and a value of 100.0 means the ADC is reading 100 μV between the two terminals. If there is any transformation, it's happening upstream in firmware or something like that, so I expect to receive a linear value in the applicable units.
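Given frames like these, the table from the original request is one row per frame with a leading time column. A sketch assuming a fixed sample rate and no dropped frames, with made-up channel labels and values; timestamps are relative to the first frame, and `frames_to_csv` is a hypothetical helper.

```python
import csv
import io

def frames_to_csv(frames, sample_rate_hz, channel_names):
    """One CSV row per frame: relative time in seconds, then one value per channel."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["time_s", *channel_names])
    for i, frame in enumerate(frames):
        if len(frame) != len(channel_names):
            raise ValueError(f"frame {i} has {len(frame)} values, expected {len(channel_names)}")
        writer.writerow([i / sample_rate_hz, *frame])  # computed, not accumulated
    return out.getvalue()

names = [f"ch{k + 1}" for k in range(32)]  # hypothetical channel labels
frames = [[100.0] * 32, [-50.5] * 32]      # made-up values in µV
text = frames_to_csv(frames, 250, names)
print(text.splitlines()[1][:15])           # 0.0,100.0,100.0
```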

So when you describe the details of the encoding as PCM, it conjures up a picture of something much closer to the hardware, in terms of how the data are collected and sent over the serial port to the PC recording them. Or is it just that the format Onda describes on top of Arrow is suitable for data sampled at regular time intervals, but not for some other sorts of time-series data? AFAIK what I'm getting off the board fits that description; I've just never heard it described as PCM (outside of audio) or discussed at that level. I'd assume that regularly sampling at a fixed frequency and linearly storing the voltage would be the simplest default way to handle biodata in most situations.

If you really want to do EDF in C/C++ -- you know about EDFLib, right?

I do, thanks. I may also lift some of the plotting from the EDFBrowser program. For now, the priority is speed, so I'm just copy-pasting sample scripts and modifying them from there. But if I end up writing this all-in-one toolbox, I would most likely write a library that provides a homogeneous interface over all of these different formats and calls into the "canonical" C/C++ implementations where applicable. Although I know EDF.jl itself isn't just exposing bindings.
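One way such a homogeneous interface could look, sketched as a Python `Protocol`. `SignalSource`, its methods, and the in-memory backend are all hypothetical names chosen for illustration; a real backend might wrap EDFlib or another canonical implementation behind the same methods.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence

class SignalSource(Protocol):
    """Hypothetical common interface each format backend would implement."""
    def channel_labels(self) -> Sequence[str]: ...
    def sample_rate_hz(self) -> float: ...
    def n_samples(self) -> int: ...
    def read(self, start: int, stop: int) -> List[List[float]]: ...  # channels x samples

@dataclass
class InMemorySource:
    """Toy backend holding channel-major data in memory."""
    labels: List[str]
    fs: float
    data: List[List[float]]

    def channel_labels(self) -> Sequence[str]:
        return self.labels

    def sample_rate_hz(self) -> float:
        return self.fs

    def n_samples(self) -> int:
        return len(self.data[0]) if self.data else 0

    def read(self, start: int, stop: int) -> List[List[float]]:
        return [ch[start:stop] for ch in self.data]

def duration_s(src: SignalSource) -> float:
    """Works with any backend that structurally satisfies SignalSource."""
    return src.n_samples() / src.sample_rate_hz()

src = InMemorySource(["Fz", "Cz"], 256.0, [[0.0] * 512, [1.0] * 512])
print(duration_s(src))  # 2.0
```

Structural typing keeps backends decoupled: they only need the right methods, not a shared base class.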