MIT-LCP / wfdb-python

Native Python WFDB package
MIT License
738 stars 300 forks source link

Set correct checksums and samps_per_frame in Record.wrsamp #424

Closed bemoody closed 1 year ago

bemoody commented 2 years ago

There are a couple of ways to write a record using this package. One is wfdb.wrsamp, the other is creating a Record object and calling the wrsamp method.

Record.wrsamp was, I think, meant to be more low-level and perhaps only intended for internal use of the package. But for various reasons, applications may want or need to call this method directly. So I think we need to support it, and make it more fool-proof; in particular, this method shouldn't generate invalid WFDB records even if the application supplies wrong parameters.

Here I address two of those parameters - checksum and samps_per_frame.

Typically, checksum is set by wfdb.rdrecord. But naturally, the input data may have been modified (explicitly by the application, or implicitly e.g. by smooth_frames) between reading the record and writing it. So when calling Record.wrsamp, we want to set this field to the actual checksums of the signals. For backward compatibility and for applications that want to edit an existing header file, we leave the checksums alone if they're absent or if they're already correct.

(WFDB checksums are stored and written as an integer but only the low 16 bits are used. So values of -1 and 65535 are equally correct. Yes, this is kind of broken.)

samps_per_frame is only relevant when writing multi-frequency data. If you are writing single-frequency signal files, then the header file must not claim there are multiple samples per frame. Since the field is optional, we can simply drop it when expanded is false.

(If you are writing multi-frequency signal files, then Record.wrsamp does correctly verify that samps_per_frame is consistent with the dimensions of e_d_signal.)

Fixes issue #333.

tompollard commented 1 year ago

Thanks @bemoody, basically this looks good to me, though I don't totally understand the purpose of the expanded flag. Is it true that only the expanded formats (e_d_signal and e_p_signal?) require the samps_per_frame value in the header?

Not directly related to this pull request, but I think the addition/removal of fields in the header makes them less human-readable. I'd prefer consistent fields that are sometimes populated with something like "NULL" than fields that disappear when not required,.

bemoody commented 1 year ago

If samples-per-frame is omitted from the header file, it defaults to 1, which is what we want when writing d_signal or p_signal data.

Multi-frequency was a fairly late addition to the format, which is why the samples-per-frame field in the header is optional. If I were designing a format or an API now, multi-frequency support wouldn't be optional.

tompollard commented 1 year ago

Makes sense, thanks Benjamin.