I've recently needed to read files from buffers rather than files (streamed from file selection pickers). I've opened up a Pull Request with a quick fix allowing a buffer override: #491
However, a simple buffer override is suboptimal. Ideally, there are as few diverging paths in the code as possible. Additionally, it makes sense to separate concerns of parsing and I/O, because being agnostic to the buffer source enables flexibility and decreases complexity on both sides. Hence, merging this PR is much preferred to #491.
This Pull Request implements my recommendation for a restructure of rdrecord, rdheader and rdann. Both rdheader and rdann were easy to modify, but rdrecord's ability to read multi-segment records does not easily translate to reads using streams. Instead, I decided to only streamline _signal._rd_segment to use buffers. Someone needing to read data from buffers must therefore implement surrounding functionality, such as pre-reading the header or acting on multi-segment records, themselves. Still, streamlining the functions addressed in this PR already facilitate accomplishing this goal at all, without needing to re-write all of the parsing functionality of wfdb.
Before Merging
This pull request needs:
Functionality tests:
[ ] public api / local data
[ ] public api / physionet data
[ ] private api / streamed io data
[ ] Updated documentation
Concerns and observations
Previously, the header parser differentiated between remote and local headers. The new implementation makes it impossible to distinguish between the two. I would appreciate input on whether this differentiation was intentional. If so, we can allow an argument to _rdheader specifying the encoding.
Remote headers were decoded using iso-8859-1
Local headers were decoded using ascii
I'm also open to _rdheader accepting a pre-decoded string, since the file is plaintext. I advocate a binary input though to avoid inconsistent coding behavior.
rdheader previously ignored file read errors for local files. I would like input on whether that was intentional, because a file read error could result in corrupted file reads.
rdheader previously allowed passing a directory even if pn_dir was supplied. In this case, the file_name directory was cut from the record name and silently ignored. I don't think this is the right call, so I cut this functionality. Let me know if there's a reason to keep it.
Some buffering arguments are, as of now, removed. It should be easy to re-introduce them into the new enclosing functions, if needed.
This PR removes a lot of now unneeded private functions. While it's unlikely (and discouraged due to internality) they were used before, it could break dependents relying on them.
The no_file parameter could be deprecated, instead using a None test against sig_data. This removes a potential failure case because it's unlikely someone passing sig_data doesn't intend it to be used.
I've recently needed to read files from buffers rather than files (streamed from file selection pickers). I've opened up a Pull Request with a quick fix allowing a buffer override: #491
However, a simple buffer override is suboptimal. Ideally, there are as few diverging paths in the code as possible. Additionally, it makes sense to separate concerns of parsing and I/O, because being agnostic to the buffer source enables flexibility and decreases complexity on both sides. Hence, merging this PR is much preferred to #491.
This Pull Request implements my recommendation for a restructure of
rdrecord
,rdheader
andrdann
. Bothrdheader
andrdann
were easy to modify, butrdrecord
's ability to read multi-segment records does not easily translate to reads using streams. Instead, I decided to only streamline_signal._rd_segment
to use buffers. Someone needing to read data from buffers must therefore implement surrounding functionality, such as pre-reading the header or acting on multi-segment records, themselves. Still, streamlining the functions addressed in this PR already facilitate accomplishing this goal at all, without needing to re-write all of the parsing functionality of wfdb.Before Merging
This pull request needs:
Concerns and observations
_rdheader
specifying the encoding.iso-8859-1
ascii
_rdheader
accepting a pre-decoded string, since the file is plaintext. I advocate a binary input though to avoid inconsistent coding behavior.rdheader
previously ignored file read errors for local files. I would like input on whether that was intentional, because a file read error could result in corrupted file reads.rdheader
previously allowed passing a directory even ifpn_dir
was supplied. In this case, thefile_name
directory was cut from the record name and silently ignored. I don't think this is the right call, so I cut this functionality. Let me know if there's a reason to keep it.buffering
arguments are, as of now, removed. It should be easy to re-introduce them into the new enclosing functions, if needed.no_file
parameter could be deprecated, instead using aNone
test against sig_data. This removes a potential failure case because it's unlikely someone passingsig_data
doesn't intend it to be used.