MIT-LCP / wfdb-python

Native Python WFDB package
MIT License
738 stars 300 forks source link

rdsamp and wrsamp not isopmorphic #484

Open greenstick opened 5 months ago

greenstick commented 5 months ago

Hi there, this has likely been discussed somewhere at length however I can't seem to find such a discussion and the documentation doesn't appear to cover this key limitation:

In short, if I call wfdb.rdsamp and then use the corresponding wrsamp method, typical programming conventions assume that the two methods should be isomorphic – whatever is read into memory by rdsamp should be sufficient and, by default, produce an identical output when wrsamp is called to process the in-memory object. However, as best as I can tell, this is not the case, which is very counterintuitive and breaks with standard programming conventions.

This key limitation doesn't appear to be covered in the documentation and, at minimum, it would seem it should be. Better yet, an example of what arguments to set to make them isomorphic should be documented. This assumes that the ideal (that they are isomorphic) is not possible, however, for the library UX, this seems something to strive.

I would send a PR myself, however, I'm actively running up against this issue and therefore do not currently have a solution to propose.

greenstick commented 3 months ago

Wanted to post a late update to this – ultimately it turned out I was unable to write data using the wfdb.wrsamp method. I'm not too familiar with the wfdb standard and couldn't tell from reading the I/O code whether the method has been fully implemented. If not, it may be good to raise a NotImplementedError exception in the interim. If it has been, there may be a downstream bug in wfdb and I can close this one, run some tests, and open up an issue there.

briangow commented 2 months ago

@greenstick , I apologize for the delay in responding. Thank you for you suggestion! I have opened a separate issue to indicate that we want to make our read and write function(s) isomorphic: https://github.com/MIT-LCP/wfdb-python/issues/500 .

Regarding being unable to write data using wrsamp, could you provide more details regarding this? The wrsamp function could be improved but as far as I know is working for writing files as documented ( https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.wrsamp ) .

greenstick commented 1 month ago

Hi @briangow, sure thing. Here's an example of code that generates an error:

data = wfdb.rdsamp(input_file)
signal, fields = data
wfdb.wrsamp(output_file_path, p_signal = signal, **fields)

This is an example of the code I couldn't get to work where I last left off, so it may be incorrect. However, I did follow the documentation and found I was unable to have that work using similar logic (i.e. read > assign > write) in a way that from my reading of the docs should be isomorphic.

In the code above, the error generated by wfdb is:

TypeError: wrsamp() got an unexpected keyword argument 'sig_len'

Which seems surprising given that the fields value of the data tuple was read in with rdsamp (as far as I'm aware, I can't simply pass data back into wrsamp and get an idempotent output, though I last tested this months ago and can't recall specifics).

If it's working as expected, it may make sense to allow wrsamp to be overloaded with all kwargs of the fields dictionary, so args like sig_len can be passed, even if they are ignored.

briangow commented 1 month ago

@greenstick , thanks! Yeah, rdsamp and wrsamp are definitely not isomorphic at this point. For example rdsamp does not have a variable which indicates what format the stored record you read was stored in but wrsamp requires this parameter to be set when writing. As mentioned in #500 , we may create an isomorphic write based on rdrecord instead of rdsamp which only makes use of a subset of the available parameters.

If you are interested, here is an example where we manually set some of the parameters in wrsamp after reading from rdsamp:

signals, fields = wfdb.rdsamp('a103l', pn_dir='challenge-2015/training')
for key in fields.keys():
    print(key)

# Choose the written file format from https://www.physionet.org/physiotools/wag/signal-5.htm
fmt_list = fields['n_sig'] * ['16'] # Here we use format 16
wfdb.wrsamp('written_record', fs = fields['fs'], units=fields['units'], sig_name=fields['sig_name'],
            base_date=fields['base_date'], base_time=fields['base_time'], comments=fields['comments'], p_signal=signals,
            fmt=fmt_list)