MIT-LCP / wfdb-python

Native Python WFDB package

How to avoid list of 'None' in the aux_note of wfdb.io.wrann #338

Closed satyaog closed 2 years ago

satyaog commented 2 years ago

Hi,

I'm currently working on converting an ECG dataset, icentia11k, to the wfdb format so it can be hosted in the PhysioBank database.

I've received one piece of feedback on the aux_note of an annotation file that I'm not sure how to resolve.

They've told me that aux_note does not need to include values like 'None', but when I try to shrink the aux_note numpy array that I pass to wfdb.io.wrann, or replace the string with an actual None, I'm blocked by one of two errors.

Either I get

~/CODE/icentia11k_wfdb/venv/3.7/lib/python3.7/site-packages/wfdb/io/annotation.py in check_field_cohesion(self, present_label_fields)
    510 
    511         for field in ['sample', 'num', 'subtype', 'chan', 'aux_note']+present_label_fields:
--> 512             if getattr(self, field) is not None:
    513                 if len(getattr(self, field)) != nannots:
    514                     raise ValueError("The lengths of the 'sample' and '"+field+"' fields do not match")

ValueError: The lengths of the 'sample' and 'aux_note' fields do not match

or

~/CODE/icentia11k_wfdb/venv/3.7/lib/python3.7/site-packages/wfdb/io/annotation.py in check_field(self, field)
    443             for e in uniq_elements:
    444                 if not isinstance(e, str_types):
--> 445                     raise TypeError('Subelements of the '+field+' field must be strings')
    446 
    447             if field == 'symbol':

TypeError: Subelements of the aux_note field must be strings

I'm using wfdb-python version 3.4.1.

From the documentation of wfdb.io.wrann, it does seem that aux_note needs to be the same size as sample, so reducing aux_note to contain only the non-empty values doesn't seem feasible. But the documentation also seems to indicate that None should be acceptable for annotations that don't have a note.
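For reference, the two constraints behind the errors above can be restated as a small standalone check. This is only a sketch of my understanding, not wfdb's actual code: `validate_aux_note` is a hypothetical name, and the messages are copied from the tracebacks.

```python
def validate_aux_note(sample, aux_note):
    """Hypothetical pre-check mirroring the two errors shown above (not wfdb code)."""
    # Constraint 1: one aux_note entry per annotation sample.
    if len(aux_note) != len(sample):
        raise ValueError(
            "The lengths of the 'sample' and 'aux_note' fields do not match"
        )
    # Constraint 2: every entry must be a string, so None is rejected.
    for note in aux_note:
        if not isinstance(note, str):
            raise TypeError("Subelements of the aux_note field must be strings")


# Passes: same length, all strings (empty string for "no note").
validate_aux_note([10, 20, 30], ["", "(AFL", ""])
```

Running it on a filtered aux_note triggers the first error, and running it on an aux_note that contains an actual None triggers the second.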

I have a gist with the code I'm using to do the conversion: https://gist.github.com/satyaog/665dee88cec0c2d0f2ec78f7cb0919af

Test data can also be found in this Google Drive: https://drive.google.com/drive/folders/1bW-k7zBF6ZRrd7ZU06YtLmWSXAzG5ZFJ

To test with a filtered aux_note, the following snippet can be used to replace the lines in the gist:

        indices = symbol != ''
        sample = sample[indices]
        symbol = symbol[indices]
        aux_note = aux_note[aux_note != '']
        chan = np.array([0] * len(sample))

To test with an aux_note using None in the array, the following snippet can be used to replace the lines in the gist:

# https://wfdb.readthedocs.io/en/latest/io.html#wfdb.io.show_ann_labels
label_mapping = {"btype": {0: ('Q', None),     # Undefined: Unclassifiable beat
                           1: ('N', None),     # Normal: Normal beat
                           2: ('S', None),     # ESSV (PAC): Premature or ectopic supraventricular beat
                           3: ('a', None),     # Aberrated: Aberrated atrial premature beat
                           4: ('V', None)},    # ESV (PVC): Premature ventricular contraction
                 "rtype": {0: ('Q', None),     # Null/Undefined: Unclassifiable beat
                           1: ('Q', None),     # End: Unclassifiable beat
                           2: ('Q', None),     # Noise: Unclassifiable beat
                           3: ('N', None),     # NSR (normal sinusal rhythm): Normal beat
                           4: ('+', "(AFIB"),  # AFib: Atrial fibrillation
                           5: ('+', "(AFL")}}  # AFlutter: Atrial flutter

Without any change to the gist, the following snippet prints the content of the aux_note:

ann = wfdb.rdann("p00010/p00010_s00", "atr", sampto=2048)
print(ann.aux_note)

# ['None', '(AFL', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None', 'None']

Note that the review process is still ongoing, and I'm starting to believe that in "rtype", 3: ('N', '') should probably be replaced by 3: ('', '') or maybe 3: ('+', '('). Although this is unrelated to the issue, the explanation might ease the itchiness some could feel in their eyes when looking at the current version of the code.

bemoody commented 2 years ago

Personally, I also find this API to be confusing. Yes, the length of aux_note must be the same as the length of sample (and symbol, chan, etc.). The nth element of aux_note is the text note associated with the nth element of sample, so it doesn't make sense to pass lists of different lengths.

If the nth event doesn't require a text note, the corresponding element of aux_note should be set to an empty string ('').
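As a minimal sketch of that alignment (the sample values and the `notes_by_index` dict are made up for illustration, not taken from your data):

```python
# Hypothetical data: five annotation positions, only one of which has a note.
sample = [120, 480, 910, 1350, 1790]
notes_by_index = {1: "(AFL"}  # annotation index -> note text

# Build one entry per annotation; '' marks "no note" for that annotation.
aux_note = [notes_by_index.get(i, "") for i in range(len(sample))]

assert len(aux_note) == len(sample)
print(aux_note)  # ['', '(AFL', '', '', '']
```

The list stays the same length as sample, so the nth note still lines up with the nth annotation.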

Arguably, wrann should be made to accept None as equivalent to an empty string, but that's not currently allowed. (Also, I think wrann should accept any type of iterable, not only lists or numpy arrays.)

If you have a list (or a numpy array) like:

aux_note = [None, '(AFL', None, None, None]

you can replace the None values with empty strings using:

aux_note = [(i or '') for i in aux_note]
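Note that `(i or '')` also turns any other falsy value (such as 0) into an empty string. If you would rather catch unexpected types, a stricter variant could look like the sketch below. `clean_aux_note` is a hypothetical helper name, not something wfdb provides.

```python
def clean_aux_note(aux_note):
    """Return a list of str in which None becomes ''. Hypothetical helper, not part of wfdb."""
    cleaned = []
    for note in aux_note:
        if note is None:
            cleaned.append("")  # no note for this annotation
        elif isinstance(note, str):
            cleaned.append(note)  # keep existing notes unchanged
        else:
            raise TypeError(
                f"aux_note entries must be str or None, got {type(note).__name__}"
            )
    return cleaned


aux_note = clean_aux_note([None, "(AFL", None, None, None])
print(aux_note)  # ['', '(AFL', '', '', '']
```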

Does that help?

satyaog commented 2 years ago

Yes, thank you, this makes things clearer. I was already using an empty string ('') in my gist, but I was wondering if there was a better way. I was a bit surprised that the aux_note annotations from wfdb.rdann seemed to be ['None', '(AFL', 'None', 'None', ...] instead of ['', '(AFL', '', '', ...], but I'm happy to hear that my usage of the library is ok.