NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Annotation handling when grouping signals #740

Open JuliaSprenger opened 5 years ago

JuliaSprenger commented 5 years ago

This is a follow-up to #625.

When loading data using the rawio framework, annotations are handled differently depending on the grouping mode of the signals. When splitting signals into individual Neo objects, all channel-wise annotations are loaded and attached to the objects. However, when grouping the signals according to common properties, channel-wise annotations are lost if they cannot be converted into an array_annotation.

Here is an example using the NixIO Raw implementation for loading:

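import neo
import quantities as pq
from neo import Block
from neo.io import NixIO, NixIOFr

filename = 'annotation_example.nix'  # example path; any writable location works

# write a block with two AnalogSignal traces carrying different annotations
with NixIO(filename, 'ow') as io: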
    bl0 = Block(my_custom_annotation='hello block')
    bl0.segments.append(neo.Segment())
    bl0.segments[0].analogsignals.append(neo.AnalogSignal([1,2,3]*pq.V, sampling_rate=1*pq.Hz, anno1 = 'myfirstannotation'))
    bl0.segments[0].analogsignals.append(neo.AnalogSignal([1,2,3]*pq.V, sampling_rate=1*pq.Hz, anno2 = 'mysecondannotation'))
    io.write_block(bl0)

# test for signal_group_mode='split-all'
with NixIOFr(filename) as io:
    bl1 = io.read_block(signal_group_mode='split-all')
    assert len(bl0.annotations) == len(bl1.annotations)
    assert len(bl1.segments[0].analogsignals) == len(bl0.segments[0].analogsignals)
    for aid in range(len(bl0.segments[0].analogsignals)):
        for k,v in bl0.segments[0].analogsignals[aid].annotations.items():
            assert bl1.segments[0].analogsignals[aid].annotations[k] == v

# test for signal_group_mode='group-by-same-units'
with NixIOFr(filename) as io:
    bl1 = io.read_block(signal_group_mode='group-by-same-units')
    assert len(bl0.annotations) == len(bl1.annotations)
    assert len(bl1.segments[0].analogsignals) == 1
    for anasig in bl0.segments[0].analogsignals:
        for k,v in anasig.annotations.items():
            # these assertions fail: grouping the AnalogSignals drops the asymmetric annotations
            assert k in bl1.segments[0].analogsignals[0].array_annotations
            assert bl1.segments[0].analogsignals[0].annotations[k] == v

Here, since the two objects have different annotation keys, the annotations are not converted into array_annotations but are silently ignored during loading. Maybe annotations of this type should be expanded into array_annotations so that they are not lost when the data is loaded? @samuelgarcia: What do you think?

samuelgarcia commented 5 years ago

Hi Julia, this example is a bit tricky because we create two separate AnalogSignals, save them, and then try to load them grouped (possible here because they share the same size, sampling rate, and units, but that is a special case). I don't know how to handle annotations that are asymmetric (not defined for all channels). Maybe we could add an option load_assymtric_annoation_array_when_grouping_anyway=True/False that fills the array with None where needed. But that is for a very special case. At the very least we could emit a warning when annotations are not all loaded because of an internal asymmetry.

What do you think?
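
A minimal sketch of the None-padding idea with a warning fallback, in plain Python. The helper `merge_channel_annotations` and its `pad_asymmetric` flag are hypothetical stand-ins for the option discussed above; they are not part of Neo, and plain lists are used in place of array_annotations.

```python
import warnings


def merge_channel_annotations(per_channel, pad_asymmetric=False):
    """Merge per-channel annotation dicts into per-key lists.

    Hypothetical helper, not a Neo function. Keys defined on every channel
    are always kept. Keys missing on some channels are padded with None if
    pad_asymmetric is True, otherwise dropped with a warning.
    """
    # Collect all keys in first-seen order.
    all_keys = []
    for anns in per_channel:
        for key in anns:
            if key not in all_keys:
                all_keys.append(key)

    merged, dropped = {}, []
    for key in all_keys:
        symmetric = all(key in anns for anns in per_channel)
        if symmetric or pad_asymmetric:
            # dict.get returns None for channels that lack the key.
            merged[key] = [anns.get(key) for anns in per_channel]
        else:
            dropped.append(key)

    if dropped:
        warnings.warn(
            "Annotations not loaded because they are not defined "
            f"for all channels: {dropped}"
        )
    return merged


channels = [{"anno1": "myfirstannotation"}, {"anno2": "mysecondannotation"}]
padded = merge_channel_annotations(channels, pad_asymmetric=True)
```

One caveat of this sketch: a channel whose annotation value really is None cannot be told apart from a channel that lacks the key.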

JuliaSprenger commented 5 years ago

Hi Samuel, I agree this case is probably special when looking at the output data of a recording system, but I think it is much more likely to occur in the context of data analysis, when the corresponding results are saved (e.g. using Nix). For example, the user might want to keep an original and a processed version of the recorded data in two different AnalogSignal objects. The processed version has the same basic properties as the original, which permits grouping both into a single AnalogSignal, but it carries additional annotations, e.g. to describe the processing performed.

There are different options on how to handle such asymmetric channels

samuelgarcia commented 5 years ago

For solution 2, what would the annotations dict look like? Would we use the channel index as the key for this sub dict?

For instance:

anasig.annotations['sparse_channel_annotations'] = {
    0: dict(anno1='myfirstannotation'),
    1: dict(anno2='mysecondannotation'),
}

JuliaSprenger commented 5 years ago

Yes, that's what I had in mind. But as I said, this solution loses all the advantages of array_annotations, so if we implement it, we should add an explicit, prominent warning along with it.
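
To make the trade-off concrete, here is a rough sketch of how symmetric and asymmetric annotations could be separated, with the warning attached. `split_channel_annotations` is a hypothetical name, not an existing Neo function, and plain lists stand in for array_annotations.

```python
import warnings


def split_channel_annotations(per_channel):
    """Split per-channel annotation dicts into two dicts.

    Hypothetical helper, not a Neo function. Returns:
      array_annotations: keys defined on every channel, one value per channel
      sparse: {channel index: {key: value}} for all remaining keys
    """
    # Keys shared by all channels can be stored as one value per channel.
    common = set(per_channel[0]) if per_channel else set()
    for anns in per_channel[1:]:
        common &= set(anns)
    array_annotations = {
        key: [anns[key] for anns in per_channel] for key in sorted(common)
    }

    # Everything else is kept per channel, keyed by channel index.
    sparse = {}
    for index, anns in enumerate(per_channel):
        extra = {k: v for k, v in anns.items() if k not in common}
        if extra:
            sparse[index] = extra

    if sparse:
        warnings.warn(
            "Some annotations are not defined for all channels and are stored "
            "in 'sparse_channel_annotations'; they are not sliced or merged "
            "like array_annotations."
        )
    return array_annotations, sparse


channels = [
    {"gain": 1, "anno1": "myfirstannotation"},
    {"gain": 2, "anno2": "mysecondannotation"},
]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    array_annotations, sparse = split_channel_annotations(channels)
```

In this example `gain` is defined on both channels and stays a regular per-channel annotation, while `anno1` and `anno2` end up in the index-keyed dict.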

samuelgarcia commented 5 years ago

This second solution is easier to implement and easier for a user to understand. So I vote for it, with a warning.

Do you want me to do this? If you want to have a look, you can start with create_analogsignal_array_annotations in io/proxyobject.py.

This function should return two dicts:

What do you think?

JuliaSprenger commented 5 years ago

Ok, if @apdavison does not object, then we go for the sparse channel annotations. Feel free to code ahead ;)

samuelgarcia commented 5 years ago

@JuliaSprenger Hi Julia. One last question before I try to implement this.

In anasig.annotations['sparse_channel_annotations'], what is the key: the channel index or the channel id? With the channel index it is a bit more complicated, because the dict will be created with the global channel index, and every time we take a slice along the channel axis the keys will become wrong. We would then need to implement key remapping at the object level just for this sparse_channel_annotations sub dict.

What do you think?
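
The remapping concern can be illustrated with a small sketch. `remap_sparse_annotations` is a hypothetical helper, not Neo code, and it assumes the slice is described as a list of the original channel indices that are kept.

```python
def remap_sparse_annotations(sparse, kept_indices):
    """Re-key index-based sparse annotations after a channel selection.

    Hypothetical helper. kept_indices lists the original channel indices
    that survive the selection, in their new order.
    """
    # Map each surviving original index to its position after slicing.
    new_position = {old: new for new, old in enumerate(kept_indices)}
    return {
        new_position[old]: dict(anns)
        for old, anns in sparse.items()
        if old in new_position
    }


sparse = {
    0: {"anno1": "myfirstannotation"},
    1: {"anno2": "mysecondannotation"},
}

# Keep only the second channel: after slicing it is channel 0, so the
# original key 1 would point at a channel that no longer exists.
remapped = remap_sparse_annotations(sparse, kept_indices=[1])
```

Every operation that selects or reorders channels would need to call something like this, which is the extra object-level logic mentioned above.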

samuelgarcia commented 4 years ago

Hi @JuliaSprenger, I really don't see how we would implement this. The problem is the key: is it the channel index? What happens when we slice along the channel axis? The key will then be wrong. I think sparse_channel_annotations implemented with keys is not such a good idea. Maybe sparse arrays would be better, to stay robust when slicing.
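
As a rough illustration of why a per-channel array layout survives slicing, the sketch below stores one entry per channel and uses None where a channel has no value. Plain lists stand in for a real sparse array type here, and `slice_channel_annotations` is a hypothetical name.

```python
def slice_channel_annotations(annotations, channel_slice):
    """Apply the same channel slice to every per-channel annotation list.

    Hypothetical helper. Each value holds one entry per channel, so no key
    remapping is needed: positions follow the channels automatically.
    """
    return {key: values[channel_slice] for key, values in annotations.items()}


# One entry per channel; None marks "not annotated on this channel".
annotations = {
    "anno1": ["myfirstannotation", None],
    "anno2": [None, "mysecondannotation"],
}

# Select only the second channel.
second_only = slice_channel_annotations(annotations, slice(1, 2))
```

The cost of this layout is that every asymmetric key occupies one slot per channel, which is what a true sparse representation would avoid.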