NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
316 stars 249 forks source link

A neutral grouping object. #138

Closed samuelgarcia closed 10 years ago

samuelgarcia commented 10 years ago

I detached the end of this issue #132. I think it is also link to issue #96 and #137

There is a global needs for grouping things together. Not only Unit or RC.

use case are quite specific:

Andrew wants : Parameters and ParametersSet for navigation throuth SpikeTrain with same parameters at simulation time Robert and Todd want : Condition for grouping Segment (=trial) Samuel wants : RCG for grouping RC for spike sorting Other users wants : RCG for grouping Units that represent a population. Roberts wants : gourping some Epochs.

Could we have some kind of very neutral Group that could link neo objects of the same type.

g1 = neo.Group(type = 'RecordingChannel', name = 'tetrode1', items = [rc1, rc2, rc3, rc4]) g2 = neo.Group(type = 'RecordingChannel', name = 'tetrode2', items = [rc3, rc7, rc8, rc9])

or g1 = neo.Group(type = 'Segment', name = 'trial with condition A', items = [seg1, seg2, seg3]) g1.annotate(intensity = 'heavy')

or g1 = neo.Group(type = 'SpikeTrain', name = 'parameters set', items = [st1, st2, st3]) g.annotate(param1 = 8, param2 = 12)

And group would attached to Block: Block.groups

It could solve in one simple way many issues ?

toddrjen commented 10 years ago

My concern is that making it too flexible well hamper date exchange. People will end up with an object that they really don't know what to do with.

I think one of the big benefits of Neo is that, when you get a file with a Neo object, you pretty much know what each object represents.

If there are common use-cases that Neo doesn't support, I think it would be better to include explicit support for them rather than just letting people put whatever they want in an arbitrary class.

I would personally prefer a more complicated but constrained and predictable structure rather than a simpler but unconstrained and unpredictable structure.

samuelgarcia commented 10 years ago

Good to hear. Thank you. It was a proposal.

It is no so unpredictable because taht group would have a type. neo.Group(type = 'RecordingChannel') is naturally a group of channel.

neo.Group(type = 'Unit') is naturally a population of neuron.

neo.Group(type = 'Segment') is a group of trial.

toddrjen commented 10 years ago

What happens if neo.Group(type = 'Group')? Someone could conceivably build an entire new structure from scratch without using any existing Neo classes.

What does neo.Group(type = 'SpikeTrain') represent?

What prevents someone from using neo.Group(type = 'Segment') for something more similar to Blocks?

How do we know if neo.Group(type = 'Unit') is a population of interconnected neurons, neurons from the same recording site, neurons with similar electrical features, or neurons that have been visually identified as belonging to a particular class of neurons?

What if someone wants to indicate a RecordingChannel and a set of Units are related? Do they put them in the same Group, put them in two groups nested inside another Group, have some sort of annotation relating them?

samuelgarcia commented 10 years ago

We can forbid some object to be Group if it lead to complication (Group(tpe = 'Group'))

In my mind a group is inside a Block. Maybe neo.Group is not the good syntax but more something like block.create_group(type = 'Segment', items = [ ]).

neo.Group(type = 'Unit') all your list are possible use of that group. It is good to have several use case no ?

The last point is hard to argument. yes today RCG group both Units and RC and I need this for spike sorting.

It was just a proposal. I would like to hear from others. I am not sur myself to want that. The main idea is : at the end all users make list of object for many reasons. If this done in neo we take advantage of the bi directional relationship manage base neo. (object is aware of its parents)

rproepp commented 10 years ago

I share Todd's concerns on this. Especially from the perspective of tools like Spyke Viewer (or OpenElectrophy), such unspecific objects can be problematic. Also, as proposed the Group object is not capable of replacing RCG because it groups both RCs and Units.

It is attractive to have such a general purpose object because it could encompass many use cases, including some we have not foreseen, but I believe it would restrict the usefulness of Neo as a standard.

toddrjen commented 10 years ago

I have been thinking, and there might be an alternative that might solve these issues.

The idea is to have a generic "collections" object. This object can store one or more other object types, like in this proposal (except not other collection objects). The catch, however, is that it cannot be a primary container of objects. It must be a child of another object, and any objects it holds will automatically be children of its parent. It can also only hold objects its parent can hold.

So, to give an example, image a re-implementation of RCGs using a collection. Under this scenario, recordingchannels and units will be children of a Block. However, you can optionally also add an RCG collection to a block. This collection can only hold recordingchannels and units (it inherits this from Block), and if you add a recordingchannel or unit it will automatically be added to the parent Block as well. This would make an RCG a way to group objects stored in a block, rather than a storage object itself.

We could ship with two default collections, RecordingChannelGroup (or RecordingChannelCollection) and ConditionCollection. We could add more later down the road if they prove to have common-enough usage.

This would guarantee that there is a consistent structure, especially for automated tools, but still allow some flexibility for additional secondary groupings. I think this would also reduce the complexity, since users will always know where the proper place for a given object is.

There would probably also be some convenience methods in BaseNeo to access children collections, primarily for automated tools to be able to work with collections without having to know what collections are actually present.

samuelgarcia commented 10 years ago

It is not so far from what I proposed, the difference I see is that you can 'mix' object in a collection. Is it that ?

Collection sounds also good to men and it will certainly sound good for python dev. I do not known if for biologist 'collection' is more intuitive that 'group'.

Andrew or Andrey , could you comment this ticket ?

toddrjen commented 10 years ago

The main difference, I would say, is that collections cannot be parents of an object. The parent of an object must be a proper container object (Block, Segment, Unit, or RecordingChannel). Collections are a way to organize the contents of a container, but they are not containers themselves. Trying to add an object to a collection will always add it to the collection's parent as well (if it isn't already there).

It is fully allowable, within the neo standard, to have a Unit with no parent. No io class would ever produce such a thing, but it not impossible for a user to create one. A collection, on the other hand, must have a parent. It will not work without one. The parent class determines its behavior, so the very idea of a collection without a parent is meaningless, and trying to use one will result in an exception.

The reason I chose "collection" is because, at least to me, it more strongly implies that this is not a proper container itself, but rather a way of organizing objects stored somewhere else. But of course the terminology is open to discussion.

samuelgarcia commented 10 years ago

OK. In my mind it was like you are describing it. Collection at the end is more or less a secondary grouping. I think we are agree. Le 's wait for others. I am not 100% sure if leaving RCG is good or not. In my mind a object is aware that it belong to a collection, so it is a many to many relationship. collection<>object. Is that clear ?

toddrjen commented 10 years ago

Yes, in fact I would say it should be the only place where many-to-many relationships should be allowed. There should be a clear, ordinary tree structure for all container and data objects.

As for having defined rcg and condition subclasses, I think there are three primary reasons for this.

First, it provides some degree of backwards compatibility. It won't be an exact, 1:1 conversion, but it should at least provide all the same capabilities in a clear manner.

Second, in the interest of making data exchange easier, I think that is important that neo includes the most common use-cases out-of-the-box. If something is used by a lot of IO classes, leaving each to implement it in their own way defeats the purpose of having neo to begin with.

Third, it would provide clear examples. This is imperfect both to show people how to implement their own collections, as well as facilitating unit testing on our end. We would have clear examples with well-defined behavior that will be much easier to design tests for than a generic, unconstrained object.

samuelgarcia commented 10 years ago

Todd president! Todd president!

apdavison commented 10 years ago

To help us think through the ramifications, I think it would be good to have some documented use cases. Todd: could you write something in the Sphinx docs giving an example or two of how this would be used. How would the old use case examples change (if at all) if adopting this approach?

Samuel could then try adding a description of how he would use Collections for one of his use cases, and I will ask Ján (@antolikjan) to see how this would work for a neural simulation usecase.

rproepp commented 10 years ago

This could be a good alternative. My main concern would be how using these collections as RCGs would work with the plan of having multichannel analog signals as standard (#124). The semantics needed for the RCG seem to be more complex than what a generic collection object should handle. It would also mean that AnalogSignals would be stored directly in the block, I'm not sure if that is desirable.

In general however, I'm also very much in favor of providing at least one defined subclass if we decide to go this way, even if it turns out not to be useful for RCG.

toddrjen commented 10 years ago

@rproepp What is your objection to allowing AnalogSignals in a Block?

The current support for RCGs requires a lot of special code paths that greatly increasing the complexity of the code-base It is hard to code general-purpose functions that work with both RCGs and everything else, leading to many functions and IOs that need hard-coded support for just the RCG special case. Further, it breaks the consistent tree structure that Neo objects currently follow. So although I can see conceptual issues with putting an AnalogSignal in a block, I think that there are serious practical issues that may override this.

rproepp commented 10 years ago

My main concern is that there would not be a fixed place for AnalogSignals anymore, as they could appear either in a Block or in a Channel (and would presumably in both cases also be linked to a Segment). It also does not make intuitive sense to me why signals should directly belong to a Block.

In #144, you mentioned as possible use case organizing Events. With the current proposal as I understand it, this would not cover what I suggested in #137: The collection could contain Events from one Segment, or the Events would also need to be attached to the Block.

What about removing the requirement of having the grouped objects attached to the parent of the collection?

toddrjen commented 10 years ago

As I explained previously, I think that removing that requirement would completely break the Neo data model. Objects of any type could conceivably appear anywhere. This would make data sharing extremely difficult, and would make it pretty much impossible to build automated tools that handle Neo objects. This requirement guarantees that Neo objects keep a consistent structure. In my personal opinion that is a hard requirement of Neo.

As it stands now, there is not going to be a fixed place for AnalogSignals whether this is implemented or not. When AnalogSignal and AnalogSignalArray are merged, there will still be the need for putting them in two places. The question is where those places will be. I can see the intuitive issues with place AnalogSignals in Block, but I think the practical issues with dealing with RecordingChannelGroups as they currently are is much more serious. I have seen it a lot in my refactoring. There is no simple, consistent way to support a structure with many-to-many relationships. This approach is specifically designed to avoid these issues.

As for organizing Events, it is still open to discussion whether we should support Events in RecordingChannel. It very well may be the best approach. My suggested approach could provide a workaround, by using an index attribute attached to each collection to indicate which RecordingChannel. I understand this is not an optimal solution. If it is really an important use-case, then I think there should be proper support in RecordingChannel.

rproepp commented 10 years ago

Just to be clear: I do not propose to drop the restriction that collections cannot be a primary container, so all objects would still have a well defined place. As author of an automatic tool, I don't see how not forcing the contained objects to be part of the parent makes using collections substantially more difficult?

When we merge AnalogSignal and AnalogSignalArray and remove the current RecordingChannelGroup, we could just leave AnalogSignals attached only to their segment if do not have a channel. I agree that many-to-many relationships are more complicated, but we get them no matter how exactly the collections are implemented.

Concerning Events and RecordingChannels: Unfortunately, this would not solve the use case that was the reason for my suggestion: artifacts are specific to tetrodes, i.e. RecordingChannelGroups, and not single RecordingChannels (though that could of course occur as well).

toddrjen commented 10 years ago

I do not propose to drop the restriction that collections cannot be a primary container, so all objects would still have a well defined place

So under your proposal, if someone adds an object that is not already a child of a Neo object to a collection, what happens?

When we merge AnalogSignal and AnalogSignalArray and remove the current RecordingChannelGroup, we could just leave AnalogSignals attached only to their segment if do not have a channel.

I don't see how that would improve things relative to the current situation. We would have essentially the same problem we have now with RecordingChannelGroup, only it would apply to all Neo objects. We would no longer even be able to write special cases for RecordingChannelGroup. I think that would make things much more difficult than they currently are.

I agree that many-to-many relationships are more complicated, but we get them no matter how exactly the collections are implemented.

The big issue is that under the current many-to-many system, a given object can have an arbitrary number of parents of a given type. Under my proposal, this cannot happen. Every object will always have a single parent of a given type. At least from what I have seen in my refactoring, this should eliminate most, if not all, of the problems the current many-to-many system creates.

Unfortunately, this would not solve the use case that was the reason for my suggestion: artifacts are specific to tetrodes, i.e. RecordingChannelGroups, and not single RecordingChannels (though that could of course occur as well).

We could also support events attached to a block (which would actually be nice for BrainwareSrcIO, since the brainware src format does not attach events to segments, it attaches them to blocks).

rproepp commented 10 years ago

So under your proposal, if someone adds an object that is not already a child of a Neo object to a collection, what happens?

Personally, I tend to the consenting adults view and mostly wouldn't try to explicitly react to illegal actions. But we could also raise an Exceptions in such cases.

When we merge AnalogSignal and AnalogSignalArray and remove the
current RecordingChannelGroup, we could just leave AnalogSignals
attached only to their segment if do not have a channel.

I don't see how that would improve things relative to the current situation. We would have essentially the same problem we have now with RecordingChannelGroup, only it would apply to all Neo objects. We would no longer even be able to write special cases for RecordingChannelGroup. I think that would make things much more difficult than they currently are.

Maybe we aren't talking about the exact same thing, because I don't understand this objection...

I agree that many-to-many relationships are more complicated, but
we get them no matter how exactly the collections are implemented.

The big issue is that under the current many-to-many system, a given object can have an arbitrary number of parents of a given type. Under my proposal, this cannot happen. Every object will always have a single parent of a given type. At least from what I have seen in my refactoring, this should eliminate most, if not all, of the problems the current many-to-many system creates.

But that would not change. In both cases, each object has only a single (or potentially no) parent of a given type among the Neo container objects. Unless I've misunderstood your proposal, in both cases each object can also be contained in an arbitrary number of collections?

Unfortunately, this would not solve the use case that was the
reason for my suggestion: artifacts are specific to tetrodes, i.e.
RecordingChannelGroups, and not single RecordingChannels (though
that could of course occur as well).

We could also support events attached to a block (which would actually be nice for BrainwareSrcIO, since the brainware src format does not attach events to segments, it attaches them to blocks).

That would be possible. Events would not have a fixed place in the hierarchy anymore though, they could belong to a Segment, a Block, or both. I think we could also find legitimate use cases for grouping SpikeTrains across Segments and Units. We would then need all Neo objects to be optionally directly attached to the Block. While I'm not sure if this would really be a bad thing, at first consideration it seems wrong to me.

I'm not familiar with the Brainware format. What is the meaning of an event without a Segment? In my mind, data with a time coordinate belongs to a respective Segment. But Block also has a rec_datetime property so I can see that this can be interpreted differently. We should make sure that this is not ambiguous to us or other users!

toddrjen commented 10 years ago

But that would not change. In both cases, each object has only a single (or potentially no) parent of a given type among the Neo container objects. Unless I've misunderstood your proposal, in both cases each object can also be contained in an arbitrary number of collections?

The important point of my proposal is that any object added to a collection would automatically have its parent set to the parent of that collection. Say you have a collection that is attached to a Unit, and you add a SpikeTrain to that collection. The SpikeTrain would have its Unit set to be the Unit the collection as attached to. That guarantees that the tree structure will always be preserved.

Most of the problems I see stem from the lack of a clear tree structure. You can't safely navigate up down between parents and children. Under my proposal, you will always be able to do so, you can simply bypass the many-to-many connections entirely when they aren't useful. Further, the many-to-many connections are isolated to objects that are already connected by a parent/child relationship, which makes them much, much easier to deal with.

If you remove this coupling, then you lose that connection. The tree structure is gone, and you now have to worry about navigating the many-to-many connections in order to get a useful picture of the structure. But now those many-to-many connections are everywhere. You also have to be much more careful to avoid infinite recursion loops.

Further, you can't predict where you will find an object of a particular class. You have to be prepared for there to be objects of any type as sub-children of a given object, and you have to be prepared for objects of any type to be super-parents of a given type.

That would be possible. Events would not have a fixed place in the hierarchy anymore though, they could belong to a Segment, a Block, or both.

Right, which would make them much more similar to how Spike, SpikeTrain, AnalogSignal, AnalogSignalArray, and IrregularlySampledSignal objects are now (which can be part of a segment and/or another container or grouping object).

What is the meaning of an event without a Segment?

Brainware allows comments (which are time-stamped strings, basically an event and a label) to be added at any point, even when there is no recording in progress. That means the time a comment occurs may or may not be during a segment. I currently handle this by making segments[0] always have just comments, and having data segments proceed from there.

rproepp commented 10 years ago
But that would not change. In both cases, each object has only a
single (or potentially no) parent of a given type among the Neo
container objects. Unless I've misunderstood your proposal, in
both cases each object can also be contained in an arbitrary
number of collections?

The important point of my proposal is that any object added to a collection would automatically have its parent set to the parent of that collection. Say you have a collection that is attached to a Unit, and you add a SpikeTrain to that collection. The SpikeTrain would have its Unit set to be the Unit the collection as attached to. That guarantees that the tree structure will always be preserved.

Most of the problems I see stem from the lack of a clear tree structure. You can't safely navigate up down between parents and children. Under my proposal, you will always be able to do so, you can simply bypass the many-to-many connections entirely when they aren't useful.

Up to here I'd say that would be the same. Especially the fact that you can just bypass the collections. That's how I see their potential use: Additional structural information that doesn't have to be considered for the data model to still make sense and be useful.

Further, the many-to-many connections are isolated to objects that are already connected by a parent/child relationship, which makes them much, much easier to deal with.

But that's just because we'd need additional parent/child relationships at every point where a collection could be used, making the main data model more complex.

If you remove this coupling, then you lose that connection. The tree structure is gone, and you now have to worry about navigating the many-to-many connections in order to get a useful picture of the structure. But now those many-to-many connections are everywhere. You also have to be much more careful to avoid infinite recursion loops.

Further, you can't predict where you will find an object of a particular class. You have to be prepared for there to be objects of any type as sub-children of a given object, and you have to be prepared for objects of any type to be super-parents of a given type.

I agree with these difficulties. I thought about having a restriction for collections to only include objects further "down" in the main tree hierarchy. I didn't include it originally because I believe there are valid use cases that would not work with this restriction.

But I think you are right: The tradeoff between a more complex specification and covering these use cases one the one hand and a much simpler handling on the other hand should be in favor of the simpler handling here. In that case, the difference between the proposals is only the question of whether we want the explicit parent/child relations when using collections.

That would be possible. Events would not have a fixed place in the
hierarchy anymore though, they could belong to a Segment, a Block,
or both.

Right, which would make them much more similar to how Spike, SpikeTrain, AnalogSignal, AnalogSignalArray, and IrregularlySampledSignal objects are now (which can be part of a segment and/or another container or grouping object).

Sorry, I meant it would need to belong to Segment, RecordingChannel and Block. That would be a new situation, but it would happen to other data objects (at least AnalogSignal) as well. You are correct, though I don't know in what situation a data object is not attached to a Segment in practice.

What is the meaning of an event without a Segment?

Brainware allows comments (which are time-stamped strings, basically an event and a label) to be added at any point, even when there is no recording in progress. That means the time a comment occurs may or may not be during a segment. I currently handle this by making segments[0] always have just comments, and having data segments proceed from there.

Hmm, I think I would prefer this being handled as annotations, possibly automatically creating events when their timing is appropriate for a real Segment. But that depends on the interpretation of the temporal nature of a Block. Event times are relative (currently to the start of the Segment), what if they are attached to a Block? Or to both?

toddrjen commented 10 years ago

In that case, the difference between the proposals is only the question of whether we want the explicit parent/child relations when using collections.

Is there a use-case where we would want to have something be part of a collection but not be a child of the object the collection belongs to? That is assuming we don't allow collections to combine objects from multiple containers, which I think is what you are saying.

Hmm, I think I would prefer this being handled as annotations, possibly automatically creating events when their timing is appropriate for a real Segment. But that depends on the interpretation of the temporal nature of a Block.

The problem with annotations is linking the time and the text. Lists and tuples are not acceptable annotations, so there is no clean way to do this in an annotations. In the file format, the time stamps of the comments are in absolute time (or rather relative to some arbitrary point in the mid 1800's), so what reference time is used is arbitrary.

Event times are relative (currently to the start of the Segment), what if they are attached to a Block? Or to both?

I didn't know that. I can't find any mention in the documentation of what, specifically, the times of events are meant to represent. I assumed it was up to the IO class. If it is meant to be relative to the start of the segment, then the documentation should probably reflect that if it doesn't already.

rproepp commented 10 years ago

Is there a use-case where we would want to have something be part of a collection but not be a child of the object the collection belongs to? That is assuming we don't allow collections to combine objects from multiple containers, which I think is what you are saying.

The case with artifact events specific to a RecordingChannelGroup would be an example. The collection would combine events from multiple Segments, so I would allow that.

The problem with annotations is linking the time and the text. Lists and tuples are not acceptable annotations, so there is no clean way to do this in an annotations. In the file format, the time stamps of the comments are in absolute time (or rather relative to some arbitrary point in the mid 1800's), so what reference time is used is arbitrary.

Lists are accepted, as stated in the documentation. Tuples are also accepted by the code, although the docs do not explicitly state it.

I didn't know that. I can't find any mention in the documentation of what, specifically, the times of events are meant to represent. I assumed it was up to the IO class. If it is meant to be relative to the start of the segment, then the documentation should probably reflect that if it doesn't already.

I haven't seen this in the documentation, so I'm assuming as well. But what else could it be? The times have to be relative. Why would they belong to a Segment if they are not linked to it? How would they be associated to the rest of the data otherwise?

toddrjen commented 10 years ago

The case with artifact events specific to a RecordingChannelGroup would be an example. The collection would combine events from multiple Segments, so I would allow that.

That assumes we don't allow events as children of a block.

Lists are accepted, as stated in the documentation. Tuples are also accepted by the code, although the docs do not explicitly state it.

Sorry, I remembered incorrectly. Still, using a tuple isn't as natural a way to do it when there is a dedicated neo data object perfectly suited to store this sort of data. But that is another issue.

I haven't seen this in the documentation, so I'm assuming as well. But what else could it be? The times have to be relative. Why would they belong to a Segment if they are not linked to it? How would they be associated to the rest of the data otherwise?

They could be in Unix epoch time or something. This should probably be explicitly stated in the documentation, but it isn't really relevant to the current discussion.

I am concerned the current discussion is getting off-track. If I may make a suggestion, let us discuss my specific proposal in the comments on pull request #144. That would make it easier to separate from previous proposals, would allow comments on specific lines from the proposal, and will allow more specific changes to be made to the proposal. Is that alright.

samuelgarcia commented 10 years ago

I have your discussion quickly.

My simple intuition is:

rproepp commented 10 years ago

Continued at #144