catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License
51 stars 21 forks source link

Preserve order of recording extractor custom properties in `add_electrodes()` #792

Open weiglszonja opened 6 months ago

weiglszonja commented 6 months ago

I noticed the order of properties that are being added to the Electrodes table is not in the same order as in the recording_extractor._properties, for instance:

recording_extractor._properties

[...., 'electrode_type', 'chamber_type', 'chamber_x', 'chamber_y', 'chamber_z', 'acx_x', 'acx_y', 'acx_z']

The order of properties to add to the Electrodes table is changed at this line: https://github.com/catalystneuro/neuroconv/blob/fa6edd532f6bb6ead42ae40aa9029c7c78eead5f/src/neuroconv/tools/spikeinterface/spikeinterface.py#L322 which does not preserve the order of the elements:

['electrode_type', 'acx_x', 'chamber_type', 'chamber_z', 'channel_ids', 'acx_z', 'chamber_x', 'acx_y', 'chamber_y', 'offset_to_uV', 'gain_to_uV']

I think we should preserve the order of properties to be the same the recording_extractor._properties.

Opened https://github.com/catalystneuro/neuroconv/pull/793 as draft to discuss

h-mayorquin commented 6 months ago

Why should we keep the order?

CodyCBakerPhD commented 6 months ago

@h-mayorquin Because it doesn't look as good as it could in Neurosift

In Szonja's example of : ['electrode_type', 'acx_x', 'chamber_type', 'chamber_z', 'channel_ids', 'acx_z', 'chamber_x', 'acx_y', 'chamber_y', 'offset_to_uV', 'gain_to_uV'], wouldn't it look much better if the acx_ were kept together?

(BTW @weiglszonja I'd expand the acx to something more readable - I'm assuming this is acceleration?)

h-mayorquin commented 6 months ago

If that's the case maybe an easier solution is to have a predetermined order for the special ones (like location) and then write the rest of them with natsort.

I Just thinking that bounding ourselves to followg spikeinterface order will bring us problems. Plus, it is even more problematic when you consider more than one recording object.

CodyCBakerPhD commented 6 months ago

@h-mayorquin See discussion on related PR for proposed approach: https://github.com/catalystneuro/neuroconv/pull/793

weiglszonja commented 6 months ago

(BTW @weiglszonja I'd expand the acx to something more readable - I'm assuming this is acceleration?)

I'm following the lab's notation here, acx_ are the coordinates relative to the crossing of anterior commissure (I added a description so it's clear when looking at the table).

Plus, it is even more problematic when you consider more than one recording object.

Yes, I'm not sure how to solve that one in https://github.com/catalystneuro/neuroconv/pull/793 yet...

CodyCBakerPhD commented 6 months ago

Yes, I'm not sure how to solve that one in https://github.com/catalystneuro/neuroconv/pull/793 yet...

In principle the approach outlined in https://github.com/catalystneuro/neuroconv/pull/793#issuecomment-2025843301 should handle it because the metadata columns are already global across interfaces

h-mayorquin commented 6 months ago

My first intuition is that the complexity is not worth the effort. For producing better looking output I think a hard-coded rule for some properties and then sorting the rest should ease most of the ugliness.

I am strongly against:

CodyCBakerPhD commented 6 months ago

Did you read the PR conversation?

Using the already existing metadata["Ecephys"]["Electrodes"] list isn't adding any complexity - all those columns will already be written - and it's already a List[dict] so why not just rely on that order as a loose mechanism for granting priority when constructing the table? It's also already global. It also abstracts from the concept of what was automatically added vs. added in a custom fashion

This is also a feature that could play a role in the GUIDE, which allows even easier customization of electrode table content than we've ever had

h-mayorquin commented 6 months ago

Yes, I did, I had this proposal in mind:

Using the already existing metadata["Ecephys"]["Electrodes"] list isn't adding any complexity - all those columns will already be written - and it's already a List[dict] so why not just rely on that order as a loose mechanism for granting priority when constructing the table? It's also already global. It also abstracts from the concept of what was automatically added vs. added in a custom fashion

The complexity added would be the modifcation to make the rest of the function to actually work with that. Then documenting and testing the different scenarios (multiple recordings, recordings that match only in some channels, etc) and the error handling. You already mentioned that all the properties should be there.

Probably our difference here is that I Just don't see much value on ordering columns. The example you gave seems like a minor aesthethical thing that can be fixed with two lines (one for defining the order of the special properties) and one for sorting the rest. What would further customization be used for?

https://c2.com/xp/YouArentGonnaNeedIt.html

CodyCBakerPhD commented 6 months ago

natsort is not magical, and does not grant latent precedence to items that might be considered 'more important' or more 'semantically related'

This can be documented with a single line statement: "The order of entries in the ["Ecephys"]["Electrodes"] list provides a precedence for column order in the final electrodes table."

More tests is a good thing, might even catch other issues with tangentially related things

https://c2.com/xp/YouArentGonnaNeedIt.html

Which is why this feature wasn't in the code base before now, but is requested now, so we're adding it now

h-mayorquin commented 6 months ago

natsort is not magical, and does not grant latent precedence to items that might be considered 'more important' or more 'semantically related'

I think it would be more fruitful if we had some concrete use cases on mind. The example that you mentioned above would be work with natsort.

I am coming from the prior that determining the order of columns is something that does not add too much. But if the code, tests and documentations are that simple I guess the implementation PR should showcase it.

How would you grant configurability to other tables? Units for example?

Just to be clear, is @weiglszonja use case the same as yours? To make the electrode_table look better?