catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License
51 stars 23 forks source link

[Feature]: Add `OnePhotonSeries`/`TwoPhotonSeries` to processing #581

Closed weiglszonja closed 1 year ago

weiglszonja commented 1 year ago

What would you like to see added to NeuroConv?

Add write_as option to add_photon_series to support adding processed OnePhotonSeries data to processing.

For reference, the Tye lab has OnePhotonSeries data acquired from Miniscope and they want to store the processed and motion corrected data as well. Since we are soon over hours there, I only created a custom function at the repo for now, but eventually add_photon_series should have an option to write processed data.

Is your feature request related to a problem?

No response

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

CodyCBakerPhD commented 1 year ago

Though I have had questions raised around the choice of write_as as the keyword name for this; @h-mayorquin @weiglszonja any ideas for a better more descriptive name of this role, now that it is needed in multiple places throughout our API?

I guess the most explicit would be write_to_module but even that makes it sound more like a boolean asking if we want to write or not?

weiglszonja commented 1 year ago

@CodyCBakerPhD how about:

...
nwbfile_module_name: Literal["acquisition", "processing"] = "acquisition"
    The name of the module where the photon series data will be added to, default is `nwbfile.acquisition`
...
CodyCBakerPhD commented 1 year ago

@weiglszonja Sure, that seems fine to me

h-mayorquin commented 1 year ago

Using the fact that our appending functions follow this patttern:

 add_something(object_to_add, nwbfile, **kwargs)`

I think that nwbfile is somehow redundant in the parameter name (e.g. nwbfile_module_name could be module_name instead). I also want to indicate in the name that this is a destination of the adding action. What do you guys think of target_module or something of the like?

General question: Are acquisition and stimulus (were we also use write_as) modules? I have been working with pynwb for a while and the concepts stick don't stick with me.

weiglszonja commented 1 year ago

@h-mayorquin

General question: Are acquisition and stimulus (were we also use write_as) modules? I have been working with pynwb for a while and the concepts stick don't stick with me.

I'm not sure they are modules, in this tutorial they are referred to as "groups".

I think that nwbfile is somehow redundant. I also want to indicate in the name that this is a destination of the adding action. What do you guys think of target_module or something of the like?

I started working on this in #587, there I'm using something like nwbfile_module_name but I would agree that we should indicate the "where" better. I'm not sure I understand how you would remove nwbfile from the add_*() functions

h-mayorquin commented 1 year ago

@weiglszonja I meant that the "nwbfile" is not required as part of the parameter name (e.g. nwbfile_module_name could be module_name instead) because it is very clear that the methods are adding to an nwbfile. I will correct above.

CodyCBakerPhD commented 1 year ago

I think that nwbfile is somehow redundant in the parameter name (e.g. nwbfile_module_name could be module_name instead). I also want to indicate in the name that this is a destination of the adding action.

I also thought of that, but module_name doesn't make it clear that that is the location we intend to write the object to

What do you guys think of target_module or something of the like?

That name occurred to me as well, but I didn't suggest it because 'target' doesn't really seem like the right word

CodyCBakerPhD commented 1 year ago

To get perspective in practice

Cody's preference and Szonja proposal

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    nwbfile_module_name="acquisition",
)

Without redundancy

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    module_name="acquisition",
)

Heberto preference

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    target_module="acquisition",
)

Heberto preference more verbose

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    target_nwbfile_module="acquisition",
)

Old write_as (precedence, but less preferred)

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    write_as="acquisition",
)
CodyCBakerPhD commented 1 year ago

I can see what you object to with respect to redundancy @h-mayorquin

But personally I don't mind it. Excess verbosity at cost of some redundancy is preferable to minimize ambiguity and maximize clarity at all times

h-mayorquin commented 1 year ago

@CodyCBakerPhD No, I am also fine with verbosity. I took the liberty (feel free to revert!) to edit your comment to reflect my preferences : P

My main concern is to add target (or something of the sort) to indicate what the parameter is suppose to be doing. So more than eliminating the redundancy I want to add specififty to the action of what the paraemter is doing. I added another case to your summary to make this clear.

h-mayorquin commented 1 year ago

Maybe it would be good to get the vote of @pauladkisson and @alessandratrapani to fresh eyes to the issue.

CodyCBakerPhD commented 1 year ago

'target' to me means 'an error-prone ideal to aim for', with implication that it may not actually achieve it

More honest to what it actually does is 'attach' or 'include in' or 'contained by'

TBH the biggest part of my point on including 'nwbfile' to further specify 'module' is because the word 'module' has so much overload and application to other things besides the NWBFile modules as containers

New proposals: attach_to_nwbfile_module_name, include_in_nwbfile_module_name, contained_by_nwbfile_module_name

CodyCBakerPhD commented 1 year ago

But yes would be great to get ideas or support for one or another from @pauladkisson or @alessandratrapani

h-mayorquin commented 1 year ago

Of the three proposals I like atttach more.

alessandratrapani commented 1 year ago

Having a naive view on this, I would vote for nwbfile_module_name or attach_to_nwbfile_module_name (a bit long but clear)

weiglszonja commented 1 year ago

I feel we have a common ground with attach_to_nwbfile_module_name, I modified the name to this one in #587, but let me know if you feel otherwise or have other ideas @pauladkisson

pauladkisson commented 1 year ago

I think using 'module' to refer to only acquisition or processing is not right since modules usually refer to processing modules. This is reinforced by neuroconv.tools.nwb_helpers.get_module() which returns a processing module. From the docs I think 'container' is the right terminology (although that's also more general bc it can refer to analysis, stimulus, etc. see here).

I also think 'attach_to' is a bit awkward terminology to refer to the nested object structure, and it's already obvious that the function will be adding the 1PSeries to the nwbfile from the function name. So with that in mind I propose:

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    parent_container="acquisition",
)
CodyCBakerPhD commented 1 year ago

Paul's suggestion - 'parent container'

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    parent_container="acquisition",
)

Would this also become

def add_photon_series(
    imaging=my_imaging_extractor,
    nwbfile=my_nwbfile,
    metadata=my_metadata,
    photon_series_type="OnePhotonSeries",
    parent_container="processing/ophys",
)

for clarity?

pauladkisson commented 1 year ago

Yeah, I think that's reasonable.

weiglszonja commented 1 year ago

@pauladkisson very good points. How about flattening out "processing/ophys" to "processing", I guess it will always go to "ophys" processing module, otherwise maybe adding second attribute for processing_module_name="ophys"? but maybe that's too far.

CodyCBakerPhD commented 1 year ago

The context is that this used to be / still is (in spikeinterface tools), write_as which used to be / still is 'raw' vs. 'processed'

Reading in English, it makes sense to 'write the neurodata type as raw data' in which case it goes to acquisition, or 'write the neurodata type as processed data' in which case it goes to a processing submodule of the appropriate name

Now that we switch to attach_to_nwbfile_module_name or parent_container, both of those now refer to the actual technical group inside the NWBFile object; to which it is much more precise and true to refer to as acquisition and the true processing location processing/ophys in this situation

I would note that the values of the option would still be constrained as Literal choices as Literal["acquisition", "processing/ophys"] to be concise about the structure, and to prohibit bad mixing of processing submodules

parent_container is also back to being nice and short; what do the rest of you think in light of this new suggestion? @alessandratrapani @h-mayorquin

alessandratrapani commented 1 year ago

From the docs I think 'container' is the right terminology

Thank you for pointing that out. With this in mind, I am more in favour of parent_container.

weiglszonja commented 1 year ago

Changes made based on suggestions at #587, thank you all for the great ideas. @h-mayorquin are you also good with parent_container?

h-mayorquin commented 1 year ago

I think this is a good improvement, thanks @pauladkisson.

Two axis: 1) Precision: low ambiguity. 2) Expliticitness: from the name its obvious what it does.

I think "write_as" was doing a poor job on the two axis. It became obvious to me because I saw it again and again but I remember when I encountered it from the first time and it was confusing.

I think that "parent_container" is very precise which is great. I also think is clear to us because we are familiar with the implementation details of pynwb so we know what we are talking about.That said, I don't think it is very explicit for someone without that knowledge.

But at this point I am very unsure that we can do better. It is a complicated concept to compress ("the structure in the nwb file where your data structure will end up as"). I think that @pauladkisson hit the jackpot in terms of utility / effort ratio by finding something very precise. I also think that the effort that would be required from us to find an improvement in both axis as good as this is probably not worth the effort.

I am happy with where the things are. Let the argument docstring do the rest.