AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Add number of valid ROIs to ophys_experiments_table (cache/manifest table) #2544

Closed DowntonCrabby closed 2 years ago

DowntonCrabby commented 2 years ago

Describe the use case that is addressed by this feature: When external users are selecting ophys experiments of interest to analyze, it would be really useful for them to know up front how many valid cells/ROIs are associated with each experiment.

Describe the solution you'd like: Add the number of valid cells/ROIs for each experiment to the ophys_experiments_table (cache/manifest table).

Additional context: This will likely need to be calculated before it can be added, as I don't think this exists anywhere currently.

Do you want to work on this issue? I'm happy to create a function that will quickly calculate the number of valid ROIs.

danielsf commented 2 years ago

Related to #2543

morriscb commented 2 years ago

Hey everyone, to summarize Pika's position on this ticket: we feel that the ophys_cells_table provides all the information needed to calculate this metric in a very straightforward way. From what I understand, the table was created initially for purposes similar to those in this ticket, that is, allowing users to pre-compute information about the cells in experiments/sessions without having to load the full session/experiment. I agree with @aamster that computing the number of cells per experiment from the existing table is not an onerous ask of our users.
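For illustration, the per-experiment count described above could look something like the sketch below. It uses a small toy DataFrame as a stand-in for the ophys_cells_table rather than loading the real cache; the column names (`cell_roi_id`, `cell_specimen_id`, `ophys_experiment_id`) and the assumption that every row is one valid ROI are assumptions for the example, not confirmed in this thread.

```python
import pandas as pd

# Toy stand-in for the ophys_cells_table: one row per ROI.
# Column names and values are assumptions made up for illustration.
cells_table = pd.DataFrame(
    {
        "cell_roi_id": [1, 2, 3, 4, 5],
        "cell_specimen_id": [101, 102, 103, 101, 104],
        "ophys_experiment_id": [9001, 9001, 9001, 9002, 9002],
    }
).set_index("cell_roi_id")

# Count the rows (ROIs) belonging to each experiment.
n_rois = cells_table.groupby("ophys_experiment_id").size().rename("n_rois")
print(n_rois)
```

If the real table also listed invalid ROIs, a filter on the relevant validity column would be needed before the `groupby`.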

DowntonCrabby commented 2 years ago

While I don't think it's a particularly onerous ask, I do think that, especially with the summary tables, we want as low a bar of entry as we can manage.

Many grad students who we would like to use our datasets are used to working with MATLAB and not Python, so if we start them off by having to do table joins etc., they may feel that the whole dataset will be too complicated to use.

The point of these summary tables is to give folks really quick access to be able to explore the datasets and cells available so they can determine what they want to do further analysis on. If we add in hurdles, even very small hurdles, I think we risk alienating folks or intimidating them away from the dataset.

My perspective is that it's okay for the summary tables to hold redundant information (same info contained in multiple tables that are just organized slightly differently) if it makes it more user friendly. I know that goes against some principles of relational databases, but I think that's a tradeoff worth making to ensure our data is accessible to more people.

morriscb commented 2 years ago

Hey, @DowntonCrabby. So this is in relation to the previous ticket merging information from the session level into the cells table, though it does relate to this request as well. It seemed that @matchings was okay with the tables being separate and adding something to the tutorial notebooks showing the join. We haven't come to a conclusion yet regarding the number of ROIs per experiment, but Pika is of a similar stance to the previous issue.

While I can understand this decision will cause some to bounce off, keep in mind that adding things like this adds more complexity to our code, makes the code more difficult for us to maintain, and, potentially, makes our code less portable between datasets. When we can offload this work onto an external (and very standard) data format/library, we really should. Both the issue you mention and this ticket are one-line operations in Pandas and can either be tutorialized in our notebooks or found in tutorials all over the web.
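As a rough sketch of the join mentioned above, the per-experiment counts can be attached to an experiments table with a single `DataFrame.join`. Both tables here are toy stand-ins; the column names (`ophys_experiment_id`, `imaging_depth`, `cell_roi_id`) and the values are assumptions chosen for illustration, not the actual manifest contents.

```python
import pandas as pd

# Toy stand-in for the ophys_experiments_table, indexed by experiment ID.
experiments_table = pd.DataFrame(
    {
        "ophys_experiment_id": [9001, 9002, 9003],
        "imaging_depth": [175, 375, 275],
    }
).set_index("ophys_experiment_id")

# Toy stand-in for the ophys_cells_table: one row per ROI.
cells_table = pd.DataFrame(
    {
        "cell_roi_id": [1, 2, 3, 4, 5],
        "ophys_experiment_id": [9001, 9001, 9001, 9002, 9002],
    }
)

# ROIs per experiment, as a Series indexed by experiment ID.
n_rois = cells_table.groupby("ophys_experiment_id").size().rename("n_rois")

# Left join on the index; experiments with no rows in the cells table get 0.
experiments_with_counts = (
    experiments_table.join(n_rois)
    .fillna({"n_rois": 0})
    .astype({"n_rois": int})
)
print(experiments_with_counts)
```

The left join keeps every experiment, so an experiment that has no entries in the cells table shows up with a count of 0 rather than being dropped.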

morriscb commented 2 years ago

I haven't heard any pushback on the above statement, or any argument that expecting users to calculate the number of ROIs in an experiment from the cells table is too much to ask. Should we close this ticket then?

matchings commented 2 years ago

I think we made our preferences and rationale clear, not sure what else we could say to convince you.

I’d be fine closing it if the online tutorials are first updated to demonstrate how to calculate the number of ROIs per experiment.

morriscb commented 2 years ago

Okay, we do have a ticket, #2551, explicitly for updating the tutorials. I've added a comment there that this should include a quick demonstration of the calculation described here and of merging/joining tables. I'll leave this issue open for now just in case.