Closed DowntonCrabby closed 2 years ago
Related to #2543
Hey everyone, to summarize Pika's position on this ticket: we feel that the ophys_cells_table provides all the information needed to calculate this metric in a very straightforward way. From what I understand, the table was created initially for purposes similar to those in this ticket, that is, allowing users to pre-compute information about the cells in experiments/sessions without having to load the full session/experiment. I agree with @aamster that computing the number of cells per experiment from the existing table is not an onerous ask of our users.
While I don't think it's a particularly onerous ask, I do think that, especially with the summary tables, we want as low a barrier to entry as we can manage.
Many grad students who we would like to use our datasets are used to working with MATLAB and not Python, so if we start them off by having to do table joins etc., they may feel that the whole dataset will be too complicated to use.
The point of these summary tables is to give folks really quick access to explore the datasets and cells available so they can determine what they want to do further analysis on. If we add in hurdles, even very small hurdles, I think we risk alienating folks or intimidating them away from the dataset.
My perspective is that it's okay for the summary tables to hold redundant information (the same info contained in multiple tables that are just organized slightly differently) if it makes them more user friendly. I know that goes against some principles of relational databases, but I think that's a tradeoff worth making to ensure our data is accessible to more people.
Hey, @DowntonCrabby. So this is in relation to the previous ticket about merging information from the session level into the cells table, though it does relate to this request as well. It seemed that @matchings was okay with the tables being separate and adding something to the tutorial notebooks showing the join. We haven't come to a conclusion yet regarding the number of ROIs per experiment, but Pika's stance is similar to the one on the previous issue.
While I understand this decision will cause some users to bounce off, keep in mind that adding things like this adds complexity to our code, makes the code more difficult for us to maintain, and potentially makes our code less portable between datasets. When we can offload this work onto an external (and very standard) data format/library, we really should. Both the issue you mention and this ticket are one-line operations in pandas and can either be tutorialized in our notebooks or found in tutorials all over the web.
I haven't heard any pushback on the above statement, or any argument that expecting users to calculate the number of ROIs in an experiment from the cells table is too much to ask. Should we close this ticket then?
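For illustration, here is a minimal sketch of the per-experiment ROI count in pandas. The toy DataFrame stands in for the cells table, and the column names cell_roi_id and ophys_experiment_id are assumptions rather than a confirmed schema:

```python
import pandas as pd

# Toy stand-in for the cells table: one row per ROI.
# Column names are assumed for illustration only.
cells = pd.DataFrame(
    {
        "cell_roi_id": [1, 2, 3, 4, 5],
        "ophys_experiment_id": [100, 100, 100, 200, 200],
    }
)

# The one-liner: count distinct ROIs within each experiment.
rois_per_experiment = cells.groupby("ophys_experiment_id")["cell_roi_id"].nunique()

print(rois_per_experiment)
```

The result is a Series indexed by experiment ID, so it can be looked up directly or joined onto another table keyed the same way.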
I think we made our preferences and rationale clear, not sure what else we could say to convince you.
I’d be fine closing it if the online tutorials are first updated to demonstrate how to calculate the number of ROIs per experiment.
Okay, we do have a ticket, #2551, explicitly for updating the tutorials. I've added a comment there that this should include a quick demonstration of the calculation described here and of merging/joining the tables. I'll leave this issue open for now just in case.
Describe the use case that is addressed by this feature.
When external users are selecting ophys experiments of interest to analyze, it would be really useful for them to know up front how many valid cells/ROIs are associated with each experiment.
Describe the solution you'd like
Add the number of valid cells/ROIs for each experiment to the ophys_experiments_table (cache/manifest table).
Additional context
This will likely need to be calculated before it can be added, as I don't think this exists anywhere currently.
Do you want to work on this issue?
I'm happy to create a function that will quickly calculate the number of valid ROIs.
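As a rough sketch of what such a function might look like: add_roi_counts and the n_valid_rois column are hypothetical names, the tables are toy stand-ins, and the cells table is assumed to already contain only valid ROIs and to carry an ophys_experiment_id column.

```python
import pandas as pd


def add_roi_counts(experiments: pd.DataFrame, cells: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `experiments` with a hypothetical `n_valid_rois` column.

    Assumes `experiments` is indexed by ophys_experiment_id and that every
    row of `cells` is one valid ROI tagged with its ophys_experiment_id.
    """
    # Count the rows (ROIs) belonging to each experiment.
    counts = cells.groupby("ophys_experiment_id").size().rename("n_valid_rois")
    # Index-on-index left join keeps every experiment, even those with no ROIs.
    out = experiments.join(counts)
    # Experiments absent from the cells table get a count of 0 instead of NaN.
    out["n_valid_rois"] = out["n_valid_rois"].fillna(0).astype(int)
    return out


# Toy data; the imaging_depth values are made up for illustration.
experiments = pd.DataFrame(
    {"imaging_depth": [175, 375, 275]},
    index=pd.Index([100, 200, 300], name="ophys_experiment_id"),
)
cells = pd.DataFrame(
    {
        "cell_roi_id": [1, 2, 3, 4, 5],
        "ophys_experiment_id": [100, 100, 100, 200, 200],
    }
)

result = add_roi_counts(experiments, cells)
print(result)
```

The left join is a deliberate choice here: an experiment with no rows in the cells table still appears in the output with a count of 0 rather than being dropped.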