Open DowntonCrabby opened 1 year ago
I think visual_areas and visual_area_planes could be more generic as we are going to image additional regions (such as frontal)
To agree with @mattjdavis here - i would go with something like num_targeted_structures
and num_imaging_planes
instead of visual_areas
and visual_area_planes
, since we will be including non-visual areas in the future.
I also wonder whether we should use stimulus_set
instead of image_set
because not all stimuli that we currently use or will use in the future are technically "images". For example, some people would say that gratings are not images. But that's mostly semantics.
Discussing this in our backlog refinement: In order for Pika to work on this ticket we need specifics examples of where these data stored and how to parse them. For example how does the session type TRAINING_5_images_A_handoff_lapsed
map on what is requested in this ticket. Additionally, as mentioned in the request, how do some of the requested data parse from the project_code
.
Hi @matchings and @DowntonCrabby we are still waiting on an answer to our above question for us to being working on this ticket.
Hi @morriscb, I'll get started on the mapping right now and provide it as a document here and ping you as soon as it's complete
Also of note: while some of this is just back mapping what this information was in the past, we might want to get engineering or others to include this information somewhere in the pkl file (or elsewhere, I'm not sure where) to make it much more easily parse-able
@matchings @mattjdavis Can you both take a look at this document I quickly put together with the mappings and make sure the titles of columns make sense and we have all the information we need?
There are 2 tabs in the excel sheet- one which parses the stimulus, and the other that parses project code
@morriscb we are working on finalizing a table with the mapping of stimulus_name and project_code to their relevant defining features, but it’s a little tricky given the variety of stimulus and project types. @DowntonCrabby has a draft version and we can finalize and share with you soon.
After re-reading this issue, considering the upcoming VBO re-release logistics, and the format of the table with the mappings, I am wondering if we should just include this table (or tables) as an additional metadata table, and not fully embed the information into all the NWB files. I think this would be much easier and should be sufficient for users. But I’d like to know what @DowntonCrabby thinks before deciding anything.
Hey @matchings chatted with the other Pika's. We're not thrilled about idea of uploading the table as metadata. Likely future projects are gonna want a similar mapping so it might be best to port the logic/code that was used to produce the @DowntonCrabby 's document python for the SDK. Is there such a piece of code/logic we can use? I think I can make out a general sense of what it is but if one already exists we could re-use that rather than recreate and potentially miss something.
@morriscb I am working on some functions right now that will take in the session_type or project_id and provide those mappings that are in the excel sheet.
@matchings @corbennett, Marina and I went over these mappings and column naming a while back but we wanted to come up with something that was going to work for both ophys and neuropixels so I wanted to check in with you guys about the naming and the different levels & organization of parsing out information. I still want to get ya'lls feedback on that before making it final!
Hi folks, Clark is out of office so I'd figure I'd ask the two commenters here. When adding the columns that Clark mentions above, what should I do for behavior only sessions? Specifically, I'm not sure that visual_areas
and visual_area_planes
make sense for these sessions (If I've missed another one let me know). My plan is to set the values for these columns to None or N/A for behavior only sessions. What do you think @matchings and @mattjdavis?
None
is great for me. Another optional key could be a behavior_only
boolean. But a None in the other columns is just as clear.
On the subject of key names.... I think there is still some ambiguity in the key naming.
visual_areas
could (should?) be targeted_areas
.
num_targeted_structures
was suggested above, but sometimes we record fields of view across different structures or within the same structure. For example, learningmfishtask1a, we targeted 4 structures initially (v1, al, am, lm), but now we are targeting two structures (v1, am) but are still recording from 4 areas (2 areas per structure(num_cortical_structures
in the excel, which has the same ambiguity.Similarly, visual_area_planes
should be more generic like num_depths_per_area
maybe overthinking this @matchings thoughts?
Okay, I'll run with targeted_areas
and num_dpeths_per_area
for now until @matchings weights in. I'll also double check @DowntonCrabby's sheet regarding the other names.
None
makes sense to me too
Cool, I think it will be shown as N/A
thanks to the pandas typing (pandas converts None to NaN in columns which causes the type to go from int -> float if you aren't careful). The types of these two columns will be pandas.Int64
. Any opinions on the columns names @matchings?
Hi all, I've updated the session/experiment metadata with the new requested columns. You can find them on Isilon at /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/vbo_2023_release_candidate/project_metadata/
the new columns are 'behavior_type', 'image_set', 'image_set_novelty', targeted_areas
, and num_dpeths_per_area
. There is already a column prior_exposures_to_image_set
which I assume is equivalent to the requested image_set_exposure
.
Describe the use case that is addressed by this feature. Currently there are several discrete stimulus and imaging modality features that are all encoded in either the stimulus name or the project id. This can be quite complicated and confusing for external users who then have to look up what each project code means or what set of images are associated with a given stimulus name. We should include these as fields in the metadata table as well as columns that are then filterable in the cache summary tables (session, experiment, behavior session etc.)
Describe the solution you'd like The following items to be individual fields:
Additional context There might be other fields that could be added here and I'd also like @matchings and @corbennett to weigh in on the specific language / column names and definitions. I believe a few of these have also already been implemented for the visual behavior neuropixels dataset and are available
Do you want to work on this issue? I am happy to help work on this issue