AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/
Other
343 stars 149 forks source link

Parse key stimulus & imaging modality features and record them individually #2591

Open DowntonCrabby opened 1 year ago

DowntonCrabby commented 1 year ago

Describe the use case that is addressed by this feature. Currently there are several discrete stimulus and imaging modality features that are all encoded in either the stimulus name or the project id. This can be quite complicated and confusing for external users who then have to look up what each project code means or what set of images are associated with a given stimulus name. We should include these as fields in the metadata table as well as columns that are then filterable in the cache summary tables (session, experiment, behavior session etc.)

Describe the solution you'd like The following items to be individual fields:

Additional context There might be other fields that could be added here and I'd also like @matchings and @corbennett to weigh in on the specific language / column names and definitions. I believe a few of these have also already been implemented for the visual behavior neuropixels dataset and are available

Do you want to work on this issue? I am happy to help work on this issue

mattjdavis commented 1 year ago

I think visual_areas and visual_area_planes could be more generic as we are going to image additional regions (such as frontal)

matchings commented 1 year ago

To agree with @mattjdavis here - i would go with something like num_targeted_structures and num_imaging_planes instead of visual_areas and visual_area_planes, since we will be including non-visual areas in the future.

I also wonder whether we should use stimulus_set instead of image_set because not all stimuli that we currently use or will use in the future are technically "images". For example, some people would say that gratings are not images. But that's mostly semantics.

morriscb commented 1 year ago

Discussing this in our backlog refinement: In order for Pika to work on this ticket we need specifics examples of where these data stored and how to parse them. For example how does the session type TRAINING_5_images_A_handoff_lapsed map on what is requested in this ticket. Additionally, as mentioned in the request, how do some of the requested data parse from the project_code.

morriscb commented 1 year ago

Hi @matchings and @DowntonCrabby we are still waiting on an answer to our above question for us to being working on this ticket.

DowntonCrabby commented 1 year ago

Hi @morriscb, I'll get started on the mapping right now and provide it as a document here and ping you as soon as it's complete

DowntonCrabby commented 1 year ago

Also of note: while some of this is just back mapping what this information was in the past, we might want to get engineering or others to include this information somewhere in the pkl file (or elsewhere, I'm not sure where) to make it much more easily parse-able

DowntonCrabby commented 1 year ago

@matchings @mattjdavis Can you both take a look at this document I quickly put together with the mappings and make sure the titles of columns make sense and we have all the information we need?

There are 2 tabs in the excel sheet- one which parses the stimulus, and the other that parses project code

matchings commented 1 year ago

@morriscb we are working on finalizing a table with the mapping of stimulus_name and project_code to their relevant defining features, but it’s a little tricky given the variety of stimulus and project types. @DowntonCrabby has a draft version and we can finalize and share with you soon.

After re-reading this issue, considering the upcoming VBO re-release logistics, and the format of the table with the mappings, I am wondering if we should just include this table (or tables) as an additional metadata table, and not fully embed the information into all the NWB files. I think this would be much easier and should be sufficient for users. But I’d like to know what @DowntonCrabby thinks before deciding anything.

morriscb commented 1 year ago

Hey @matchings chatted with the other Pika's. We're not thrilled about idea of uploading the table as metadata. Likely future projects are gonna want a similar mapping so it might be best to port the logic/code that was used to produce the @DowntonCrabby 's document python for the SDK. Is there such a piece of code/logic we can use? I think I can make out a general sense of what it is but if one already exists we could re-use that rather than recreate and potentially miss something.

DowntonCrabby commented 1 year ago

@morriscb I am working on some functions right now that will take in the session_type or project_id and provide those mappings that are in the excel sheet.

@matchings @corbennett, Marina and I went over these mappings and column naming a while back but we wanted to come up with something that was going to work for both ophys and neuropixels so I wanted to check in with you guys about the naming and the different levels & organization of parsing out information. I still want to get ya'lls feedback on that before making it final!

morriscb commented 1 year ago

Hi folks, Clark is out of office so I'd figure I'd ask the two commenters here. When adding the columns that Clark mentions above, what should I do for behavior only sessions? Specifically, I'm not sure that visual_areas and visual_area_planes make sense for these sessions (If I've missed another one let me know). My plan is to set the values for these columns to None or N/A for behavior only sessions. What do you think @matchings and @mattjdavis?

mattjdavis commented 1 year ago

None is great for me. Another optional key could be a behavior_only boolean. But a None in the other columns is just as clear.

On the subject of key names.... I think there is still some ambiguity in the key naming.

visual_areas could (should?) be targeted_areas.

Similarly, visual_area_planes should be more generic like num_depths_per_area

maybe overthinking this @matchings thoughts?

morriscb commented 1 year ago

Okay, I'll run with targeted_areas and num_dpeths_per_area for now until @matchings weights in. I'll also double check @DowntonCrabby's sheet regarding the other names.

matchings commented 1 year ago

None makes sense to me too

morriscb commented 1 year ago

Cool, I think it will be shown as N/A thanks to the pandas typing (pandas converts None to NaN in columns which causes the type to go from int -> float if you aren't careful). The types of these two columns will be pandas.Int64. Any opinions on the columns names @matchings?

morriscb commented 1 year ago

Hi all, I've updated the session/experiment metadata with the new requested columns. You can find them on Isilon at /allen/programs/braintv/workgroups/nc-ophys/visual_behavior/vbo_2023_release_candidate/project_metadata/ the new columns are 'behavior_type', 'image_set', 'image_set_novelty', targeted_areas, and num_dpeths_per_area. There is already a column prior_exposures_to_image_set which I assume is equivalent to the requested image_set_exposure.