AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/

Package add-on Eye Tracking data for Saskia. #2608

Open morriscb opened 1 year ago

morriscb commented 1 year ago

Saskia has add-on data to the VCO release that needs to be packaged and released publicly. From reading through an email thread (attached below), Pika's initial plan is to create a getter method on the BrainObservatoryCache that retrieves the data from an S3 bucket instead of through a released NWB file. This is similar to how the current L0 events work.
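From the user's side, the intent is that this looks like the existing getters. A minimal sketch of that usage, where the method name `get_ophys_experiment_eye_tracking` is a hypothetical placeholder and not part of the current SDK:

```python
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# The cache already mediates access to per-experiment data and the L0 events.
boc = BrainObservatoryCache(manifest_file="boc/manifest.json")

# Hypothetical getter mirroring boc.get_ophys_experiment_events(): on first use
# it would pull the aligned eye tracking array for that experiment from S3 into
# the local cache, then read the cached copy on later calls.
# eye_tracking = boc.get_ophys_experiment_eye_tracking(ophys_experiment_id=session_id)
```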

morriscb commented 1 year ago

Emails in order:

Hi Scott,

I’ve been finishing up some Mindscope work using the eye tracking data for the Visual Coding 2P dataset, and we’re getting ready to submit a manuscript on this work. This uses the DLC eye tracking data, which was derived a few years ago, so this data is not in the NWB files or available through the SDK. When we publish this work, we need to make the eye tracking data available, and I’d like to talk to you about how to do this. We could simply make files of the eye tracking data and put them in a general repository somewhere, which would meet the requirements for publishing. However, it is a bad look to have different pieces of the Allen Brain Observatory data in random places. It would be much better to have the eye tracking data integrated into the SDK. I think ideally it would be added to the NWB files, but that seems unlikely to happen anytime soon. I propose we do something similar to what was done for the L0 events for the Visual Coding dataset, where the data lives outside the NWB files but is accessible via the SDK. But this would require that you create that SDK functionality. We can definitely help with this, e.g. putting the aligned eye tracking data into a .npy file or something like that. Functions to read it should be straightforward, and we can help with that too.

Let me know what you think and how you’d like to proceed,

Saskia

Hi Saskia,

I am not as familiar with the Visual Coding release as I ought to be. Can you share with me a code snippet for accessing the L0 events data with the SDK so that I can see how that was put together?

I’ll let you know what I think after I’ve seen what we did before.

Thanks,

Scott

So for everything else, we first access the NWB file, and then we have get_ functions to retrieve the traces or stimulus tables or other things that are contained in the NWB file:

```python
from allensdk.core.brain_observatory_cache import BrainObservatoryCache

# manifest_file and session_id (an ophys experiment id) are assumed to be defined
boc = BrainObservatoryCache(manifest_file=manifest_file)
data_set = boc.get_ophys_experiment_data(ophys_experiment_id=session_id)
ts, dff = data_set.get_dff_traces()
stim_epoch = data_set.get_stimulus_epoch_table()
# etc.
```

For the L0 events, they aren’t contained in the NWB file, so we access them from the cache:

```python
events = boc.get_ophys_experiment_events(ophys_experiment_id=session_id)
```

This returns an array that is the same shape as the dff array above, meaning that the events are temporally aligned with the rest of the data.
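Concretely, that alignment can be sanity-checked with the same objects (a sketch continuing the snippets above, assuming the usual cells-by-timepoints layout of the dF/F array):

```python
# events and dF/F share the same (n_cells, n_timepoints) shape, so column j in
# both arrays refers to the same 2-photon frame; ts gives that frame's timestamp.
ts, dff = data_set.get_dff_traces()
events = boc.get_ophys_experiment_events(ophys_experiment_id=session_id)

assert events.shape == dff.shape
assert dff.shape[-1] == len(ts)  # one timestamp per imaging frame
```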

Let me know if this doesn’t make sense.

Hi Saskia,

Thanks. That helps a lot. To be clear: whatever functionality we end up adding will need to be added to this BrainObservatoryCache class?

Cheers,

Scott

Yes, there will need to be a function that reads in the eye tracking data. I don’t expect it will need to be complicated because we already have the data aligned – so it can probably be a pretty easy numpy read function.
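A minimal sketch of that kind of read function, assuming the aligned data is delivered as one .npy file per experiment named by its ophys experiment id (the file layout here is an assumption, not something settled in this thread):

```python
import os

import numpy as np


def read_eye_tracking(data_dir, ophys_experiment_id):
    """Load the temporally aligned eye tracking array for one experiment.

    Assumes one <ophys_experiment_id>.npy file per session in data_dir; the
    array is already aligned to the 2-photon frames, so no resampling is done.
    """
    path = os.path.join(data_dir, "{}.npy".format(ophys_experiment_id))
    return np.load(path)
```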

Hi Saskia,

(Matt, the full discussion of what is being asked for should be below)

I’m looping Matt Sullivan in since he’s leading the Project Management Team and actually has the strategic view necessary to speak to where/how this work gets prioritized in Pika’s existing backlog of work.

My technical assessment is: yes, what you describe is possible.

Slightly more detailed answer: In 2021, Pika developed an AWS-backed infrastructure for data releases. This obviates the need to work through the old brain-map.org API that is actually owned by Central IT and (so I am told) requires some complicated cross-team gymnastics to actually put in play. Our plan would be to publish the .npz (or whatever) files you provide us on S3 using the new infrastructure. We would then add wrapping functions to the existing BrainObservatoryCache so that, if users ask for the eye tracking data, the SDK downloads it from S3. Technically, this means the new data will not live on the same hard drives as the old data (S3 versus brain-map.org), but the user will not know that. The user will just download the data seamlessly through the SDK.
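A rough sketch of the wrapping layer described here, assuming one publicly readable file per experiment behind an HTTPS endpoint; the URL pattern, file naming, and function name are illustrative rather than the final API:

```python
import os

import numpy as np
import requests


def fetch_eye_tracking(ophys_experiment_id, bucket_url, cache_dir):
    """Download the packaged eye tracking file from S3 on first use, then
    serve the locally cached copy on later calls.

    A function like this, hung off BrainObservatoryCache, would let users ask
    for the eye tracking data without knowing it lives on S3 rather than
    brain-map.org.
    """
    local_path = os.path.join(cache_dir, "{}_eye_tracking.npy".format(ophys_experiment_id))
    if not os.path.exists(local_path):
        os.makedirs(cache_dir, exist_ok=True)
        response = requests.get("{}/{}_eye_tracking.npy".format(bucket_url, ophys_experiment_id))
        response.raise_for_status()
        with open(local_path, "wb") as f:
            f.write(response.content)
    return np.load(local_path)
```

Keying both the local cache and the remote object on the ophys experiment id would keep the lookup consistent with the existing get_ophys_experiment_* getters, so the S3 location never has to surface to the user.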

Once we have the data in hand, I would estimate that a single Pika engineer could get this done in a week (I haven’t put a lot of thought into that estimate, but I think it’s reasonable).

Determining when that week occurs is above my pay grade, hence your involvement, Matt.

Cheers,

Scott

Hi all,

Just circling back to this. The data is ready in this location:

https://alleninstitute-my.sharepoint.com/personal/chase_king_alleninstitute_org1/_layouts/15/onedrive.aspx?ga=1&id=%2Fpersonal%2Fchase%5Fking%5Falleninstitute%5Forg1%2FDocuments%2Fpackaged%5Fviscoding%5Feye%5Ftracking%2Fdata

Let me know if you need more from me to start working on this.

Thanks, Saskia