AllenNeuralDynamics / dynamic-foraging-task

Bonsai/Harp workflow for Dynamic Foraging with Python GUI for visualization and control
MIT License

Session metadata error – very low success rate #569

Closed hagikent closed 2 months ago

hagikent commented 3 months ago

Related to those issues on individual errors: https://github.com/AllenNeuralDynamics/dynamic-foraging-task/issues/490 https://github.com/AllenNeuralDynamics/dynamic-foraging-task/issues/491 https://github.com/AllenNeuralDynamics/dynamic-foraging-task/issues/493

I have been handling/uploading all data acquired between 20240528 and yesterday, which should have rig and session metadata, and realized that the session.json generation failure rate seems very high (86/393, >20%). I think this should be fixed with high priority. @XX-Yin @alexpiet are you two working on this?

As I'm working on back-filling rig+session for old data (early 2024 and late 2023) (see this issue: https://github.com/AllenNeuralDynamics/dynamic-foraging-task/issues/482), I will test whether those "failed" sessions can be easily rescued by a similar back-filling operation.

Adding @rachelstephlee, who told me that she is happy to help and to get familiar with the data structure etc.
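
A count like 86/393 can be reproduced by scanning the data root for session folders that lack a session.json. The sketch below is only an illustration: it assumes session folders are named behavior_* (as in the example path later in this thread) and that session.json, when present, sits somewhere inside that folder. The function names are hypothetical.

```python
from pathlib import Path


def find_sessions_missing_metadata(data_root):
    """Return (all_sessions, missing) for session folders under data_root.

    Assumption: a session folder is any directory whose name starts with
    'behavior_', and a generated session.json lives somewhere inside it.
    """
    root = Path(data_root)
    sessions = sorted(p for p in root.rglob("behavior_*") if p.is_dir())
    # A session counts as "missing" when no session.json exists below it.
    missing = [s for s in sessions if not any(s.rglob("session.json"))]
    return sessions, missing


def failure_rate(n_failed, n_total):
    """Fraction of sessions without session.json (0.0 if there are none)."""
    return n_failed / n_total if n_total else 0.0
```

For the numbers quoted above, failure_rate(86, 393) is roughly 0.22.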

alexpiet commented 3 months ago

@XX-Yin said he will take care of the session metadata errors.

XX-Yin commented 3 months ago

I can make updates.


hagikent commented 3 months ago

@alexpiet @hagikent Started working on backfilling session.json for sessions that failed to generate metadata. Could you point me to an example dialog_metadata_file? What exactly am I supposed to provide to generate_metadata in addition to the main json_file?

XX-Yin commented 3 months ago

I will be out of office for the rest of this week. It's very straightforward and I can show you next week. I will try to update the session metadata over the weekend if I still have the energy; otherwise I will update it next week.


alexpiet commented 3 months ago

The code to generate the metadata is confusing. @XX-Yin should be able to help.

The main code generates the metadata by this call: foraging_gui.generate_metadata(Obj=Obj)

So all of the information should be in that dictionary, which should have a key meta_data_dialog that gets populated by this call: self.Metadata_dialog.meta_data. Almost all of that information (other than IACUC and Project, I think) is ephys specific.

You can see the fields that should be there by looking for that key in a saved session:

[image]

These are also the fields of the metadata popup window:

[image]

Let me know if you need more help
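
To check these fields without opening the GUI, the saved session file can be inspected directly. This is a minimal sketch, assuming the behavior file is plain JSON and that meta_data_dialog is a top-level key holding a dictionary; the helper name is made up for illustration.

```python
import json


def list_dialog_fields(behavior_json_path):
    """Return the sorted field names stored under 'meta_data_dialog'.

    Returns an empty list when the key is absent or empty, which is
    assumed to be the case for sessions saved without the dialog.
    """
    with open(behavior_json_path, "r", encoding="utf-8") as f:
        session = json.load(f)
    dialog = session.get("meta_data_dialog") or {}
    return sorted(dialog.keys())
```

An empty result for a given session would suggest that the dialog information has to be supplied separately when regenerating its metadata.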

hagikent commented 3 months ago

Thanks both. I see both straightforward and confusing 😂

I think I have more or less got the high-level concept now. I was calling foraging_gui.generate_metadata as a standalone script (not as a function) and was confused about why the information to be ingested into the metadata has to be fed separately from two JSON files. It makes sense that most of it is ephys specific.

@XX-Yin I will try this later this week and get back to you next week if I cannot fully figure it out. Have a good 7/4 week.

XX-Yin commented 3 months ago

[like] Xinxin Yin reacted to your message:



XX-Yin commented 2 months ago

@hagikent

To backfill the session metadata, you can follow the steps below. Please let me know if it works.

1) Fill out the session metadata dialog. [image]
2) Save the session metadata dialog by clicking the save button in the dialog. The data will be saved to C:\Users\user_name\Documents\ForagingSettings\metadata_dialog by default.
3) Run generate_metadata. You need to provide the behavior json file, the dialog_metadata_file saved in step 2, and the output_folder. One example is shown below:

generate_metadata(json_file=r'Y:\715083\behavior_715083_2024-04-26_17-12-15\behavior\715083_2024-04-26_17-12-15.json', dialog_metadata_file=r'C:\Users\xinxin.yin\Documents\ForagingSettings\metadata_dialog\323_EPHYS3_2024-05-13_12-38-51_metadata_dialog.json', output_folder=r'F:\Test\Metadata')

hagikent commented 2 months ago

Thanks @XX-Yin. I'm trying NOT to use the GUI; I need to reduce manual steps.

There are not that many sessions missing session.json after 240528, so manually patching them one by one is likely fine. However, I also need to backfill metadata for data from late 2023 and early 2024, and I cannot do these manual steps for all of them.

Let me try a different approach and get back to you if I need help or clarification.

XX-Yin commented 2 months ago

You don't need to fill out all fields in the session metadata dialog. Probes, Left lick spout reference positions, Stick microscopes and Protocols.io ID are optional.

hagikent commented 2 months ago

I made it work without using the GUI to generate the metadata_dialog file.

I sort of agree with @alexpiet that this is confusing: the information to be ingested into rig.json and session.json is intermingled and spread across both dialog.json and the main-json (nnnnnn_YYYY-MM-DD_HH-MM-SS.json). The most confusing part for me was that the main-json itself has a meta_data_dialog key; it is very intertwined. I suppose there are historical/development-trajectory reasons for this, but it is not very straightforward for those who don't know the history. Longer-term, this should be refactored by incorporating AIND-mapper.

For now, I made a minimal generic dialog.json stub and a snippet that generates only session.json. With this, you only need to feed the main-json to produce session.json, which is the most practical converter function for backfilling.
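
The approach described above might look roughly like the following. This is a sketch rather than the actual snippet used here: the stub contents, the helper names, and the idea of passing the generator in as a callable are all assumptions. The keyword arguments mirror the generate_metadata example earlier in this thread (json_file, dialog_metadata_file, output_folder).

```python
import json
from pathlib import Path


def write_dialog_stub(stub_path, fields=None):
    """Write a generic dialog JSON once and return its path.

    'fields' is a placeholder; the real required keys are whatever the
    metadata dialog saves (not reproduced here).
    """
    stub_path = Path(stub_path)
    stub_path.parent.mkdir(parents=True, exist_ok=True)
    stub_path.write_text(json.dumps(fields or {}, indent=2), encoding="utf-8")
    return stub_path


def backfill_sessions(behavior_jsons, stub_path, output_root, generate):
    """Call generate(...) once per behavior JSON, reusing one stub.

    'generate' is expected to accept the json_file, dialog_metadata_file
    and output_folder keywords (assumed from the example in this thread).
    Failures are collected instead of aborting the whole batch.
    """
    failures = []
    for json_file in map(Path, behavior_jsons):
        out_dir = Path(output_root) / json_file.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        try:
            generate(
                json_file=str(json_file),
                dialog_metadata_file=str(stub_path),
                output_folder=str(out_dir),
            )
        except Exception as exc:  # keep going; report at the end
            failures.append((json_file, repr(exc)))
    return failures
```

Injecting the generator as an argument keeps the loop independent of how generate_metadata is imported, and the returned failure list gives a quick view of which sessions still need manual patching.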

Will execute the backfilling process and upload remaining data (2023late, 2024early, session-missing-after20240528) by the end of this week.

XX-Yin commented 2 months ago

@hagikent Glad to see that you can backfill some old sessions. Can you explain a little more about how you did it?

I imagine you only need to provide the fields highlighted below to generate the session metadata, and don't need to care about the ephys part. These fields should stay the same as long as you don't change the rig metadata, which means you only need to do this once. [image]

I understand that the code is a little confusing; that follows from the complex nature of the session metadata itself. During the design process, modularity was considered a major requirement. The behavior json, rig metadata, and metadata dialog are largely independent in the code. However, they need to share information to generate the session metadata. For example, the session metadata requires a lot of fields from the behavior json; the ephys part, laser parts, daqs etc. of the session metadata require the rig metadata to keep naming consistent; and the metadata dialog requires the rig metadata to parse the probes and the stick microscope. Their integration is inevitable and necessary to automate the entire process and reduce human intervention.

However, please keep in mind that you can use the independent class GenerateMetadata to integrate them flexibly. The metadata dialog is designed to capture fields not directly provided by the behavior json, so you only need to provide the metadata dialog and the behavior json to GenerateMetadata to generate the session metadata. The metadata dialog (if provided) is saved to the behavior json (as a single dictionary called meta_data_dialog), which has multiple advantages:

1) We can visualize these metadata fields when we load a behavior json.
2) The behavior json and the metadata dialog can communicate with each other through a shared data format.
3) If we want to regenerate the session metadata after updating the code, we only need the behavior json, as it contains all the information. (If we add fields that are not contained in the behavior json, we need to add them to the metadata dialog.)
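
Point 3 suggests that, for sessions whose behavior json already contains meta_data_dialog, the dialog file could be recovered from the behavior json itself. A minimal sketch of that idea, assuming the key is top-level and its value can be written back out as JSON unchanged; the function name and the fallback argument are hypothetical:

```python
import json
from pathlib import Path


def extract_dialog_file(behavior_json_path, out_path, fallback=None):
    """Write the embedded 'meta_data_dialog' dict to its own JSON file.

    If the behavior JSON has no such dict (e.g. an older session), the
    optional 'fallback' dict is written instead. Returns True when the
    embedded dialog was found, False when the fallback was used.
    """
    with open(behavior_json_path, "r", encoding="utf-8") as f:
        behavior = json.load(f)
    dialog = behavior.get("meta_data_dialog")
    found = isinstance(dialog, dict)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(dialog if found else (fallback or {}), f, indent=2)
    return found
```

The resulting file could then be passed as dialog_metadata_file alongside the behavior json, with the boolean return value indicating which sessions relied on the generic fallback.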

Please let me know if you need help to generate the session metadata, or if you find specific parts that are confusing so we can make updates.

XX-Yin commented 2 months ago

By the way, my understanding is AIND-mapper is more for the rig metadata than the session metadata. The exact form of AIND-mapper is not clear. I've been advocating for standardizing "rig metadata", but it seems challenging (one discussion is here https://github.com/AllenNeuralDynamics/dynamic-foraging-task/discussions/460).

hagikent commented 2 months ago

https://github.com/AllenNeuralDynamics/aind-metadata-mapper

No, mapper is for both session and rig.

XX-Yin commented 2 months ago

It's a collection from many projects. Not sure if we have a general framework to make session metadata reusable across projects


hagikent commented 2 months ago

That's not the point... the idea of externalizing the mapper function is to keep up with the evolving metadata schema versions. Our current metadata generation functions are based on a fixed version that is already old and will soon be obsolete.

Anyway, backfilling is done. You can close this thread if the session metadata failure problem is solved.

XX-Yin commented 2 months ago

How does the mapper handle the version of the schema? Some schema changes require changing the code that generates the session metadata (for example, they changed the folder structure in the schema, and the session metadata code needed to be changed accordingly).

Still working on other parts.