catalystneuro / buzsaki-lab-to-nwb

BSD 3-Clause "New" or "Revised" License

Log project structural information #57

Closed: garrettmflynn closed this 1 year ago

garrettmflynn commented 1 year ago

Added some cells in the notebook created for #56 to derive structural information about the project.

This searches the last (i.e., innermost) folders, such as session folders, under an arbitrary root folder for their files. It then lets you output information organized by unique file type (e.g., cell_metrics.cellinfo from xxx.cell_metrics.cellinfo.mat).
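A minimal sketch of this indexing step, assuming each session folder is a leaf directory and files are named `<session_id>.<file type>.mat` with the session ID matching the folder name. The helper names (`leaf_folders`, `file_type`, `files_by_type`) are hypothetical, not the notebook's actual functions, and hidden (dot-prefixed) files are skipped.

```python
import os
from collections import defaultdict
from pathlib import Path


def leaf_folders(root):
    """Yield every directory under `root` that has no subdirectories (assumed to be session folders)."""
    for dirpath, dirnames, _filenames in os.walk(root):
        if not dirnames:
            yield Path(dirpath)


def file_type(filename, session_id):
    """Strip the session prefix and the final extension.

    'xxx.cell_metrics.cellinfo.mat' with session_id 'xxx' -> 'cell_metrics.cellinfo'
    """
    stem = Path(filename).stem  # removes only the last suffix, e.g. '.mat'
    prefix = f"{session_id}."
    return stem[len(prefix):] if stem.startswith(prefix) else stem


def files_by_type(root):
    """Map each file type to {session_id: path} across all leaf folders under `root`."""
    index = defaultdict(dict)
    for folder in leaf_folders(root):
        session_id = folder.name
        for entry in sorted(folder.iterdir()):
            # Skip hidden (dot-prefixed) files and anything that is not a regular file.
            if entry.is_file() and not entry.name.startswith("."):
                index[file_type(entry.name, session_id)][session_id] = entry
    return dict(index)
```

Keying the index by file type first makes it straightforward to compare the same kind of file across sessions.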

Based on this, you can refer to the generated _json_files/project/data_inconsistencies.json file to see which files are missing data that other files of the same type in the project contain. It simply compares the keys in each file's build_keys_and_types output across sessions, though this could be extended to validate the consistency of the actual (meta)data values.
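The key comparison could look roughly like the sketch below. It assumes each file has already been loaded into a nested dict. `flatten_keys` is a hypothetical stand-in for build_keys_and_types, whose real output format isn't shown here. `missing_keys` reports, for each session, the keys that appear in other files of the same type but not in that session's file. The field names in the example are illustrative only.

```python
def flatten_keys(data, prefix=""):
    """Return the set of dotted key paths in a nested dict, e.g. {'a': {'b': 1}} -> {'a', 'a.b'}."""
    keys = set()
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        keys.add(path)
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
    return keys


def missing_keys(keys_by_session):
    """For one file type, map each session to the keys it lacks relative to all other sessions."""
    all_keys = set().union(*keys_by_session.values())
    return {
        session: sorted(all_keys - keys)
        for session, keys in keys_by_session.items()
        if all_keys - keys  # only report sessions that are actually missing something
    }


# Hypothetical example: session_b's file has no brainRegion field.
sessions = {
    "session_a": flatten_keys({"cell_metrics": {"UID": 1, "brainRegion": "CA1"}}),
    "session_b": flatten_keys({"cell_metrics": {"UID": 2}}),
}
print(missing_keys(sessions))  # {'session_b': ['cell_metrics.brainRegion']}
```

Comparing against the union means a key present in any one session counts as expected everywhere, which matches the "missing data that other files of the same type contain" framing.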

Note: The data.json file that this depends on takes a while to generate. However, subsequent code reads the saved JSON file, so we don't need to rerun this step until we change the root scope.
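One way to avoid regenerating data.json is a load-or-build cache. This sketch assumes the cached file also records the root it was built from, so a changed root triggers a rebuild automatically instead of requiring a manual rerun. `load_or_build`, the `build` callback, and the example paths are hypothetical, and `build` must return JSON-serializable data.

```python
import json
import tempfile
from pathlib import Path


def load_or_build(cache_path, root, build):
    """Return cached data if it was built for `root`; otherwise call build(root) and cache it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
        if cached.get("root") == str(root):
            return cached["data"]
    data = build(root)  # the slow step, e.g. scanning every session folder
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps({"root": str(root), "data": data}, indent=2))
    return data


# Usage: only the first call and the call with a new root actually rebuild.
calls = []


def fake_build(root):
    calls.append(root)  # record each (expensive) rebuild
    return {"n_sessions": 1}


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "data.json"
    first = load_or_build(cache, "/data/root", fake_build)
    second = load_or_build(cache, "/data/root", fake_build)  # served from the cache
    third = load_or_build(cache, "/data/other_root", fake_build)  # root changed: rebuild

print(len(calls))  # 2
```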

Let me know if there are any changes to the output files that would make them more useful, or whether an alternative approach is warranted.

CodyCBakerPhD commented 1 year ago

Can you send me the resulting .json files? Just curious what they look like.

garrettmflynn commented 1 year ago

Here you go: project_info.zip

h-mayorquin commented 1 year ago

Nice. I briefly checked the files that you shared with @CodyCBakerPhD. In my view, the most important information is whether any subject/session lacks any of the files present in the others. Am I reading missing_files.json correctly as indicating that this is not the case?

Also, did you do this for dates other than e13? That's the only mention I see in the files.
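A presence table is one way to express the check described here. Rows are file types, columns are sessions, and any row containing False marks a file type that some session lacks. This is a sketch with hypothetical helper and session names; it is not necessarily how missing_files.json is produced.

```python
def presence_table(types_by_session):
    """Rows are file types, columns are sessions; True where the session has that file type."""
    sessions = sorted(types_by_session)
    all_types = sorted(set().union(*types_by_session.values()))
    return {t: {s: t in types_by_session[s] for s in sessions} for t in all_types}


def incomplete_types(table):
    """File types missing from at least one session; an empty list means every session is complete."""
    return [t for t, row in table.items() if not all(row.values())]


# Hypothetical example: session_b lacks spikes.cellinfo.
types_by_session = {
    "session_a": {"cell_metrics.cellinfo", "spikes.cellinfo"},
    "session_b": {"cell_metrics.cellinfo"},
}
table = presence_table(types_by_session)
print(incomplete_types(table))  # ['spikes.cellinfo']
```

Under this framing, "no subject/session lacks any file found in the others" corresponds to `incomplete_types` returning an empty list.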

h-mayorquin commented 1 year ago

Also, data_inconsistencies.json tracks whether some of the fields / cells / specific values in the MATLAB files are missing, right? This will be useful later, once we decide which fields are important.

garrettmflynn commented 1 year ago

> Nice. I briefly checked the files that you shared with @CodyCBakerPhD. In my view, the most important information is whether any subject/session lacks any of the files in the others. Do I read correctly missing_files.json as indicating this is not the case?
>
> Also, did you do this for dates other than e13? That's the only mention I see in the files.

Yep, you've got the right idea there. Same for the data inconsistencies file.

I've only brought sessions from e13/e13_16f1 onto my system so far, which is why they're the only ones found.

> One question I have is: What about the files with the leading dot? I did not see that in the session that I checked. Do they have the same information?

I manually filtered these files out since they're hidden on my system (Mac). It looks like these are unreadable by loadmat_scipy. Did you try reading any of them yourself?
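A quick way to check whether these hidden files are MAT files at all is to inspect their first bytes. MAT v5 and v7.3 files begin with a text header starting with `MATLAB` (older Level 4 files do not, so this is only a heuristic), and the version in that header also matters because scipy.io.loadmat reads v5 files but not the HDF5-based v7.3 format. The helpers below are hypothetical and use only the standard library; a hidden file without this header is unlikely to be a v5/v7.3 MAT file.

```python
from pathlib import Path


def parse_mat_header(head):
    """Return the MAT-file text header (e.g. 'MATLAB 5.0 MAT-file, ...') or None if it is absent."""
    if not head.startswith(b"MATLAB"):
        return None
    return head.decode("ascii", errors="replace").rstrip("\x00 ")


def check_mat_files(folder):
    """Report every *.mat file in `folder`, including hidden ones, with its header (or None)."""
    report = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix == ".mat":
            with open(path, "rb") as f:
                header = parse_mat_header(f.read(116))  # the v5 descriptive text field is 116 bytes
            report.append({
                "name": path.name,
                "hidden": path.name.startswith("."),
                "header": header,
            })
    return report
```

Comparing the `header` of a hidden file with that of its visible counterpart shows quickly whether the two are even the same kind of file.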