LorenFrankLab / spyglass

Neuroscience data analysis framework for reproducible research built by Loren Frank Lab at UCSF
https://lorenfranklab.github.io/spyglass/
MIT License
94 stars 43 forks source link

Expand Export logging abilities #1164

Closed CBroz1 closed 2 weeks ago

CBroz1 commented 1 month ago

Description

This PR addresses #1144 by

I made the decision to add a mixin folder with the future intent of breaking up SpyglassMixin into more manageable pieces.

This PR also makes changes to Merge.delete to avoid issues with attempting Part.delete with the new force_masters flag. I remove this flag and run a delete of orphaned master table entries

Checklist:

review-notebook-app[bot] commented 1 month ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

samuelbray32 commented 4 weeks ago

Just a general comment from testing. I found in my code several cases where I logged large sections of tables for export due to restriction calls.

ex) Table & restrt1 & restr2 where restr1 captures many unnecessary entries. I know can rewrite this line to avoid the issue, but identifying these in a users code could be a bit of work.

Would it make sense to have an environment variable such as EXPORT_RESTRICTIONS and only log the restrictions for exports if set to true?

Edit: Alternatively, we could make a warning requiring user confirmation when restrict-logging a table that results in a large number (e.g. >1000) entries

CBroz1 commented 4 weeks ago

Would it make sense to have an environment variable such as EXPORT_RESTRICTIONS and only log the restrictions for exports if set to true?

If I understand correctly, we would have one flag for turning on logging, and another for turning on logging of restrictions? The latter sound like a different way of achieving Table.restrict(restr, log_export=False), which is maybe syntax we want to avoid for the end user

Maybe a prompt?

MyTable & my_restr1 & my_restr2
>> Warning. This logged >1k entries. Proceed? [Yes/No]
>> You may want to change this to Table & dj.AndList([r1, r2])

Ideally, we could find a way around this AND -> OR issue. That may be possible by logging the restriction at the point of fetch, rather than at the point of restriction. I think a plain restriction call will hit fetch before returning the result, turning the restriction to a list for me

CBroz1 commented 4 weeks ago

Thanks for your review @samuelbray32. I incorporated your feedback. Let me know if you think the new join handling will work

Unfortunately, I did not see consistent hitting of fetch for cases like

So this will still bear the AND -> OR issue described in documentation, but I would welcome input on a way to only intercept the most restrictive version for logging

samuelbray32 commented 4 weeks ago

Thanks Chris!

At the end of the day, it may just require users adding some manual calls to their tables in their export process to ensure things get hit.

For example, fetches or restrictions after a table join won't get logged anyways since it is now happening on a dj.FreeTable rather than a SpyglassMixin object. I think rather than catch every one of these cases, it might make more sense for the user to run an extra script that calls something like (CustomTable & {"nwb_file_name":xxxx}).fetch() to catch everything they need

samuelbray32 commented 3 weeks ago

@CBroz1

Just noticed while running in docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph since it isn't actually a parent or child of any other table. Will probably require an explicit piece of code to get the externals

Proposal: Since the external table doesn't have dependencies or sensitive data (just hashes and filepaths), could we just dump the whole externals table into the sql file?

samuelbray32 commented 3 weeks ago

Follow up on the external tables:

Here is a way to select just the entries in the external table we want in the export

from spyglass.common.common_nwbfile import schema as common_schema
location="analysis"
filepaths = (Export * Export.File() & paper_key).fetch("file_path")
external_key = []
for path in filepaths:
    if not location in path:
        continue
    external_key.append({"filepath": path.split(location)[1][1:]})
external = common_schema.external['analysis']
external & external_key

We would need to do this for location in ['analysis', 'raw']

CBroz1 commented 3 weeks ago

Just noticed while running in docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph since it isn't actually a parent or child of any other table. Will probably require an explicit piece of code to get the externals

That's right! It looks like datajoint's DiGraph object was built to exclude externals, but I think I'll be able to cascade to it if I manually add it to RestrGraph.graph. I'll try to add this logic into the cascade_files logic to avoid extra steps elsewhere