Closed: CBroz1 closed this pull request 2 weeks ago.
Just a general comment from testing. I found several cases in my code where restriction calls caused large sections of tables to be logged for export. For example:

```python
Table & restr1 & restr2
```

where `restr1` captures many unnecessary entries. I know I can rewrite this line to avoid the issue, but identifying these in a user's code could be a bit of work.
Would it make sense to have an environment variable such as `EXPORT_RESTRICTIONS` and only log the restrictions for exports if it is set to true?

Edit: Alternatively, we could make a warning requiring user confirmation when restrict-logging a table results in a large number of entries (e.g. >1000).
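For illustration, a minimal sketch of what that opt-in gate might look like (the `EXPORT_RESTRICTIONS` variable and the `maybe_log_restriction` helper are hypothetical, not existing Spyglass API):

```python
import os

# Hypothetical opt-in flag; not an existing Spyglass setting.
EXPORT_RESTRICTIONS = (
    os.environ.get("EXPORT_RESTRICTIONS", "false").lower() == "true"
)


def maybe_log_restriction(table_name, restriction):
    """Record a restriction in the export log only if the opt-in flag is set."""
    if not EXPORT_RESTRICTIONS:
        return
    # Placeholder for the real export-logging call.
    print(f"Export log: {table_name} & {restriction!r}")
```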
> Would it make sense to have an environment variable such as `EXPORT_RESTRICTIONS` and only log the restrictions for exports if set to true?
If I understand correctly, we would have one flag for turning on logging, and another for turning on logging of restrictions? The latter sounds like a different way of achieving `Table.restrict(restr, log_export=False)`, which is maybe syntax we want to avoid for the end user.
Maybe a prompt?

```
MyTable & my_restr1 & my_restr2
>> Warning. This logged >1k entries. Proceed? [Yes/No]
>> You may want to change this to Table & dj.AndList([r1, r2])
```
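Roughly, such a check could be a small helper like the sketch below (`confirm_restriction_log` is hypothetical, and the 1000-entry threshold mirrors the suggestion above):

```python
def confirm_restriction_log(query, threshold=1000):
    """Sketch: ask before export-logging a query that matches many entries."""
    n_entries = len(query)  # len() on a DataJoint query runs a COUNT
    if n_entries <= threshold:
        return True
    print(f"Warning: this would log {n_entries} entries (> {threshold}).")
    print("You may want to change this to Table & dj.AndList([r1, r2])")
    return input("Proceed? [Yes/No] ").strip().lower().startswith("y")
```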
Ideally, we could find a way around this `AND` -> `OR` issue. That may be possible by logging the restriction at the point of `fetch`, rather than at the point of restriction. I think a plain restriction call will hit `fetch` before returning the result, turning the restriction into a list for me.
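To make that concrete, here is a sketch of what fetch-time logging could look like when mixed into a DataJoint table class (`_log_export` is a hypothetical hook, not the mixin's actual method):

```python
class ExportLoggingSketch:
    """Sketch of logging at fetch time; the real SpyglassMixin hooks may differ."""

    def fetch(self, *args, **kwargs):
        # By fetch time, all `&` restrictions have been combined, so logging
        # here records only the final (most restrictive) query.
        self._log_export()  # hypothetical logging hook
        return super().fetch(*args, **kwargs)

    def _log_export(self):
        # Placeholder: write the table name and accumulated restriction
        # (self.restriction on a DataJoint query) to the export log.
        print(f"Export log: {self.full_table_name} & {self.restriction!r}")
```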
Thanks for your review @samuelbray32. I incorporated your feedback. Let me know if you think the new join handling will work
Unfortunately, I did not see consistent hitting of `fetch` in all cases. So this will still bear the `AND` -> `OR` issue described in the documentation, but I would welcome input on a way to only intercept the most restrictive version for logging.
Thanks Chris!

At the end of the day, it may just require users adding some manual calls to their tables in their export process to ensure things get hit. For example, fetches or restrictions after a table join won't get logged anyway, since they now happen on a `dj.FreeTable` rather than a `SpyglassMixin` object. Rather than catch every one of these cases, I think it might make more sense for the user to run an extra script that calls something like `(CustomTable & {"nwb_file_name": xxxx}).fetch()` to catch everything they need.
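Sketched out, that extra pass might be as simple as the following (the table list and restriction value are placeholders the user would fill in):

```python
# Hypothetical final pass a user could run so every table of interest is touched
# (and therefore logged) through the mixin rather than a FreeTable/join result.
tables_to_touch = [CustomTable]  # user-supplied list of SpyglassMixin tables
restriction = {"nwb_file_name": "xxxx"}  # placeholder value, as above

for table in tables_to_touch:
    (table() & restriction).fetch()
```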
@CBroz1 Just noticed while running in Docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph, since it isn't actually a parent or child of any other table. This will probably require an explicit piece of code to get the externals.

Proposal: since the external table doesn't have dependencies or sensitive data (just hashes and filepaths), could we just dump the whole externals table into the SQL file?
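If we went that route, the dump itself could be as simple as shelling out to `mysqldump` (a sketch only; host, user, and output file are placeholders to adapt to the export pipeline):

```python
import subprocess

# Sketch: dump the whole externals table and append it to the export SQL.
with open("export_externals.sql", "w") as out:
    subprocess.run(
        [
            "mysqldump",
            "-h", "localhost",        # placeholder host
            "-u", "root",             # placeholder user
            "-p",                     # prompt for the password
            "common_nwbfile",         # schema
            "~external_analysis",     # externals table: just hashes and filepaths
        ],
        stdout=out,
        check=True,
    )
```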
Follow up on the external tables: here is a way to select just the entries in the external table we want in the export.

```python
from spyglass.common.common_nwbfile import schema as common_schema
from spyglass.common.common_usage import Export

location = "analysis"
# paper_key: restriction identifying the export of interest (defined upstream)
filepaths = (Export * Export.File() & paper_key).fetch("file_path")

external_key = []
for path in filepaths:
    if location not in path:
        continue
    # keep only the portion of the path relative to the store location
    external_key.append({"filepath": path.split(location)[1][1:]})

external = common_schema.external["analysis"]
external & external_key
```

We would need to do this for `location in ['analysis', 'raw']`.
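Building on that snippet, the loop over both stores might look like this (same assumptions as above about `Export` and `paper_key`):

```python
# Sketch: repeat the selection above for both external stores.
for location in ("analysis", "raw"):
    filepaths = (Export * Export.File() & paper_key).fetch("file_path")
    external_key = [
        {"filepath": path.split(location)[1][1:]}
        for path in filepaths
        if location in path
    ]
    external = common_schema.external[location]
    print(external & external_key)  # entries to include in the export
```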
> Just noticed while running in Docker that the external table "`common_nwbfile`.`~external_analysis`" isn't getting included in the export. I'm presuming it's not getting caught in the restriction graph since it isn't actually a parent or child of any other table. Will probably require an explicit piece of code to get the externals.
That's right! It looks like DataJoint's `DiGraph` object was built to exclude externals, but I think I'll be able to cascade to it if I manually add it to `RestrGraph.graph`. I'll try to add this logic into the `cascade_files` logic to avoid extra steps elsewhere.
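In sketch form, that manual addition might look something like this (assuming `RestrGraph.graph` is a networkx `DiGraph`; the instance name, node name, and edge are illustrative only, and the real graph nodes carry additional DataJoint metadata):

```python
# Very rough sketch: add the externals table to the restriction graph so the
# cascade can reach it.
external_name = "`common_nwbfile`.`~external_analysis`"
restr_graph.graph.add_node(external_name)  # restr_graph: a RestrGraph instance
restr_graph.graph.add_edge("`common_nwbfile`.`analysis_nwbfile`", external_name)
```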
Description

This PR addresses #1144 through changes to `SpyglassMixin` and `Merge`. I made the decision to add a `mixin` folder with the future intent of breaking up `SpyglassMixin` into more manageable pieces.

This PR also makes changes to `Merge.delete` to avoid issues with attempting `Part.delete` with the new `force_masters` flag. I remove this flag and run a delete of orphaned master table entries instead.

Checklist:

- `CITATION.cff`
- `alter` snippet for release notes.
- `CHANGELOG.md` with PR number and description.