TonicAI / condenser

Condenser is a database subsetting tool
https://www.tonic.ai
MIT License

Ignore certain relations #17

Closed: janmeier closed this issue 3 years ago

janmeier commented 3 years ago

Hi there :wave:

Thanks for making this tool open source, it seems to work really well!

However: say I have an entity table with a createdBy column referencing account. I would like to grab a subset of entity, so I put the following in my config:

"initial_targets": [
    {
        "table": "public.entity",
        "percent": 5
    }
],

I don't want to dump accounts, though, since they contain PII, so I add:

"excluded_tables": [ "public.account" ]

But now the inserts into entity fail, because entity.createdBy references accounts that do not exist in my dump.

I am fine with createdBy being set to null on all entities, but I'm not sure how, or whether, that's possible with this tool.

theaeolianmachine commented 3 years ago

Hey Jan, off the top of my head, the best way to handle this is likely the dependency_breaks config. It's normally used to break a cycle, but what it actually does is set the column entirely to NULL in order to "break the cycle". I think this should do the trick here, but I have not looked into how it behaves when used outside of a cycle. Of course, the column must be nullable.
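A config sketch of that suggestion, assuming the `dependency_breaks` format described in the condenser README (an array of objects with `fk_table` and `target_table` fields). Only the keys relevant to this issue are shown. The connection settings and other required keys are omitted, and JSON allows no comments, so note here that keeping `excluded_tables` alongside the break is an untested assumption:

```json
{
    "initial_targets": [
        {
            "table": "public.entity",
            "percent": 5
        }
    ],
    "excluded_tables": ["public.account"],
    "dependency_breaks": [
        {
            "fk_table": "public.entity",
            "target_table": "public.account"
        }
    ]
}
```

If the break behaves as described above, entity.createdBy would come out NULL in the destination rather than pointing at missing accounts.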

Otherwise, is this failing during the subset run itself? If not, you can always add a post-subset script with post_subset_sql. If neither of these works, it would likely take a code change to support this behavior properly for excluded tables. Note that with Tonic you could instead include the accounts table in masked form, which would ideally make the output database more useful, since the accounts table would still be present.
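For the post_subset_sql route, here is a sketch of the relevant config entry. It assumes PostgreSQL (the `public` schema suggests it) and a subset run that completes without errors. PostgreSQL folds unquoted identifiers to lowercase, so the camelCase column must be double-quoted, and in JSON those quotes have to be escaped:

```json
{
    "post_subset_sql": [
        "UPDATE public.entity SET \"createdBy\" = NULL;"
    ]
}
```

This clears every createdBy value in the subset, matching the "set to null on all entities" outcome asked for above.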

Hope this helps!