MeltanoLabs / tap-snowflake

Schema and table filter aren't optimizing discovery #23

Open pnadolny13 opened 1 year ago

pnadolny13 commented 1 year ago

@kgpayne I like the new support for the tables parameter, but it looks like it's implemented differently than I would expect. Let me know if this was already discussed in other SDK issues, but here are my thoughts as a user on what I was expecting to happen:

Expected Behavior

  1. I provide a schema, and the tap does discovery only on that schema.
  2. I provide a schema and the tables array, and the tap does discovery only on that schema and those tables.

In case 2, if I include only a single table name, I would expect the tap to query only the metadata of that one table.
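
To make the two cases concrete, here is a minimal sketch of the statements a scoped discovery could issue. It is not the tap's actual implementation: `plan_discovery_statements` is a hypothetical helper, it assumes bare (unqualified) table names in `tables`, and it does no identifier quoting.

```python
from typing import Optional, Sequence


def plan_discovery_statements(
    schema: Optional[str] = None,
    tables: Optional[Sequence[str]] = None,
) -> list[str]:
    """Return the metadata statements a scoped discovery would run.

    Hypothetical helper for illustration; not part of tap-snowflake.
    """
    if schema and tables:
        # Case 2: inspect only the named tables in the configured schema.
        return [f"DESCRIBE TABLE {schema}.{table}" for table in tables]
    if schema:
        # Case 1: list tables in the one configured schema only.
        return [f"SHOW TABLES IN SCHEMA {schema}"]
    # No schema configured: this sketch falls back to listing every schema.
    return ["SHOW SCHEMAS"]


if __name__ == "__main__":
    for statement in plan_discovery_statements("RAW", ["ORDERS"]):
        print(statement)
```

With a schema and a single table, the plan contains exactly one statement, which is the behavior described in case 2.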

Current Behavior

The schema seems to be used only to create a connection, not to filter by that schema name, so my sync job still runs SHOW queries for every schema in my warehouse. The tables config works as expected, but the tap also iterates over all schemas/tables in the process even though I've provided a short list of tables to consider. It doesn't discover the schema for every table, but it still has to iterate through each one.

Questions and Considerations

  1. It's a little misleading to accept a schema but then not use it for filtering. What's the purpose of configuring a schema in the current state? Does it have something to do with permissions or credit usage? Mine worked the same without that setting.
  2. Was it intentional to support syncing data from multiple schemas in the same job? That's probably more flexible, but one advantage of using the schema setting as a filter would be to avoid having to use the fully qualified table name in the tables array. Not a huge benefit, but it could be a quality-of-life improvement.
  3. If someone provides a tables array, wouldn't it be better to flip the logic of the discovery step and search specifically for those tables instead of iterating over all available schemas/tables?
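
As a rough illustration of points 2 and 3, the sketch below resolves the configured table names up front, so discovery could look up only those tables, grouped by schema. `group_tables_by_schema` and its `default_schema` parameter are hypothetical names, not part of the tap, and the naive split on `.` assumes identifiers that contain no quoted dots.

```python
from collections import defaultdict
from typing import Optional, Sequence


def group_tables_by_schema(
    tables: Sequence[str],
    default_schema: Optional[str] = None,
) -> dict[str, list[str]]:
    """Group configured table names by schema (hypothetical helper).

    Bare names are resolved against default_schema; qualified names
    (schema.table or database.schema.table) keep their own schema.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for name in tables:
        parts = name.split(".")
        if len(parts) == 1:
            if default_schema is None:
                raise ValueError(f"No schema configured for bare table name: {name}")
            schema, table = default_schema, parts[0]
        else:
            # Use the last two components as schema and table.
            schema, table = parts[-2], parts[-1]
        grouped[schema].append(table)
    return dict(grouped)


if __name__ == "__main__":
    print(group_tables_by_schema(["ORDERS", "MART.CUSTOMERS"], default_schema="RAW"))
```

Each key in the result is a schema that discovery would need to touch; every other schema in the warehouse could be skipped.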

pnadolny13 commented 1 year ago

After a second test it looks like it's not repeating the same behavior. I'll do more testing but it seems like maybe it's only doing this when my tables selection is not finding a match.

kgpayne commented 1 year ago

@pnadolny13 any update on this?

nidhi-akkio commented 9 months ago

Created a PR to address this: https://github.com/MeltanoLabs/tap-snowflake/pull/36