dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0

[Bug] `dbt-snowflake` raises an error when caching less than the maximum number of objects in a schema #1234

Closed mikealfare closed 3 days ago

mikealfare commented 2 weeks ago

Is this a new bug in dbt-snowflake?

Current Behavior

I'm seeing this error when running dbt-snowflake on a schema with 12,500 objects:

16:39:51  Encountered an error:
Compilation Error in macro list_relations_without_caching (macros/adapters/metadata.sql)
  dbt will list a maximum of 100000 objects in schema <DATABASE>.<SCHEMA>.
             Your schema exceeds this limit. Please contact support@getdbt.com for troubleshooting tips,
             or review and reduce the number of objects contained.

  > in macro snowflake__get_paginated_relations_array (macros/adapters.sql)
  > called by macro snowflake__list_relations_without_caching (macros/adapters.sql)
  > called by macro list_relations_without_caching (macros/adapters/metadata.sql)
  > called by macro list_relations_without_caching (macros/adapters/metadata.sql)

I can confirm that there are fewer than 100,000 objects in this schema:

-- use dbt service account role
show objects in <DATABASE>.<SCHEMA>;
-- result: 12,500 rows returned
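
As a cross-check, the count can also be read from the information schema; INFORMATION_SCHEMA.TABLES covers both tables and views (placeholders as above):

-- count all tables and views in the schema
select count(*)
from <DATABASE>.information_schema.tables
where table_schema = '<SCHEMA>';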

Expected Behavior

I should be able to run dbt-snowflake on a schema with 12,500 objects without error, and the cache should contain all 12,500 objects exactly once.

Steps To Reproduce

  1. Create a project with 12,500 objects in a single schema (any schema with more than 10,000 objects, i.e., more than one page of SHOW results, should reproduce it; see the setup sketch below)
  2. Run `dbt run` against the full project
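
For step 1, a minimal setup sketch using Snowflake Scripting (the repro_view_ prefix is a hypothetical name; <DATABASE> and <SCHEMA> are placeholders as above; some clients require wrapping the block in execute immediate $$ ... $$):

-- create 12,500 trivial views so that listing the schema requires more than one page
begin
  for i in 1 to 12500 do
    execute immediate
      'create or replace view <DATABASE>.<SCHEMA>.repro_view_' || i || ' as select 1 as id';
  end for;
end;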

Relevant log output

No response

Environment

- OS: dbt Cloud
- Python: 3.9
- dbt-core: versionless
- dbt-snowflake: versionless

Additional Context

Initial analysis suggests this is the result of Snowflake's 2024_07 behavior change bundle; in particular, it looks like part of this change. We use the object name as the watermark when paginating through SHOW results. That ordering is no longer deterministic, meaning we could (and very likely do) get duplicate relations and miss others, because each page resumes from an effectively random starting point in a randomized list. Previously the returned recordset was sorted, which made the pagination deterministic. We need to find a new watermark method.
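
Roughly, the pagination pattern at issue looks like the following (a simplified sketch of what snowflake__get_paginated_relations_array issues; the 10,000-row page size and the placeholder names are assumptions for illustration):

-- page 1
show terse objects in <DATABASE>.<SCHEMA> limit 10000;
-- page 2: resume after the last object name seen on page 1
show terse objects in <DATABASE>.<SCHEMA> limit 10000 from '<last_object_name_from_page_1>';

If SHOW output is no longer sorted by name, the FROM watermark lands at an arbitrary point in an arbitrarily ordered list, so consecutive pages can overlap (producing duplicates) or skip rows (missing relations).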