datacoves / dbt-coves

CLI tool for dbt users that simplifies the creation of staging model files (YML and SQL)
https://pypi.org/project/dbt-coves/
Apache License 2.0

Feature Request: Snowflake: Multithreading for performance? #314

Open · jaredx435k2d0 opened this issue 1 year ago

jaredx435k2d0 commented 1 year ago

Is your feature request related to a problem? Please describe.
It seems like running `generate sources` sends the `DESCRIBE TABLE ...` statements to Snowflake sequentially, one at a time. It'd be great if this went a lot faster.

Describe the solution you'd like
Would it be reasonable to queue all of those database statements up front and process them as the results return, so that the command completes much more quickly?
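
For illustration only (this is not dbt-coves code): a minimal sketch of the idea, using `concurrent.futures` and the `snowflake-connector-python` package from the environment listed below. The connection parameters and table list are placeholders, and each worker thread opens its own connection so nothing is shared across threads.

```python
import concurrent.futures
import threading

import snowflake.connector

# Hypothetical connection parameters -- replace with real account/credentials.
CONN_PARAMS = dict(
    account="my_account",
    user="my_user",
    password="my_password",
    database="RAW",
    schema="SALESFORCE",
    warehouse="TRANSFORMING",
)

# Hypothetical table list; dbt-coves would discover these from the selected schema.
TABLES = ["ACCOUNT", "CONTACT", "OPPORTUNITY"]

# One connection per worker thread, so no connection object is shared across threads.
_local = threading.local()


def _get_connection():
    if not hasattr(_local, "conn"):
        _local.conn = snowflake.connector.connect(**CONN_PARAMS)
    return _local.conn


def describe_table(table_name):
    """Run DESCRIBE TABLE for one table and return its column metadata."""
    cur = _get_connection().cursor()
    try:
        cur.execute(f"DESCRIBE TABLE {table_name}")
        return table_name, cur.fetchall()
    finally:
        cur.close()


def describe_all(tables, max_workers=8):
    """Submit all DESCRIBE TABLE statements to a thread pool and collect results."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(describe_table, t) for t in tables]
        for future in concurrent.futures.as_completed(futures):
            table_name, columns = future.result()
            results[table_name] = columns
    return results


if __name__ == "__main__":
    for table, columns in describe_all(TABLES).items():
        print(f"{table}: {len(columns)} columns")
```

The worker count would presumably be configurable, similar to dbt's own `threads` setting.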

Describe alternatives you've considered
I can't really think of any, apart from using it as-is and waiting much longer.

Additional context
Python 3.10.9, Snowflake 7.3.1, macOS 13.2 (22D49)

Output of `pip freeze`: agate==1.6.3 asn1crypto==1.5.1 attrs==22.2.0 Babel==2.11.0 bump2version==1.0.1 bumpversion==0.6.0 certifi==2022.12.7 cffi==1.15.1 charset-normalizer==2.1.1 click==8.1.3 colorama==0.4.5 commonmark==0.9.1 cryptography==36.0.2 dbt-core==1.3.2 dbt-coves==1.3.0a25 dbt-extractor==0.4.1 dbt-snowflake==1.3.0 filelock==3.9.0 future==0.18.3 hologram==0.0.15 idna==3.4 importlib-metadata==6.0.0 isodate==0.6.1 jaraco.classes==3.2.3 Jinja2==3.1.2 jsonschema==3.2.0 keyring==23.13.1 leather==0.3.4 Logbook==1.5.3 luddite==1.0.2 MarkupSafe==2.1.2 mashumaro==3.0.4 minimal-snowplow-tracker==0.0.2 more-itertools==9.0.0 msgpack==1.0.4 networkx==2.8.8 oscrypto==1.3.0 packaging==21.3 parsedatetime==2.4 pathspec==0.9.0 pretty-errors==1.2.25 prompt-toolkit==3.0.36 pycparser==2.21 pycryptodomex==3.17 pydantic==1.10.4 pyfiglet==0.8.post1 Pygments==2.14.0 PyJWT==2.6.0 pyOpenSSL==22.0.0 pyparsing==3.0.9 pyrsistent==0.19.3 python-dateutil==2.8.2 python-slugify==7.0.0 pytimeparse==1.1.8 pytz==2022.7.1 PyYAML==6.0 questionary==1.10.0 requests==2.28.2 rich==12.6.0 ruamel.yaml==0.17.21 ruamel.yaml.clib==0.2.7 six==1.16.0 snowflake-connector-python==2.7.12 sqlparse==0.4.3 text-unidecode==1.3 typing_extensions==4.4.0 urllib3==1.26.14 wcwidth==0.2.6 Werkzeug==2.2.2 yamlloader==1.2.2 zipp==3.12.0

jaredx435k2d0 commented 1 year ago

@BAntonellini Hey, Bruno. Just wanted to bump this to get your thoughts.

jaredx435k2d0 commented 1 year ago

dbt-osmosis recently implemented something like this and it helped performance immensely.

Fivetran's Salesforce schema alone, for example, has 776 tables. I have a few other large schemas.

Running dbt-osmosis on multiple schemas / DBs becomes extremely slow (hours).

BAntonellini commented 1 year ago

Hey @jaredx435k2d0

We are aware that this would be a good addition to dbt-coves, as it would benefit use cases like yours.

If you feel like contributing, follow our CONTRIBUTING guide and we will review your contribution.

jaredx435k2d0 commented 1 year ago

Good to have it acknowledged.

If I had the skills, I'd absolutely do this myself.

I'm learning, so maybe if it's not done in a few months I'll take a stab at it.

noel commented 1 year ago

It is a good idea; we just have a lot on our plate and need to prioritize. I don't know many people running this against 776 tables 😱

We can prioritize with some $ :)

We also help our customers out, so consider Datacoves.com.