elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
https://www.elementary-data.com/
Apache License 2.0
1.94k stars 165 forks source link

"Too many queries or query is too complex" issue #1689

Open jmainwaring opened 2 months ago

jmainwaring commented 2 months ago

Hello! As seen in this thread, many of us continue to face the following error when run metadata is added in BigQuery:

Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.

I see in the thread that the issue appears to have been resolved, but I and others later in the thread are still encountering this error after upgrading to version 0.16.0. This and this are two other examples of users who appear to be running into the same thing. If you all have any suggestions for config changes we need to make or would be up for discussing a potential fix, that would be greatly appreciated, as there are some wonderful features in the new version that we hope to not miss out on. Thank you!

Environment: Elementary dbt package version: 1.6.0 dbt version you're using: 1.7.4 Data warehouse: BigQuery

cwmuller commented 2 months ago

I'm also experiencing this on 0.16.0, dbt-bigquery 1.8.2. I tried setting the the query_max_size and insert_rows_method parameters, as my impression was that this was an attempt make changes that would alleviate this issue, but I am still getting the same error messages as before.

belle-crisp commented 2 months ago

Same for me!

cwmuller commented 1 month ago

Uploaded log file: dbt.log

haritamar commented 1 month ago

Hi @jmainwaring and all,

Apologies for the delayed response here. This configuration (that was mentioned in on of the threads) should probably solve this issue:

insert_rows_method: chunk
dbt_artifacts_chunk_size: 500

Some extended context about this problem - BQ has two types of limits that affect insert queries:

  1. A limit on the number of rows inserted - this causes the error mentioned above.
  2. A limit on the length of the entire query.

The chunk insert method actually handles both types of limits today, and we'll make it the default in the upcoming package release.

Thanks, Itamar

cwmuller commented 1 month ago

Hi @haritamar ,

The log I posted above is the output from elementary actively failing with the configuration you are mentioning. Unless I have incorrectly set these parameters, they have failed to fix this issue.

Maayan-s commented 1 month ago

@cwmuller @belle-crisp @nickozilla, @benoitgoujon, @wei-wei29, @chenxilll, @kylemcleland, @neethu2409

Let us know if the config @haritamar suggested doesn't solve this 🙏🏻

belle-crisp commented 1 month ago

I have the same experience as @cwmuller

haritamar commented 1 month ago

Hi @cwmuller @belle-crisp , Sorry about that.

@cwmuller another setting that might help is to tweak the query_max_size parameter, not sure if you did that as well. We saw in the log that the failing query has around 240K characters. Have you tried setting query_max_size: 100000?

(I saw it was suggested in one of the slack thread so not sure if you tried that as well)

Thanks, Itamar

haritamar commented 1 month ago

Also if you can share the full list of vars you set up would be great 🙏

cwmuller commented 1 month ago

Uploaded a new log running with the following profiles config:

  elementary:
      insert_rows_method: "chunk"
      dbt_artifacts_chunk_size: 500
      query_max_size: 1000000

Will get back to you on vars.

dbt.log

haritamar commented 1 month ago

Hey @cwmuller , The configuration above is what I meant. But interesting - I think it is actually configured in the wrong place... These should be set as vars in dbt_project.yml, and not under the profiles.

e.g.:

vars:
  insert_rows_method: "chunk"
  dbt_artifacts_chunk_size: 500
  query_max_size: 100000

(I removed one zero from query_max_size)

If you can check if this configurations makes any difference would be great.

cwmuller commented 1 month ago

Hi @haritamar ,

Indeed! I moved the config from to vars, and i was able to run and test the entire project, and elementary models without any failures. I had put the config under elementary, under models. I probably searched for 'elementary' in the dbt_project.yml file and added it underneath without giving it more thought.

Thanks for the assistance!

haritamar commented 1 month ago

That's great to hear @cwmuller ! We'll work on actually incorporating these as defaults in the next version.

nickozilla commented 1 month ago

@haritamar @Maayan-s

Hi, we've added the suggested changes to the dbt_project.yml

vars:
  query_max_size: 100000
  insert_rows_method: "chunk"
  dbt_artifacts_chunk_size: 500

But still see this message on our longer runs (Done. PASS=13380 WARN=48 ERROR=4 SKIP=1305 TOTAL=14737):

on-run-end failed, error:  The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters. 

Do you have any further suggestions to help with this?

haritamar commented 1 month ago

Hi @nickozilla ! Would you mind sharing a dbt.log file as well with this error? Also, what is the Elementary package version you are using?

Itamar

nickozilla commented 1 month ago

Hi @haritamar ,

The log events I have shared are from our logs aggregator tool, so I don't have the dbt.log file accessible currently, I'll see what else I can provide, our elementary version is 0.16.0

Nick

Niteeshkanungo commented 1 month ago

Hello @haritamar I have added this as vars -

elementary limits

dbt_artifacts_chunk_size: 10000 insert_rows_method: chunk query_max_size: 100000

I am getting - The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters. still. DBT on BQ. Do we have a solution for this?

cwmuller commented 1 month ago

We had to reduce the dbt_artifacts_chunk_size to 100 to get the fix to work for our project.

nickozilla commented 1 month ago

@cwmuller what were your other vars set to? & did you also experience this error?

The query is too large. The maximum standard SQL query length is 1024.00K characters, including comments and white space characters.

cwmuller commented 1 month ago

@nickozilla Our settings are:

  elementary:
    insert_rows_method: "chunk"
    dbt_artifacts_chunk_size: 100
    query_max_size: 1000000
joostboon commented 3 weeks ago

We are hoping to close this issue with the proposed solution, but please let me know if you are still experiencing this issue

@benoitgoujon @wei-wei29 @chenxilll @kylemcleland @neethu2409 @belle-crisp

Pgillets commented 3 weeks ago

Adding

vars:
  insert_rows_method: "chunk"
  dbt_artifacts_chunk_size: 1000
  query_max_size: 1000000

to the dbt_project.yaml worked for me.

I'm using Elementary on 0.16.1 and DBT on 1.8.8 (using dbt-bigquery 1.8.3)

joostboon commented 3 weeks ago

Thanks! That is helpful to know

amishas157 commented 1 week ago

Ran into same problem with elementary=0.16.1, dbt=1.8.3 and warehouse as bigquery.

Resolved with following config:

  insert_rows_method: "chunk"
  dbt_artifacts_chunk_size: 100
  query_max_size: 1000000
haritamar commented 3 days ago

Hi everyone! Some updates here.

So after a few iterations here and thanks to a contributor (thanks @Niteeshkanungo !), we actually realized that the "chunk" method had several bugs in it, and in particular the query_max_size limit didn't actually work as expected. Sorry about that. We made a refactor to the insert rows logic which I hope should solve this issue for good (crossing fingers you won't prove me otherwise 🤞 ), and released version 0.16.2 a couple of hours ago.

The insert_rows_method var has been removed (it won't break anything to have it but it will be ignored), and the package will take into account both the chunk size and the query max size limits by default.

Would love if people here can try the new version and comment here how it goes (would be great if you can try it first without any configuration vars, and only if it doesn't work with lower limits as posted above).

Thanks, Itamar

haritamar commented 3 days ago

@Niteeshkanungo @amishas157 @cwmuller @nickozilla @belle-crisp @Pgillets @jmainwaring

nickozilla commented 2 days ago

@haritamar

Seems to have fixed things for us, thanks for investigating. Can you share the PR with the bugfix, out of curiosity?

EDIT: I can also confirm this works for us without any variables set for query_max_size or dbt_artifacts_chunk_size dbt_project.yml

nickozilla commented 1 hour ago

@haritamar

Small update: After some further tests without the query_max_size or dbt_artifacts_chunk_size variables set, I have found that this issue does still occur, we've settled with a stable config of:

  query_max_size: 1000000
  dbt_artifacts_chunk_size: 1000
haritamar commented 1 hour ago

@nickozilla - Thanks for updating! This is helpful. I think we'll lower the defaults for these values then (which are currently 1M for query size and 5000 for chunk size).

Regarding the bugfix PR - here it is Basically what it does is a refactor that unifies the handling of both limits to the same insert method (and as a part of it removing the insert_rows_method configuration).

Technically it was done by adding support for the dbt_artifacts_chunk_size parameter in the code that previously split the queries by query size, and removing the other insert method (which we found out was buggy and missing some features).