Open andaag opened 1 year ago
But how do you know if some part of your data is already inserted or not? Making retries in a database that does not have transactions is pretty dangerous, you can end up with duplicates. We had a lot of problems until we disabled retries in code. So our rule of thumb when we write code - all tasks should be idempotent, if cancel task in the middle and rerun it multiple times we should have correct data
You understand it correctly. As you noted, the documentation should be more explicit in that retries only refers to connection failures in cases where it's guaranteed that a retry will not insert duplicates. If ClickHouse returns a "retriable" HTTP status code (in particular, 429, 502, or 503) the query will be retried up to the "retries" configuration value.
As @simpl1g suggested, we don't retry database errors. In theory we could -- the difficulty is in determining what ClickHouse errors are retriable. First, ClickHouse doesn't currently return a separate error code -- it has to be parsed from the content of the error message. Second, there are something like 1000 ClickHouse error codes and there's no definitive list of what is "retriable" and what is not.
That said, this is a potential feature request -- which is probably most appropriate for the underlying drivers (clickhouse-connect and clickhouse-driver).
But how do you know if some part of your data is already inserted or not? Making retries in a database that does not have transactions is pretty dangerous, you can end up with duplicates. We had a lot of problems until we disabled retries in code.
By specifically retrying error codes where we know it hasn't been applied. We do this with other services today, and based on the description in the readme that's what I assumed this project did too!
First, ClickHouse doesn't currently return a separate error code -- it has to be parsed from the content of the error message. Second, there are something like 1000 ClickHouse error codes and there's no definitive list of what is "retriable" and what is not.
Probably makes sense to do this on a driver level, and have a whitelist of error codes that are known to be retriable. 👍
Hi, I also think more errors should be retried by DBT.
Im getting ClickHouse exception: The read operation timed out
at the end of our 120 model pipeline that was running for 5 hours. Since there is no per statement retry its restarting the whole thing.
Describe the bug
Based on this description:
I was kinda expecting retries to also retry failures in executing queries. Such as:
But looking at the code it looks like retries is set on retry_connection from SQLConnectionManager, which retries on connection failures, not on query failures.
Is this intended? Am I understanding it wrong? This might be a bit of a mix between either a bug report or a feature request, depending on what the intended behavior is! 😀