Open lorransr opened 1 year ago
I think it should be feasible, I took a look at the code to see how the JdbcCatalog
works. By default we retry SQLNonTransientException
https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/jdbc/JdbcClientPool.java#L46
and the exception that gets thrown in your case is org.postgresql.util.PSQLException
so we don't retry it.
Few things:
1.) SQLNonTransientConnectionException seems to be the wrong thing to even retry. The docs say the following (and it makes sense, it's non-transient, we should only retry transient failures)
/**
* The subclass of {@link SQLException} thrown for the SQLState
* class value '<i>08</i>', or under vendor-specified conditions. This
* indicates that the connection operation that failed will not succeed if
* the operation is retried without the cause of the failure being corrected.
* <p>
* Please consult your driver vendor documentation for the vendor-specified
* conditions for which this <code>Exception</code> may be thrown.
* @since 1.6
*/
2.) we probably want a better mechanism for specifying exception types to retry to ClientPoolImpl rather than just a single class.
I'll let others comment, cc @jackye1995 @rdblue
The serverless db scales up and down during the day and can reach to 0 instances, when it happens, all connections are lost.
As Amogh suggested the code can definitely be updated to accommodate more exception cases for retry. I guess the question is if there is anything more than just org.postgresql.util.PSQLException
, like an error code, that can be used to detect the issue? Using error message like "This connection has been closed" seems to be too specific.
It seems to be related to this code:
Since it is the one used here:
Cool, in that case I think we can add a code change to check that error code and retry. @lorransr unfortunately I am a bit short of capacity, but I can help with review, do you have time to make the code change?
Unfortunnely I don't 😢 And also not sure how
We are encountering the same problem as well.
I end up solving by ditching the serverless approach and provisioning a common database on aws
I should have some time to try and get a fix in for this before next release, I'm assigning to myself. @lorransr @netanelm-upstream
I see the PR didn't make it to a release. Do you need any help with it?
I see the PR didn't make it to a release. Do you need any help with it?
As I stated in the linked PR, I think those codes should be configurable. We don't have much cycles to spare on our side right now, but if you can help, that would be great. We can test your fix easily.
Is there any progress on this issue? I have the same issue.
Query engine
Trino
Question
As mentioned in this issue, we have an aurora serverless database, and we use it as our iceberg catalog. The serverless db scales up and down during the day and can reach to 0 instances, when it happens, all connections are lost. The expected behavior for the jdbc connection in this case would be to try to reconnect to the database, since it will be the trigger for starting the db instances.
Should it be possible to implement this connection retry feature? Or is it expected, the non retry behavior?