Closed IanMeta closed 1 week ago
I would like to know the following:
- How many queries for external catalog can you have at most at the same time?
- Are you using the alter catalog statement to adjust connection_pool related parameters?
- After adjusting connection_pool_max_size to 10000, is the error reported a new CAUSED BY: GetConnectionTimeoutException: wait millis 5002, active 10000, maxActive 10000, creating 0, or is the error still CAUSED BY: GetConnectionTimeoutException: wait millis 5002, active 10, maxActive 10, creating 0? Pay attention to the difference between 10 and 10000.
- Are you in a multi-FE environment? When an error occurs, have you paid attention to whether the connected FE is the master FE?
- Have you tried re-creating a catalog and specifying connection_pool_max_size as a larger value, such as 100, in the properties of the created catalog? Then run it for a period of time and observe whether the following error appears: CAUSED BY: GetConnectionTimeoutException: wait millis 5002, active 100, maxActive 100, creating 0
- The majority of our production queries require joins from external tables. In most cases we have no more than 50 queries at the same time.
- Yes. Specifically, we use the statement ALTER RESOURCE abc PROPERTIES("connection_pool_max_size"="1000"). After updating the value to a different one, we were immediately able to query the external table under this resource, regardless of the size we set it to.
- The former. The new error would be CAUSED BY: GetConnectionTimeoutException: wait millis 5002, active 10000, maxActive 10000.
- Yes, currently we have 3 FEs, all followers. We observed the same error regardless of whether we were connected to the master node. We also see that the IP reported in the error is always from the BE nodes. Furthermore, we hit the same error when we only had 1 FE.
- Yes. We observed that dropping and re-adding the resource and table has the same effect as using the alter statements.
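To tell the two error variants discussed above apart in logs, the counters can be pulled out of the message text. The following is only a minimal sketch: the helper names `parse_pool_error` and `new_limit_applied` are hypothetical, and the pattern assumes the message layout matches the errors quoted in this thread.

```python
import re

# Pattern for the pool-timeout message quoted in this thread.
# The trailing "creating N" part is optional because one quoted
# message omits it. Helper names below are hypothetical.
_POOL_ERROR = re.compile(
    r"GetConnectionTimeoutException: wait millis (?P<wait_millis>\d+), "
    r"active (?P<active>\d+), maxActive (?P<max_active>\d+)"
    r"(?:, creating (?P<creating>\d+))?"
)


def parse_pool_error(message):
    """Return the counters from a pool-timeout message, or None if absent."""
    match = _POOL_ERROR.search(message)
    if match is None:
        return None
    return {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }


def new_limit_applied(message, configured_max):
    """True if the error shows the pool running with the configured maximum."""
    fields = parse_pool_error(message)
    return fields is not None and fields["max_active"] == configured_max
```

If `new_limit_applied` is true for the raised limit, the new setting took effect and the pool was still exhausted, which is the case reported here.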
This seems a bit tricky. I also tested it using the method you mentioned, but I did not reproduce the problem. Is there any way I can contact you so that we can synchronize in time?
I was more or less able to reproduce it on a fresh 2.1.1 install by spamming it with multiple connections, but I had to use different queries; otherwise, it seems to use only 1 connection. The thing is, after I hit this error, the "active connection count" seemed to be stuck even after I closed the previous connections.
Anyway, we can discuss on WeChat if you'd like, thanks! id: feer4847
I've encountered the same problem and can reproduce it through a demo project, please add me on wechat, id: ziyanTOP @zy-kkk
same problem.
same problem.
Version 2.1.6 fixes all known connection issues. Please upgrade and test. This issue has gone unanswered for a long time, so I will close it for now. If there are still problems later, please feel free to reopen it.
@zy-kkk Does 2.1.6 resolve https://github.com/apache/doris/issues/34168?
Are you experiencing memory leaks due to using Jdbc Catalog? If so, a fix will be released in 2.1.7
Yes, but we are using JDBC external table, not Catalog. I think the underlying mechanism is the same.
Please let me know which commit fixes this memory leak in 2.1.7, thanks!
Version
Doris Version: 2.1.1 (recently upgraded from 2.0.0)
Java Version: 1.8.0_402
What's Wrong?
When querying JDBC external tables in 2.1.1, we reached this error after a while:
What You Expected?
successfully querying the external table.
How to Reproduce?
Anything Else?
Here are the things we tried:
- Increasing connection_pool_max_size helped to increase the time before we hit this error again, but even after increasing the max pool size to 10000, we still eventually hit this error. We suspect that there is a leak in the connection pool management.
- Adjusting connection_pool_max_life_time and jdbc_connection_pool_cache_clear_time_sec also seemed to help increase the number of queries before we hit the error again, but ultimately did not solve the issue.
- After altering connection_pool_max_size to any value (even one smaller than the previously set value), Doris seems to clear all the useless connections from before, and we can immediately query the external table again.

We have not tried installing Doris 2.1.1 from scratch to reproduce this error. The bug may be a result of unexpected behavior from the version upgrade from 2.0.0.
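The observation that a larger pool only postpones the error is what a connection leak would look like. The toy model below illustrates that reasoning; it is an assumption-laden sketch, not Doris's actual pool code, and `ToyPool`, `queries_until_exhausted`, and the `leak_every` parameter are hypothetical names.

```python
class ToyPool:
    """Toy bounded pool for illustration only; not Doris's implementation."""

    def __init__(self, max_active):
        self.max_active = max_active
        self.active = 0

    def acquire(self):
        # Mimic the reported failure once every slot is marked active.
        if self.active >= self.max_active:
            raise TimeoutError(
                f"active {self.active}, maxActive {self.max_active}"
            )
        self.active += 1

    def release(self):
        self.active -= 1


def queries_until_exhausted(max_active, leak_every):
    """Run sequential queries where every `leak_every`-th one never releases
    its connection. Return how many queries succeed before exhaustion."""
    pool = ToyPool(max_active)
    completed = 0
    while True:
        try:
            pool.acquire()
        except TimeoutError:
            return completed
        completed += 1
        # Hypothetical leak: this query "forgets" to return its connection.
        if completed % leak_every != 0:
            pool.release()
```

In this model the number of successful queries grows in proportion to `max_active` but is always finite, which would be consistent with the error returning even at 10000.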