Open Abacn opened 1 year ago
Related to #25945. CC: @Amraneze
I added a workaround that cancels the pipeline, because it kept running for more than 30 minutes even though all messages had been published and received. What do you think would be the best approach?
I can also see in the logs of the failing test that there is a connection issue. We have some ghost connections, and JmsIO is creating new connections, but for reading rather than publishing: `JmsIO$UnboundedJmsReader.closeConnection(JmsIO.java:649)`. It looks like a connection leak, even though the broker is down. Maybe we could just try to reconnect and/or force the connection and session to close.
@Amraneze, the log you linked shows a lot of connections being created. This is because there can be many DoFn instances in streaming. Deferring the connection until the first element is received may mitigate the ghost connection issue. A connection pool is the long-term solution.
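As a rough illustration of the "connect on first element" idea, here is a minimal sketch. It does not use the JMS or Beam APIs: `Connection`, `LazyWriter`, and `simulate` are hypothetical stand-ins, and the counts only show that instances which never receive an element never open a connection.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch: open a connection only when the first element arrives. */
public class LazyConnectionSketch {

    /** Stand-in for a broker connection; this is NOT the JMS API. */
    interface Connection extends AutoCloseable {
        void send(String message);

        @Override
        void close();
    }

    /** Keeps a factory and connects on first use instead of at setup time. */
    static final class LazyWriter implements AutoCloseable {
        private final Supplier<Connection> factory;
        private Connection connection; // stays null until the first element

        LazyWriter(Supplier<Connection> factory) {
            this.factory = factory;
        }

        void process(String element) {
            if (connection == null) {
                connection = factory.get(); // deferred connect
            }
            connection.send(element);
        }

        @Override
        public void close() {
            if (connection != null) {
                connection.close();
                connection = null;
            }
        }
    }

    /** Creates `instances` writers; only the first `active` ones receive an element. */
    static int[] simulate(int instances, int active) {
        AtomicInteger opened = new AtomicInteger();
        AtomicInteger closed = new AtomicInteger();
        Supplier<Connection> factory = () -> {
            opened.incrementAndGet();
            return new Connection() {
                @Override
                public void send(String message) {
                    // A real implementation would publish to the broker here.
                }

                @Override
                public void close() {
                    closed.incrementAndGet();
                }
            };
        };
        for (int i = 0; i < instances; i++) {
            try (LazyWriter writer = new LazyWriter(factory)) {
                if (i < active) {
                    writer.process("message-" + i);
                }
            }
        }
        return new int[] {opened.get(), closed.get()};
    }

    public static void main(String[] args) {
        int[] counts = simulate(10, 2);
        System.out.println("opened=" + counts[0] + " closed=" + counts[1]);
    }
}
```

With ten instances and two active ones, only two connections are opened and both are closed. A pool would go further by sharing connections across instances, which this sketch does not attempt.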
Yeah, DoFn instances are created over and over because the code throws an exception and the DoFn catches it to run the `TearDown` function. But I'm not sure the connection is actually closed. I'm trying to find time to work on the connection pool in the next few weeks. We also use the `finalize` method, which is deprecated, and we already call `doClose` in the overridden `close` function of `UnboundedReader`. I guess it's better to remove it. What do you think?
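To make the point about dropping `finalize` concrete, here is a minimal sketch of a reader whose `close` delegates to an idempotent `doClose`, so a second call (from an error path or teardown) is harmless and no finalizer is needed. The `Reader` class and the `releases` counter are hypothetical; this is not the actual JmsIO code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: idempotent close without relying on finalize(). */
public class IdempotentCloseSketch {

    /** Stand-in for a reader that owns a connection and a session. */
    static final class Reader implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean(false);
        private final AtomicInteger releases; // counts real resource releases

        Reader(AtomicInteger releases) {
            this.releases = releases;
        }

        /** Releases resources exactly once; later calls are no-ops. */
        private void doClose() {
            if (closed.compareAndSet(false, true)) {
                // A real reader would close its session and connection here.
                releases.incrementAndGet();
            }
        }

        @Override
        public void close() {
            doClose();
        }
    }

    /** Closes the same reader twice and returns how many releases happened. */
    static int closeTwice() {
        AtomicInteger releases = new AtomicInteger();
        try (Reader reader = new Reader(releases)) {
            reader.close(); // explicit close, e.g. on an error path
        } // try-with-resources calls close() a second time
        return releases.get();
    }

    public static void main(String[] args) {
        System.out.println("releases=" + closeTwice());
    }
}
```

The guard flag is what makes the extra call safe. Whether the real reader needs the same guard depends on how `doClose` is written today, which this thread does not show.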
I have opened #26179 to see if it works. The integration test passed locally, but on Jenkins it has a higher chance of failing. This may be because the CI nodes are under higher load and have a higher chance of connection issues.
I will run it with Gradle until the test fails to see if I can reproduce it.
What happened?
Affecting https://ci-beam.apache.org/view/PostCommit/job/beam_PreCommit_Java_Jms_IO_Direct_Cron/
We should probably decrease the number of elements when the test runs locally. There is also likely a problem with elements not being emitted on time.
Issue Failure
Failure: Test is flaky
Issue Priority
Priority: 2 (backlog / disabled test but we think the product is healthy)
Issue Components