Is this a bug, feature request, or feedback?
Bug/feature
What is the current behavior?
When the Connector Protocol implementation in the Python client library receives a NotifyAck message with notify_success=false, it does not resend the Notify at a later time. The visible effect is that the client gets "stuck" on its existing TCP connection. The client remains stuck until Wallaroo closes the TCP connection, typically because of a crash.
Intermittent CI test failures such as https://circleci.com/gh/WallarooLabs/wallaroo/28539 are caused by a race between this feature/bug of the client library and a conformance test that makes the Python client re-connect and send its first-and-only Notify attempt for the Stream IDs under test. Wallaroo always sends NotifyAck with notify_success=false while it is in the middle of a rollback procedure. The test's disconnect and re-connect are triggered by a Wallaroo rollback, so if the rollback has not finished before the client's Notify arrives, the Notify is rejected, never retried, and the test hangs.
What is the expected behavior?
Periodic retries of the Notify message are needed in this case. Retries are also needed more generally for the Python client library to be useful.
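
To make the request concrete, here is a minimal sketch of what periodic Notify retries could look like. It is not the client library's actual code: the `NotifyRetrier` class, its method names, the `send_notify` callback, and the one-second interval are all hypothetical, and a real Notify carries more than a stream ID. The idea is only that a NotifyAck with notify_success=false schedules a later resend instead of being dropped.

```python
import time
from typing import Callable, Dict, List


class NotifyRetrier:
    """Hypothetical helper: remembers rejected Notify attempts and resends them later."""

    def __init__(self, send_notify: Callable[[int], None],
                 retry_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send_notify = send_notify        # assumed callback that writes a Notify frame
        self._retry_interval = retry_interval  # assumed fixed delay between attempts
        self._clock = clock                    # injectable so the logic can be tested
        self._pending: Dict[int, float] = {}   # stream_id -> earliest time of next attempt

    def on_notify_ack(self, stream_id: int, notify_success: bool) -> None:
        """Call this for every NotifyAck received from Wallaroo."""
        if notify_success:
            self._pending.pop(stream_id, None)
        else:
            # Rejected (for example, a rollback is in progress): schedule a retry.
            self._pending[stream_id] = self._clock() + self._retry_interval

    def on_disconnect(self) -> None:
        """Scheduled retries belong to the old connection, so drop them."""
        self._pending.clear()

    def poll(self) -> List[int]:
        """Resend Notify for every stream whose retry time has passed."""
        now = self._clock()
        due = [sid for sid, when in self._pending.items() if when <= now]
        for sid in due:
            # The next NotifyAck decides whether this stream is rescheduled.
            del self._pending[sid]
            self._send_notify(sid)
        return due


# Usage with a fake clock standing in for the client's event loop.
sent: List[int] = []
fake_now = [0.0]
retrier = NotifyRetrier(sent.append, retry_interval=1.0, clock=lambda: fake_now[0])

retrier.on_notify_ack(7, notify_success=False)  # Wallaroo is mid-rollback
retrier.poll()                                  # too early, nothing is resent
fake_now[0] = 1.5
retrier.poll()                                  # interval elapsed, Notify for stream 7 is resent
retrier.on_notify_ack(7, notify_success=True)   # rollback finished, stream 7 is accepted
```

A client would call `poll()` from whatever loop already services the connection. A fixed interval keeps the sketch short; a capped backoff may be a better fit if rollbacks can take a long time.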