conduitio-labs / conduit-connector-nats-jetstream

Conduit connector for NATS JetStream
Apache License 2.0

Fix flaky tests #138

Open hariso opened 1 month ago

hariso commented 1 month ago

Feature description

Some of the acceptance tests are flaky, and a few of them fail more often than others; one example is TestAcceptance/TestSource_Open_ResumeAtPositionCDC.

Most of the failures show the error err: stop source: unsubscribe: nats: consumer not found, which is logged when a source is stopped. This probably means that a source's client disconnected. We've previously had problems with clients disconnecting, which is why reconnections were implemented (see https://github.com/conduitio-labs/conduit-connector-nats-jetstream/issues/62).

Helper scripts. Run all the acceptance tests multiple times (the script runs the whole suite 20 times and appends the output to test.log):

for i in {1..20}; do
    echo "------------------------------------------------------------" >> test.log
    echo "Attempt $i" >> test.log
    echo "------------------------------------------------------------" >> test.log

    go clean -testcache >> test.log 2>&1

    make test-integration >> test.log 2>&1

done
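After a batch of runs, the log can get long. The following is a minimal sketch (not part of the repository) of a bash helper that summarizes it. The function name summarize_log is hypothetical, and it assumes the "Attempt N" separator lines written by the loops in this issue plus the standard "--- FAIL:" lines printed by go test.

```shell
# Hypothetical helper: summarize a log written by the loops above (default: test.log).
# Relies on the "Attempt N" separator lines those loops write.
summarize_log() {
  local log="${1:-test.log}"
  local attempts failures not_found
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps
  # the function usable under "set -e".
  attempts=$(grep -c '^Attempt ' "$log" || true)
  # -e is needed because the pattern itself starts with "-".
  failures=$(grep -c -e '--- FAIL:' "$log" || true)
  not_found=$(grep -c 'consumer not found' "$log" || true)
  echo "attempts: $attempts"
  echo "FAIL lines: $failures"
  echo "consumer not found: $not_found"
}
```

Usage: source the snippet, then run summarize_log test.log. Counting "consumer not found" separately from FAIL lines helps tell whether that error accounts for most of the failures.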

Run a single test 20 times, appending the output to test.log:

for i in {1..20}; do
    echo "------------------------------------------------------------" >> test.log
    echo "Attempt $i" >> test.log
    echo "------------------------------------------------------------" >> test.log

    go clean -testcache >> test.log 2>&1

    go test -v -race --tags=integration ./... -run '^TestAcceptance/TestSource_Open_ResumeAtPositionCDC' >> test.log 2>&1

done
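To see which runs hit the unsubscribe error, the attempt numbers can be pulled out of the same log. This is a hypothetical bash/awk sketch, not something in the repository; attempts_with_error is an illustrative name, and it assumes the "Attempt N" separators written by either loop above.

```shell
# Hypothetical helper: print the attempt numbers whose output contains
# the "consumer not found" error, one per line, in ascending order.
attempts_with_error() {
  local log="${1:-test.log}"
  # Remember the current attempt number from each "Attempt N" line,
  # then mark that attempt whenever the error shows up.
  awk '/^Attempt /{attempt=$2} /consumer not found/{hit[attempt]=1} END{for (a in hit) print a}' "$log" | sort -n
}
```

If the error clusters in a few attempts rather than appearing everywhere, that fits a timing issue such as the disconnect hypothesis discussed in this issue.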
hariso commented 1 month ago

The following could be the case: once the source is done reading the records and the test is about to end (and the source is about to be stopped), the NATS client disconnects. The source then needs to be stopped, but the client hasn't reconnected yet, which could explain the errors above.