10gen / mongo-orchestration

Apache License 2.0

DRIVERS-2286 Remove transactionLifetimeLimitSeconds default #295

Closed kevinAlbs closed 1 year ago

kevinAlbs commented 1 year ago

Summary

Background & Motivation

Some Range Index specification tests added for DRIVERS-2286 have failed in Evergreen with the error "operation was interrupted because the transaction exceeded the configured 'transactionLifetimeLimitSeconds'". The Range Index operations are expected to be slow: a single operation may run many queries server side. The C# and Go drivers experienced several timeout failures in Evergreen.

3 seconds is the current transactionLifetimeLimitSeconds in mongo-orchestration. The server default transactionLifetimeLimitSeconds is 60. The server tests use 24 hours.
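
To make the effect of removing the override concrete, here is a minimal sketch in Python. It assumes a hypothetical `setParameter` dictionary shape for the process parameters, and the helper names are illustrative only; this is not the actual mongo-orchestration change. It shows only that once the 3-second override is dropped, the server default of 60 seconds applies.

```python
import copy

# Server default cited in this PR description.
SERVER_DEFAULT_TXN_LIFETIME_SECS = 60


def drop_txn_lifetime_override(proc_params):
    """Return a copy of proc_params without a transactionLifetimeLimitSeconds override.

    The dict shape ({"setParameter": {...}}) is an assumption for illustration.
    """
    cleaned = copy.deepcopy(proc_params)
    cleaned.get("setParameter", {}).pop("transactionLifetimeLimitSeconds", None)
    return cleaned


def effective_txn_lifetime(proc_params):
    """Return the limit that would apply: the override if set, else the server default."""
    return proc_params.get("setParameter", {}).get(
        "transactionLifetimeLimitSeconds", SERVER_DEFAULT_TXN_LIFETIME_SECS
    )


# Hypothetical parameters before and after dropping the override.
old_params = {"setParameter": {"enableTestCommands": 1, "transactionLifetimeLimitSeconds": 3}}
new_params = drop_txn_lifetime_override(old_params)

print(effective_txn_lifetime(old_params))
print(effective_txn_lifetime(new_params))
```

Other parameters in the dictionary are left untouched, so only the transaction lifetime changes.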

The rationale for decreasing transactionLifetimeLimitSeconds is noted in "Why do some tests appear to hang for 60 seconds on a sharded cluster?". SERVER-39726 and SERVER-39349 are resolved in server 4.1.11. If this PR is merged, I will make a PR to update that rationale.

This branch of drivers-evergreen-tools downloads mongo-orchestration with these changes. This patch build runs the Go driver tests with these changes.

kevinAlbs commented 1 year ago

> It sounds like in some cases some tests can take up to 60 seconds instead of the current 3-second limit.

Anecdotally, most tests complete within one second. macOS appears to be slower, though I expect 60 seconds to be much more than enough time.

> But it sounds like better behavior than skipping the FLE range tests. LGTM.

I am also considering removing lower-value Range Index tests to reduce the runtime, but I do not think that will eliminate the possible timeouts.

ShaneHarvey commented 1 year ago

Yeah, reducing the runtime of the tests is ideal as long as we don't lose test coverage. Glad to see that setting transactionLifetimeLimitSeconds can be removed. Once this is merged I'll audit the Python tests to make sure there aren't any other tests that inadvertently take 60 seconds (instead of 3).

Running a patch here: https://spruce.mongodb.com/version/63ced886c9ec440caade6531/tasks?sorts=STATUS%3AASC%3BBASE_STATUS%3ADESC

ShaneHarvey commented 1 year ago

My patch here confirms that this change does not cause any transaction tests to take >=60 seconds. I did notice that on a 4.0 replica set there's one aggregate $out test that takes 60 seconds, but it was already happening before this change:

 [2023/01/23 19:05:00.840]   test_aggregate_out (test_read_concern.TestReadConcern.test_aggregate_out) ... ok (60.288s)
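
An audit like this could be scripted roughly as follows. This is a sketch, not the tooling used for the patch above: it assumes every result line has the same shape as the log line shown here, and `slow_tests` plus the second sample line are hypothetical.

```python
import re

# Matches result lines shaped like the one above:
#   test_name (dotted.test.id) ... status (12.345s)
LINE_RE = re.compile(
    r"(?P<name>test_\w+) \((?P<test_id>[^)]+)\) \.\.\. "
    r"(?P<status>\w+) \((?P<secs>\d+(?:\.\d+)?)s\)"
)


def slow_tests(log_lines, threshold_secs=60.0):
    """Return (test_id, seconds) pairs for tests at or above the threshold."""
    slow = []
    for line in log_lines:
        match = LINE_RE.search(line)
        if match and float(match.group("secs")) >= threshold_secs:
            slow.append((match.group("test_id"), float(match.group("secs"))))
    return slow


sample_log = [
    " [2023/01/23 19:05:00.840]   test_aggregate_out "
    "(test_read_concern.TestReadConcern.test_aggregate_out) ... ok (60.288s)",
    # Hypothetical fast test, added only to show that it is filtered out.
    " [2023/01/23 19:05:01.100]   test_fast "
    "(test_example.TestExample.test_fast) ... ok (0.120s)",
]

for test_id, secs in slow_tests(sample_log):
    print(f"{test_id}: {secs:.3f}s")
```

Running the same filter over logs from before and after the change would separate pre-existing slow tests, such as the one above, from any newly introduced ones.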