apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

6.3 Nightly Test Failure: SwizzledRollbackSideband #4131

Closed sfc-gh-jfu closed 2 years ago

sfc-gh-jfu commented 3 years ago

From nightly ensemble 20201205-062330-nightly_correctness_release-6.3-4513b09bb8fbd56b:

Commit: fa4d87f432117eef1ca2434df818ef8a7f49468b
Test File: fast/SwizzledRollbackSideband.txt
Buggify: True
Seed: 92566661
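Since simulation runs are deterministic for a given seed, the details above should be enough to replay the failure locally. The following is a minimal sketch that turns such a report into an `fdbserver` command line. The helper names (`parse_failure_report`, `simulation_command`) are hypothetical, the `tests/` directory prefix is an assumption about the checkout layout, and the flags (`-r simulation`, `-f`, `-s`, `-b`) should be checked against `fdbserver --help` for a binary built from the reported commit.

```python
import re


def parse_failure_report(text: str) -> dict:
    """Extract commit, test file, buggify flag and seed from a failure report."""
    pattern = (
        r"Commit:\s*(?P<commit>[0-9a-f]+)\s+"
        r"Test File:\s*(?P<test>\S+)\s+"
        r"Buggify:\s*(?P<buggify>\w+)\s+"
        r"Seed:\s*(?P<seed>\d+)"
    )
    match = re.search(pattern, text)
    if match is None:
        raise ValueError("unrecognized failure report")
    return {
        "commit": match["commit"],
        "test": match["test"],
        "buggify": match["buggify"].lower() in ("true", "on"),
        "seed": int(match["seed"]),
    }


def simulation_command(info: dict, fdbserver: str = "fdbserver",
                       tests_dir: str = "tests") -> list:
    """Build an argv list for replaying the test (flags are assumptions, see above)."""
    return [
        fdbserver,
        "-r", "simulation",
        "-f", f"{tests_dir}/{info['test']}",
        "-s", str(info["seed"]),
        "-b", "on" if info["buggify"] else "off",
    ]


if __name__ == "__main__":
    report = (
        "Commit: fa4d87f432117eef1ca2434df818ef8a7f49468b "
        "Test File: fast/SwizzledRollbackSideband.txt "
        "Buggify: True Seed: 92566661"
    )
    print(" ".join(simulation_command(parse_failure_report(report))))
```

The binary has to be built from the reported commit for the seed to reproduce the same execution.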

Updated with some logs of the failure:

  <TLogDegraded Severity="30" DateTime="2020-12-05T09:33:00Z" LogGroup="default" Roles="LR,SS,TL"/>
  <BuggifySection File="/mnt/jenkins/home/jenkins/workspace/FoundationDB-OSFDB-Build/jenkins/foundationdb/fdbserver/OldTLogServer_6_2.actor.cpp" Line="1004"/>
  <BuggifySection File="/mnt/jenkins/home/jenkins/workspace/FoundationDB-OSFDB-Build/jenkins/foundationdb/fdbserver/workloads/RemoveServersSafely.actor.cpp" Line="406"/>
  <TLogDegraded Severity="30" DateTime="2020-12-05T09:33:06Z" LogGroup="default" Roles="TL"/>
  <DisableConnectionFailures_Tester Severity="30" DateTime="2020-12-05T09:33:08Z" LogGroup="default"/>
  <BuggifySection File="/mnt/jenkins/home/jenkins/workspace/FoundationDB-OSFDB-Build/jenkins/foundationdb/fdbserver/DataDistribution.actor.cpp" Line="3991"/>
  <RemoveServersSafelyError Severity="40" DateTime="2020-12-05T10:02:10Z" Error="timed_out" ErrorDescription="Operation timed out" ErrorCode="1004" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x233cbcc 0x233c1e0 0x233c451 0xad8a55 0xad8b7e 0x7cd5f9 0xdeee70 0xbc7a00 0x220f5f8 0x222cac0 0x220f6fd 0x222cbdc 0xbc7a00 0x22e5e34 0x222cd03 0x77cc31 0x7f3528151555" LogGroup="default" Roles="TS"/>
  <TestFailure Severity="40" DateTime="2020-12-05T10:02:10Z" Error="timed_out" ErrorDescription="Operation timed out" ErrorCode="1004" Reason="Error starting workload" Workload="SidebandWorkload;RandomClogging;RollbackWorkload;MachineAttritionWorkload;MachineAttritionWorkload;MachineAttritionWorkload;RemoveServersSafelyWorkload" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x233cbcc 0x233c1e0 0x233c451 0x12d19bc 0x12d1f90 0x7cd5f9 0x81c534 0x7cd5f9 0x7cd5f9 0x182101b 0x7cd5f9 0xad8a6d 0xad8b7e 0x7cd5f9 0xdeee70 0xbc7a00 0x220f5f8 0x222cac0 0x220f6fd 0x222cbdc 0xbc7a00 0x22e5e34 0x222cd03 0x77cc31 0x7f3528151555" LogGroup="default" Roles="TS"/>
  <StartFailedForWorkloadSwizzledCausalConsistencyTest Severity="40" DateTime="2020-12-05T10:02:10Z" Error="operation_failed" ErrorDescription="Operation failed" ErrorCode="1000" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x233cbcc 0x233c1e0 0x233c451 0x12eb46b 0x12d9587 0x12d96bc 0x8587f8 0xad1258 0xb02c69 0xb03025 0x7cd5f9 0x81dadb 0x81dbee 0x7cd5f9 0x7e302d 0x21be7f5 0x21beba5 0xbc7a00 0x220f5f8 0x222cac0 0x220f6fd 0x222cbdc 0xbc7a00 0x22e5e34 0x222cd03 0x77cc31 0x7f3528151555" LogGroup="default"/>
  <RunTests Severity="40" DateTime="2020-12-05T10:02:10Z" Error="operation_failed" ErrorDescription="Operation failed" ErrorCode="1000" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x233cbcc 0x233c1e0 0x233c451 0xad8a55 0xad8b7e 0x7cd5f9 0x12ca81d 0x7cd5f9 0x12cc68d 0x7cd5f9 0x81cd0a 0x12adf39 0x12dca98 0x12dcee1 0x12b4e31 0x12b52e0 0x12b4d21 0x597457 0x8587f8 0xad1258 0xb02c69 0xb03025 0x7cd5f9 0x81dadb 0x81dbee 0x7cd5f9 0x7e302d 0x21be7f5 0x21beba5 0xbc7a00 0x220f5f8 0x222cac0 0x220f6fd 0x222cbdc 0xbc7a00 0x22e5e34 0x222cd03 0x77cc31 0x7f3528151555" LogGroup="default"/>
  <SetupAndRunError Severity="40" DateTime="2020-12-05T10:02:10Z" Error="operation_failed" ErrorDescription="Operation failed" ErrorCode="1000" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x233cbcc 0x233c1e0 0x233c451 0x10e05ad 0x10e06be 0x7cd5f9 0xdeeb90 0x7cd5f9 0x12b86a0 0x7cd5f9 0xad8a6d 0xad8b7e 0x7cd5f9 0x12ca81d 0x7cd5f9 0x12cc68d 0x7cd5f9 0x81cd0a 0x12adf39 0x12dca98 0x12dcee1 0x12b4e31 0x12b52e0 0x12b4d21 0x597457 0x8587f8 0xad1258 0xb02c69 0xb03025 0x7cd5f9 0x81dadb 0x81dbee 0x7cd5f9 0x7e302d 0x21be7f5 0x21beba5 0xbc7a00 0x220f5f8 0x222cac0 0x220f6fd 0x222cbdc 0xbc7a00 0x22e5e34 0x222cd03 0x77cc31 0x7f3528151555" LogGroup="default"/>
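To pick the errors out of a longer trace like the one above, a small filter over the XML events can help. This is a sketch only: it assumes one self-closing XML element per line, as in the excerpt, and the function name `severe_events` and the default threshold of 40 (the severity the error events above carry) are choices made for illustration.

```python
import xml.etree.ElementTree as ET


def severe_events(trace_text: str, min_severity: int = 40) -> list:
    """Return (event name, DateTime, Error) for events at or above min_severity."""
    events = []
    for line in trace_text.splitlines():
        line = line.strip()
        if not line.startswith("<"):
            continue  # skip blank lines and surrounding prose
        try:
            elem = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip wrapper tags or truncated lines
        severity = elem.get("Severity")
        # Events without a Severity attribute (e.g. BuggifySection) are ignored.
        if severity is not None and int(severity) >= min_severity:
            events.append((elem.tag, elem.get("DateTime"), elem.get("Error")))
    return events


if __name__ == "__main__":
    sample = """
      <TLogDegraded Severity="30" DateTime="2020-12-05T09:33:06Z" LogGroup="default" Roles="TL"/>
      <BuggifySection File="DataDistribution.actor.cpp" Line="3991"/>
      <RemoveServersSafelyError Severity="40" DateTime="2020-12-05T10:02:10Z" Error="timed_out" Roles="TS"/>
    """
    for name, when, error in severe_events(sample):
        print(name, when, error)
```

Applied to the excerpt above, this would leave only the `timed_out` and `operation_failed` events, with `RemoveServersSafelyError` as the earliest listed.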

The RemoveServersSafely workload seems to time out on:

wait(success(checkForExcludingServers(cx, toKillArray, true /* wait for exclusion */)));

Possibly related to https://github.com/apple/foundationdb/issues/4039

sfc-gh-clin commented 3 years ago

Adding this here to track the same correctness failure.

On master, ensemble 20201220-051607-nightly_correctness_master-e2c2f10aaa355286

Commit: 0254869dd667f334196d6a3dd7888d2cf175d80d
Test File: fast/SwizzledRollbackSideband.txt
Buggify: True
Seed: 160058930

sfc-gh-abeamon commented 3 years ago

Another possibly related setup and run error:

Source Version: b0f8784bf1c8faa1ce242f4d1ccb4454937c2bf6 (release-6.3)
Test: slow/ParallelRestoreNewBackupWriteDuringReadAtomicRestore.txt
Seed: 803647060
Buggify: Off

sfc-gh-tclinkenbeard commented 2 years ago

Closing this issue because it is out-of-date.