apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

rare/BlobGranuleRanges.toml hits assertion failure in Ratekeeper code #9200

Status: Closed (sfc-gh-jshim closed this issue 1 year ago)

sfc-gh-jshim commented 1 year ago

Code location and commit hash: https://github.com/apple/foundationdb/blob/36e8e5a3bb45f3f48da6b41d68597793895eeb09/fdbserver/Ratekeeper.actor.cpp#L1025

Ensemble ID: 20230120-033919-nightly_correctness_main_x86_64-56a90c84c13adea5

Trace snippet:

  <InternalError Severity="40" ErrorKind="BugDetected" Time="884.885138" DateTime="2023-01-20T05:16:23Z" Type="InternalError" Machine="2.1.1.3:1" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" FailedAssertion="!g_network-&gt;isSimulated() || limits-&gt;bwLagTarget != SERVER_KNOBS-&gt;TARGET_BW_LAG || now() &lt; FLOW_KNOBS-&gt;SIM_SPEEDUP_AFTER_SECONDS + SERVER_KNOBS-&gt;BW_RK_SIM_QUIESCE_DELAY" File="/home/jenkins/fdb/extra/long/path/to/work/around/strange/cpack/debug/rpm/behavior/fdbserver/Ratekeeper.actor.cpp" Line="1026" ThreadID="1304401877546049582" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x4328ea1 0x42700ba 0x280e1f6 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555" LogGroup="default" Roles="CP,CS,DD,MS,RK,RV,SS"/>
  <SystemError Severity="40" ErrorKind="Unset" Time="884.885138" DateTime="2023-01-20T05:16:23Z" Type="SystemError" Machine="2.1.1.3:1" ID="0000000000000000" Error="internal_error" ErrorDescription="An internal error occurred" ErrorCode="4100" ThreadID="1304401877546049582" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x4328ea1 0x42706ef 0x42700cd 0x280e1f6 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555" LogGroup="default" Roles="CP,CS,DD,MS,RK,RV,SS"/>
  <Crash Severity="40" ErrorKind="BugDetected" Time="884.885138" DateTime="2023-01-20T05:16:23Z" Type="Crash" Machine="2.1.1.3:1" ID="0000000000000000" Signal="6" Name="Aborted" Trace="addr2line -e fdbserver.debug -p -C -f -i 0x42700cd 0x280e1f6 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555" ThreadID="1304401877546049582" Backtrace="addr2line -e fdbserver.debug -p -C -f -i 0x4328ea1 0x42fea82 0x7fbd3a4c9630 0x42700cd 0x280e1f6 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555" LogGroup="default" Roles="CP,CS,DD,MS,RK,RV,SS"/>
  <WarningLimitExceeded Severity="30" WarningCount="334"/>
  <TestUnexpectedlyNotFinished Severity="40"/>
  <StdErrOutput Severity="40" Output="Assertion !g_network-&gt;isSimulated() || limits-&gt;bwLagTarget != SERVER_KNOBS-&gt;TARGET_BW_LAG || now() &lt; FLOW_KNOBS-&gt;SIM_SPEEDUP_AFTER_SECONDS + SERVER_KNOBS-&gt;BW_RK_SIM_QUIESCE_DELAY failed @ /home/jenkins/fdb/extra/long/path/to/work/around/strange/cpack/debug/rpm/behavior/fdbserver/Ratekeeper.actor.cpp 1026:"/>
  <StdErrOutput Severity="40" Output=" addr2line -e fdbserver.debug -p -C -f -i 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555"/>
  <StdErrOutput Severity="40" Output="SIGNAL: Aborted (6)"/>
  <StdErrOutput Severity="40" Output="Trace: addr2line -e fdbserver.debug -p -C -f -i 0x42700cd 0x280e1f6 0x28340cd 0x2827507 0x1a51858 0x420bfe5 0x2d561c7 0x7fbd3a10e555"/>
</Test>

Assigning to @sfc-gh-jslocum, who is a test owner and modified the assertion lines in commit 9721de70b6aab0e01a47ba8a9f09f1c42cc3ff28.

sfc-gh-jslocum commented 1 year ago

This test passes when BW_RK_SIM_QUIESCE_DELAY is increased. Given that PR #9052 increases the timeout and makes fixes around blob worker throttling, I'm inclined to just close this and revisit if a similar error appears after that PR merges.
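
For readers following along, here is a minimal sketch of the failed assertion's predicate, to show why a larger BW_RK_SIM_QUIESCE_DELAY can make the check pass. It is a simplified standalone model, not the code in Ratekeeper.actor.cpp: the `SimState` struct, the `assertionHolds` function, and every numeric value used with it are hypothetical, and the real knob values are not stated in this thread.

```cpp
// Simplified model of the assertion reported in the trace above.
// Hypothetical names and layout; this is NOT the FoundationDB source.
struct SimState {
    bool simulated;                // stands in for g_network->isSimulated()
    double bwLagTarget;            // stands in for limits->bwLagTarget
    double targetBwLag;            // stands in for SERVER_KNOBS->TARGET_BW_LAG
    double now;                    // stands in for now()
    double simSpeedupAfterSeconds; // stands in for FLOW_KNOBS->SIM_SPEEDUP_AFTER_SECONDS
    double bwRkSimQuiesceDelay;    // stands in for SERVER_KNOBS->BW_RK_SIM_QUIESCE_DELAY
};

// Returns true when the assertion would hold (no crash).
// The check fails only if all three disjuncts are false: the run is
// simulated, bwLagTarget equals TARGET_BW_LAG, and the current time is
// at or past SIM_SPEEDUP_AFTER_SECONDS + BW_RK_SIM_QUIESCE_DELAY.
inline bool assertionHolds(const SimState& s) {
    return !s.simulated
        || s.bwLagTarget != s.targetBwLag
        || s.now < s.simSpeedupAfterSeconds + s.bwRkSimQuiesceDelay;
}
```

With made-up values of 450 for the speedup time and 300 for the quiesce delay, a simulated check at time 884.885 with bwLagTarget equal to the target fails, because 884.885 is past 750. Raising the delay to 500 moves the cutoff to 950, so the same check passes. Note that a longer delay only widens the window in which the third condition is true; it does not change what bwLagTarget is at that time.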