Closed: kibanamachine closed this issue 1 month ago
Pinging @elastic/kibana-security (Team:Security)
New failure: CI Build - main
Duplicate of https://github.com/elastic/kibana/issues/149090
New failure: CI Build - 8.8
New failure: CI Build - 8.8
Ran another flaky test runner just to be sure, but this looks tied to a series of CI failures on Friday.
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
/skip
Skipped
main: c978ef954ea68bbd5ff0b24ee7707d72b28069ee
New failure: CI Build - main
debg Log in as test_user and SAML user 3 times each with a 0.5s delay.
proc [kibana] [2023-12-26T07:07:55.459+00:00][INFO ][plugins.security.routes] Logging in with provider "basic1" (basic) {"service":{"node":{"roles":["background_tasks","ui"]}}}
proc [kibana] [2023-12-26T07:07:55.495+00:00][WARN ][plugins.security.session.index] Attempted to create a new session, but session index or alias was missing. {"service":{"node":{"roles":["background_tasks","ui"]}}}
info [o.e.c.m.MetadataCreateIndexService] [ftr] [.kibana_security_session_1] creating index, cause [api], templates [], shards [1]/[0]
The `Attempted to create a new session, but session index or alias was missing` log line is expected, since we delete the session index before every test. In that case Kibana re-creates the index and then tries to write the session document again. But looking at the next log lines:
debg Waiting for all sessions to be persisted...
debg Existing sessions: {"total":{"value":5,"relation":"eq"},"max_score":1,"hits":[{"_index":".kibana_security_session_1","_id":"Xga8j6T1jZIT/vuNU+vVb7fKrF4mHvHExCAbmjsciQI=","_score":1,"_source":{"provider":{"type":"saml","name":"saml1"},"idleTimeoutExpiration":1703578076289,"lifespanExpiration":1706166476289,"createdAt":1703574476289,"usernameHash":"b634643fc458a0aeacc625c15bc11e79dcad5ac011b2a1ba4f905e555acefb23","content":"xw8kZQVIhc4oOqvLUrXmT7q1c/Bz2xxRGnMRVIbZsqkDB8Cz1Tck2wBmjKONzNj3K+VRq2eWOMloar6dZIaK6na7op0Ohtv5dmXWP7uo2ouxpZ7jmGFXkD+yQiteYVttmge3rPReW2KWbubMa0SsfYuU5xD10Ku54/exxLaNC1IoliNrs3z+vfUAs07uxmWF7hVG+B6N3Hg1J7oz8+Xd83UyfVF+wgfeWm/MaE4DMm/Ptyemih+SLxWVWvwLXMwt1uS7zjrBNMNBBImxlVSnqW/1TIp9tXop1VcDJJ1DQRyy3MH0vsGUJBD+zYuxrrsJ5Zu5Hgpv1UlyRNhX5nfSzOa5VofPSBoyIAsYwOOLfQCEh4hy5O6IihPyUOfmTDAdvOpQ9Agxak55irtBRoCkwDO+QdKl1YM3lcPvOc2U/pmhspxVQmsZgqleN7DggO9APvftuM40qspenI1/80BE"}},{"_index":".kibana_security_session_1","_id":"aKFUIugOf9EvHKMRvCMrhb73RiqtTbvL50jpurho87U=","_score":1,"_source":{"provider":{"type":"basic","name":"basic1"},"idleTimeoutExpiration":1703578076826,"lifespanExpiration":1706166476826,"createdAt":1703574476826,"usernameHash":"e9ab99ee1daa1aa2b5cac38d446ac31f555b0a0bd0cd7a7335a3a2a065635e64","content":"hxBMUaCERA7MbiMqRUV4UPxwDxAHn1vCch9RoakjEtsJvv0OECfauZvWYVx4ROQY3nhGuOkZAVKm8wiCOen8zEGIDMVCtDG/4tNxI5Cr8Yim6Ep8w9GupCUSIjKml7l5lCQxVSM6rFu13q0MRk0fj5ZlQf+zVl4SL8icEjnD6zsvqi7JfQihME2O6FGBMtV67Whls8RRGC3vJiUKyVDgnEmTTbQ1bLywu/Ekw1xVwoqMc2oZO/atJpheIzIKouU/DZ7ppTO2Aca1pq99yEbf7c7Xe3XuUlc2IY6Kr3l9/1dhCTQBntyRIRvUlN/e8/MSWhgRudNy7AAEMwQDX2NR3+Q="}},{"_index":".kibana_security_session_1","_id":"1BbgcAGNoKBcQk7ac1F+9Wrm+q+zZ7RF6VI0TVQ4GJ8=","_score":1,"_source":{"provider":{"type":"saml","name":"saml1"},"idleTimeoutExpiration":1703578076930,"lifespanExpiration":1706166476930,"createdAt":1703574476930,"usernameHash":"b634643fc458a0aeacc625c15bc11e79dcad5ac011b2a1ba4f905e555acefb23","content":"tBZzVRMYYDX4V3p6yjEMURMwi1i36iF313e5RiiSLRJjLujjC/h
v9wIpfsFwbAhyBp6cOlMorfrFJu/H1vbZ/tyYO77MbYqNFX3AHeIV+Vo52RJNtb4QoTsmhV7o6gG/uqMBu+piwg+Uz19Mw85lY3ddO1T4G7NrT/rw8HmzYMhm5VIYK+QhBn/c1fasnHAAzlbyUWq7hgWC0XPOk/GtGpKcdAEq12Y1017dXqStu/1+a7XiYlzPhPGEwn7IFhuzrX3CAK0KcZZb+faUomFTAQ84y3MTtyRwGgtIlr/d08gtzXMmyIH0Kl12yML41wJTv46eFzecpzyEuMeoqEPW5M3/tgDDpb4gnF8vULpzc+AA53wBUL1W1GBr60k/WrxXDHajsu1kMFwsXcaWQIAHo/kUkkI/4RpJDBYB0HU2XpOT68brbMVd2tcdkN/gmXGicsrk/3Rjr6T2TGaPSEBJ"}},{"_index":".kibana_security_session_1","_id":"kZHYAH2xZ+v2Z4r2U/rkmGN1gOOatPqsldmfnIvZhU0=","_score":1,"_source":{"provider":{"type":"basic","name":"basic1"},"idleTimeoutExpiration":1703578077465,"lifespanExpiration":1706166477465,"createdAt":1703574477465,"usernameHash":"e9ab99ee1daa1aa2b5cac38d446ac31f555b0a0bd0cd7a7335a3a2a065635e64","content":"N+WKrs6wtW6lti5/1TFJ14Mlda4swfNXAjPIq5+XANrnyphl0GavhnPI0olqmcW3+gskYA3DEHhPhju9kOX4Bgryky0DRIk4nfWSNhvc64vf78oWtutrrvIPp8xTAs3u+QNiok5upPPoPOqoUZ+FMYRSbpGYZMsGTo9Z/QSmPD8fIQnXpfy7Gnk0S1N+/yS/saaYozzsZeNj+RAjolS+XR6sFSzaI6JgsA+dTi095Iw+FzNRbfOl9Xdllh+P3fB9q0vZT0rq1Ky9uZYsQMriLVD8dLra/M5D9xjNGUqjtNE5wM4D+24sWkwF3sXakfPJXneaBOeXStUJ80ay/o++L4M="}},{"_index":".kibana_security_session_1","_id":"b8DspPMDuxmrqDB+yWpkFlsvu6pnA05b28jdK22dL7k=","_score":1,"_source":{"provider":{"type":"saml","name":"saml1"},"idleTimeoutExpiration":1703578077632,"lifespanExpiration":1706166477632,"createdAt":1703574477632,"usernameHash":"b634643fc458a0aeacc625c15bc11e79dcad5ac011b2a1ba4f905e555acefb23","content":"M8WgaUJpvc7fGzXo/v7MfwdG2foK3BBLd/9RMkBcLlAcEAXkLPMg73/LkYdFLjMng8TfsmGOC8BAd8tlRT9/QJfWscE6JpG3Bl/3WUrIsPA5TwrkJVxqvCSOt5WWJL+QBHZX9oaR8rjGbiV8O8rWE9R3cMqNoBGYOFGC++vYlh+Muzfdh4dj/wVE+VBQNyJ5q2ZbccdrC4BD00fan+rZ665aU4DD85CDw1quzLZCMWnJqoYSNL6bJfLEaiI1MqV6ouCAhuCGMpSc3tUqFXLP4BMcBibaqoEih8RajGFZqrIhIBhF0GbOwhCpsEF9Vdg5XO0IzvkuBLNIp506pnsmu7bQFszxEArsqZyPNdodFlMsfMNhVwNeAcBxWJX76wlllOM5BfcVrVam5cuo8tbA8+yCZckU+PNJ5sXNZiqZQ9TVWYffJKiM9zu99TA3c4eSzVqpNOAqdSuXfwYFvI2p"}}]}.
debg --- retry.tryForTime error: expected 5 to equal 6
we can see that the oldest session (`basic`) is still missing, meaning that our second attempt to write the session right after we created the index didn't work for some reason:
{
  "total": { "value": 5 },
  "hits": [
    {
      "_index": ".kibana_security_session_1",
      "_source": {
        "provider": { "type": "saml", "name": "saml1" },
        "createdAt": 1703574476289
      }
    },
    {
      "_index": ".kibana_security_session_1",
      "_source": {
        "provider": { "type": "basic", "name": "basic1" },
        "createdAt": 1703574476826
      }
    },
    {
      "_index": ".kibana_security_session_1",
      "_source": {
        "provider": { "type": "saml", "name": "saml1" },
        "createdAt": 1703574476930
      }
    },
    {
      "_index": ".kibana_security_session_1",
      "_source": {
        "provider": { "type": "basic", "name": "basic1" },
        "createdAt": 1703574477465
      }
    },
    {
      "_index": ".kibana_security_session_1",
      "_source": {
        "provider": { "type": "saml", "name": "saml1" },
        "createdAt": 1703574477632
      }
    }
  ]
}
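The wait loop that produces this assertion is essentially a retry that keeps re-polling the session index until the expected document count appears. A minimal self-contained sketch (`tryForTime` and the in-memory `sessions` array are illustrative stand-ins, not the actual FTR retry service or an Elasticsearch query):

```typescript
// Minimal sketch of a retry-for-time helper analogous to the FTR retry
// service used by this test (illustrative; not the actual FTR code).
async function tryForTime<T>(timeoutMs: number, fn: () => Promise<T>): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  // Keep re-running `fn` until it stops throwing or the deadline passes.
  for (;;) {
    try {
      return await fn();
    } catch (err) {
      if (Date.now() >= deadline) {
        throw new Error(`retry.tryForTime reached timeout ${timeoutMs} ms: ${err}`);
      }
      await new Promise((resolve) => setTimeout(resolve, 50));
    }
  }
}

// Stand-in for "count session documents in the index" (assumption: in the
// real test this is a search against .kibana_security_session_1).
const sessions: string[] = ['saml1', 'basic1', 'saml1', 'basic1', 'saml1'];

async function waitForSessions(expected: number): Promise<number> {
  return tryForTime(2000, async () => {
    if (sessions.length !== expected) {
      // This is the assertion that fails in the flaky runs: one basic
      // session document never made it into the index.
      throw new Error(`expected ${sessions.length} to equal ${expected}`);
    }
    return sessions.length;
  });
}
```

With only five documents persisted, `waitForSessions(6)` keeps retrying until the timeout and surfaces the same `expected 5 to equal 6` message seen in the logs.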
This is the code that does this: https://github.com/elastic/kibana/blob/392d53c9ba6b47026a1c66c941cd31ba9c6a7426/x-pack/plugins/security/server/session_management/session_index.ts#L301-L309. For the second attempt we use `ignore404: false`, and the logs don't show that this request failed, so from the Kibana perspective everything seems to be functioning as expected.
I'm going to add more log lines around this code to make sure we get `statusCode === 200` when we try to write the session document for the second time, right after we created the index.
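Roughly, the flow in question looks like the following sketch (`SessionIndexClient` is a hypothetical stand-in for the real Elasticsearch client wrapper; the actual implementation is in the `session_index.ts` code linked above):

```typescript
// Illustrative sketch of the session write flow: the first write may
// discover that the index is missing (404); we then create the index
// and retry the write once.
interface SessionIndexClient {
  // `ignore404` mirrors the option mentioned above: when true, a missing
  // index is reported as a 404 status instead of throwing an error.
  create(doc: object, opts: { ignore404: boolean }): Promise<{ statusCode: number }>;
  createIndex(): Promise<void>;
}

async function writeSession(client: SessionIndexClient, doc: object): Promise<number> {
  const first = await client.create(doc, { ignore404: true });
  if (first.statusCode !== 404) {
    return first.statusCode;
  }

  // Index or alias was missing: create it and retry the write. The second
  // attempt uses `ignore404: false`, so a failure here should surface as
  // an error rather than being swallowed silently.
  await client.createIndex();
  const second = await client.create(doc, { ignore404: false });
  // Proposed extra logging: confirm the second attempt actually returned 200.
  console.log(`second session write attempt returned ${second.statusCode}`);
  return second.statusCode;
}
```

The puzzle in the flaky runs is that neither branch reports a failure, yet one session document is missing afterwards.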
So, I've run these tests (with additional logging) via the Flaky Test Runner 590 times (100 + 490) without a single failure. This failure doesn't seem widespread enough to warrant adding more logging to the codebase, but if it keeps occurring, we'll address it. Closing the issue for now as not actionable.
New failure: CI Build - 8.12
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
Closing to see what difference https://github.com/elastic/kibana/pull/175352 makes.
New failure: CI Build - 8.13
New failure: CI Build - 8.12
New failure: CI Build - main
New failure: CI Build - 8.13
New failure: CI Build - main
New failure: CI Build - main
New failure: CI Build - main
New failure: kibana-on-merge - 8.13
New failure: kibana-on-merge - 8.13
New failure: kibana-elasticsearch-snapshot-verify - 8.13
New failure: kibana-on-merge - main
latest failure:
└-> should properly clean up sessions that exceeded concurrent session limit even for multiple providers
--
| └-> "before each" hook: global before each for "should properly clean up sessions that exceeded concurrent session limit even for multiple providers"
| └-> "before each" hook for "should properly clean up sessions that exceeded concurrent session limit even for multiple providers"
| └- ✖ fail: security APIs - Session Concurrent Limit Session Concurrent Limit cleanup should properly clean up sessions that exceeded concurrent session limit even for multiple providers
| │ Error: retry.tryForTime reached timeout 20000 ms
| │ Error: expected 5 to equal 6
| │ at Assertion.assert (expect.js:100:11)
| │ at Assertion.apply (expect.js:227:8)
| │ at Assertion.be (expect.js:69:22)
| │ at cleanup.ts:235:56
| │ at processTicksAndRejections (node:internal/process/task_queues:95:5)
| │ at runAttempt (retry_for_success.ts:29:15)
| │ at retryForSuccess (retry_for_success.ts:98:21)
| │ at RetryService.tryForTime (retry.ts:37:12)
| │ at Context.<anonymous> (cleanup.ts:234:7)
| │ at Object.apply (wrap_function.js:73:16)
| │ at onFailure (retry_for_success.ts:17:9)
| │ at retryForSuccess (retry_for_success.ts:84:7)
| │ at RetryService.tryForTime (retry.ts:37:12)
| │ at Context.<anonymous> (cleanup.ts:234:7)
| │ at Object.apply (wrap_function.js:73:16)
New failure: kibana-on-merge - main
New failure: kibana-on-merge - main
Came across an interesting thing: the `cleanupInterval` is set to `5h` in the FTR config:
--xpack.security.session.cleanupInterval=5h
The log shows that we invoke the cleanup task in the very first test and get a 500:
[00:00:03] │ debg Existing sessions: {"total":{"value":3,"relation":"eq"},"max_score":1,"hits":[{"_index":".kibana_security_session_1","_id":"gyCzdu4JDUUi03Cd2G+daZuJQKTEjvCidLFgsm6sgA4=","_score":1,"_source":{"provider":{"type":"basic","name":"basic1"},"idleTimeoutExpiration":1720528171722,"lifespanExpiration":1723116571722,"createdAt":1720524571722,"usernameHash":"e9ab99ee1daa1aa2b5cac38d446ac31f555b0a0bd0cd7a7335a3a2a065635e64","content":"M1k1w9v840N6u1EZkfwVVjzwLk9udSESi9jwRJ1znBPz0EO4RAliKE53gVyL99Zr1HpUAIIstiRtTbPIF0cVjoHmp/Z+kHeGxNAB2FbFecBjo7zPpxL6VnOr/7E3qNevj/8/sBObC4rW7l3kkDFV7AgSqWY9S64JazbqmpnH8mG18wE36L/hH4P0gTBmYCOTK35u83ajRoGFrr1Mbtkm8GEowxCKuRgkOAie2CO6btZ0KUGVHv6CiF4P9WKZnGIf2fyOS7YG7rFx98oWAs4OZhRm+/1CDU+4Q5x4L8e4Z3/fvSiah+15+1rHUzG3PNE9RQBtaGfFpxr7mo8KNCDTzvg="}},{"_index":".kibana_security_session_1","_id":"ED/Teu0G+MXg5zFFw63f4IGhJYNDQiLTku9RFKU73ZQ=","_score":1,"_source":{"provider":{"type":"basic","name":"basic1"},"idleTimeoutExpiration":1720528172475,"lifespanExpiration":1723116572475,"createdAt":1720524572475,"usernameHash":"e9ab99ee1daa1aa2b5cac38d446ac31f555b0a0bd0cd7a7335a3a2a065635e64","content":"SVDq4yk0ZZV76/00timLNRKDOnVQrZLYlsPxBX+gbHR8F/36smopXI+tS4MXgqQL4fAmVmrScmA69r/O4y5B3bHNPY1yqTYx9Zomel6bfZ3dsjQScHCrwKOLVUob4+Hn92FvT289OUMncks1OwQg5S5bET06Nd4jYhd7C5kKUzsH+vHDJV7mgMmBqRuf8m1rNUUge3XO4ra19Uil8Ou6qDGJXJRPbYOZWHoq2ey+Nr6h5j8zUPfr/wo5PiTSWdNw1PJG/MXOQiBHRi4hbVTw2ZIFIlxi1ARh+/D2aV5KbziAmxeCHu9o/1g6YJrUAxGNI9Qtd9Po5SY53sGmZ4mBF6U="}},{"_index":".kibana_security_session_1","_id":"/zR3wzCyms2bRib56OUNOn6WSHMnKbM2VjNtyBZrpJw=","_score":1,"_source":{"provider":{"type":"basic","name":"basic1"},"idleTimeoutExpiration":1720528173016,"lifespanExpiration":1723116573016,"createdAt":1720524573016,"usernameHash":"e9ab99ee1daa1aa2b5cac38d446ac31f555b0a0bd0cd7a7335a3a2a065635e64","content":"txDKnicnMjQu/ZP42i+W9BT5sM81jTL8VeHXYLTYfyWiQV7fPB98IJKQeWHPfeEjweGwjvHOfU/zooZANLYv81IeY5X4JlOL3O51aPTdf6a5iaSwJTi2DN3ASiOhi7OcLqabMs8csbAbzP6w9aN6O1PmMekPmd
iLt/aQlTNlLWJPMuJ4drdTU6FJ+MFc4aOa4X1MfFX/brBv+AQ/fcr5Fq4oB/7EJPSoj3b8hjUeYaZ1DTu6sB381HfpXkg3OfihRdioUnTGjwaSD20HwT9c+roI9GZhmqewijXDoW+xwLyfv5JYtkxzP+zlktdPbGBZI2NUmxChUUsrMNDAbyEQ7aM="}}]}.
[00:00:03] │ proc [kibana] [2024-07-09T11:29:33.098+00:00][ERROR][http] 500 Server Error {"http":{"response":{"status_code":500},"request":{"method":"post","path":"/session/_run_cleanup"}},"error":{"message":"Failed to run task \"session_cleanup\" as it is currently running"},"service":{"node":{"roles":["background_tasks","ui"]}}}
[00:00:03] │ debg --- retry.tryForTime error: expected 200 "OK", got 500 "Internal Server Error"
That means the cleanup job itself was already running, which shouldn't be the case: we set the interval to `5h` beforehand, and this is the very first test in our test suite and the first time we invoke `session/_run_cleanup`.
So if the cleanup job is running on some other interval, it might corrupt sessions from the following tests even before we invoke `session/_run_cleanup`, which sometimes leads to flaky behaviour. (Two test suites running on the same node and overriding the interval?)
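The 500 is consistent with Task Manager refusing to start a task instance that is already mid-run. Conceptually the guard behaves something like this sketch (illustrative only; `TaskRunGuard` and `runSoon` here are simplified stand-ins, not Kibana's actual Task Manager code):

```typescript
// Illustrative guard: a second request to run a task that is still running
// is rejected, producing the "currently running" error seen in the log.
class TaskRunGuard {
  private readonly running = new Set<string>();

  async runSoon(taskId: string, work: () => Promise<void>): Promise<void> {
    if (this.running.has(taskId)) {
      throw new Error(`Failed to run task "${taskId}" as it is currently running`);
    }
    this.running.add(taskId);
    try {
      await work();
    } finally {
      this.running.delete(taskId);
    }
  }
}
```

If the `session_cleanup` task were already scheduled on some other interval and happened to be mid-run, the test's explicit `session/_run_cleanup` call would hit exactly this branch.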
cc @azasypkin Perhaps you can shed some light on this; I don't have much context around the FTR setup in general.
New failure: kibana-elasticsearch-snapshot-verify - 8.15
New failure: kibana-elasticsearch-snapshot-verify - 8.15
Same reason as described in https://github.com/elastic/kibana/issues/149091#issuecomment-2283473524
A test failed on a tracked branch
First failure: CI Build - main