elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.71k stars 8.12k forks source link

Failing test: X-Pack Security API Integration Tests (Session Idle Timeout).x-pack/test/security_api_integration/tests/session_idle/cleanup·ts - security APIs - Session Idle Session Idle cleanup should properly clean up session expired because of idle timeout #121482

Open kibanamachine opened 2 years ago

kibanamachine commented 2 years ago

A test failed on a tracked branch

Error: expected 200 "OK", got 401 "Unauthorized"
    at Test._assertStatus (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/lib/test.js:268:12)
    at Test._assertFunction (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/lib/test.js:283:11)
    at Test.assert (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/lib/test.js:173:18)
    at assert (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/lib/test.js:131:12)
    at /opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/lib/test.js:128:5
    at Test.Request.callback (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/node_modules/superagent/lib/node/index.js:718:3)
    at /opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/node_modules/superagent/lib/node/index.js:906:18
    at IncomingMessage.<anonymous> (/opt/local-ssd/buildkite/builds/kb-n2-4-c3b96ff0bbb5ca35/elastic/kibana-hourly/kibana/node_modules/supertest/node_modules/superagent/lib/node/parsers/json.js:19:7)
    at IncomingMessage.emit (node:events:402:35)
    at endReadableNT (node:internal/streams/readable:1343:12)

First failure: CI Build - main

elasticmachine commented 2 years ago

Pinging @elastic/kibana-security (Team:Security)

azasypkin commented 2 years ago

Error: expected 200 "OK", got 401 "Unauthorized"

I cannot think of any reason why we'd get error like that in this test. Here is the assertion that fails: https://github.com/elastic/kibana/blob/788db0dd9d662b75de9dfe581852323f6d645c77/x-pack/test/security_api_integration/tests/session_idle/cleanup.ts#L86-L95

It's a simple basic login API request with the credentials of the pre-populated admin test user. I'll keep eye on this issue and either dig deeper if it happens again or close it as an intermittent if we don't see failures during next few weeks.

kibanamachine commented 2 years ago

New failure: CI Build - 8.0

azasypkin commented 2 years ago

Investigating the failure...

azasypkin commented 2 years ago

In this test we take the following steps:

  1. Log in via '/internal/security/login'
  2. Verify that session cookie is valid via '/internal/security/me'
  3. Wait until session expires (idle timeout is 5s) and the cleanup job removes it
  4. Verify that session is no longer valid via '/internal/security/me'

The test fails on the step 2 likely because sometimes Kibana is too slow on CI and it takes more than 5s between steps 1 and 2.

I'm doubling timeouts in configuration for these tests and will see if it helps: https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/37

kibanamachine commented 2 years ago

New failure: CI Build - 8.0

kibanamachine commented 2 years ago

New failure: CI Build - 8.0

azasypkin commented 2 years ago

New failure: CI Build - 8.0

Closing again since 8.0/7.x backports haven't been merged yet due to merge conflicts (CI run pending).

kibanamachine commented 2 years ago

New failure: CI Build - main

azasypkin commented 2 years ago

New failure: CI Build - main

Before the fix, the symptom in all failed builds was Error: expected 200 "OK", got 401 "Unauthorized", now it's different: Error: expected 1 to equal 0. Let's wait and see if it's a transient issue or not. If not, we'll need to make sure that's not related to the recent changes in the cleanup logic in https://github.com/elastic/kibana/pull/122419

kibanamachine commented 2 years ago

New failure: CI Build - main

azasypkin commented 2 years ago

сс @thomheymann, just in case you can notice what can be the reason here.

kibanamachine commented 2 years ago

New failure: CI Build - 8.1

jportner commented 2 years ago

New failure: CI Build - 8.1

This is also the "new error":

└-: security APIs - Session Idle
--
  | └-> "before all" hook: beforeTestSuite.trigger in "security APIs - Session Idle"
  | └-: Session Idle cleanup
  | └-> "before all" hook: beforeTestSuite.trigger for "should properly clean up session expired because of idle timeout"
  | └-> should properly clean up session expired because of idle timeout
  | └-> "before each" hook: global before each for "should properly clean up session expired because of idle timeout"
  | └-> "before each" hook for "should properly clean up session expired because of idle timeout"
  | └- ✖ fail: security APIs - Session Idle Session Idle cleanup should properly clean up session expired because of idle timeout
  | │      Error: expected 1 to equal 0
  | │       at Assertion.assert (/opt/local-ssd/buildkite/builds/kb-n2-4-a6f049047d437185/elastic/kibana-hourly/kibana/node_modules/@kbn/expect/expect.js:100:11)
  | │       at Assertion.be.Assertion.equal (/opt/local-ssd/buildkite/builds/kb-n2-4-a6f049047d437185/elastic/kibana-hourly/kibana/node_modules/@kbn/expect/expect.js:227:8)
  | │       at Assertion.be (/opt/local-ssd/buildkite/builds/kb-n2-4-a6f049047d437185/elastic/kibana-hourly/kibana/node_modules/@kbn/expect/expect.js:69:22)
  | │       at Context.<anonymous> (test/security_api_integration/tests/session_idle/cleanup.ts:111:54)
  | │       at runMicrotasks (<anonymous>)
  | │       at processTicksAndRejections (node:internal/process/task_queues:96:5)
  | │       at Object.apply (/opt/local-ssd/buildkite/builds/kb-n2-4-a6f049047d437185/elastic/kibana-hourly/kibana/node_modules/@kbn/test/target_node/functional_test_runner/lib/mocha/wrap_function.js:87:16)

The assertion that's failing is on line 111 below:

https://github.com/elastic/kibana/blob/660023b9404aa7de7f44f7c8af35804c0239128c/x-pack/test/security_api_integration/tests/session_idle/cleanup.ts#L100-L111

I think I know what's causing the test failure. We changed deleteByQuery to use a combination of find + bulk-delete so that we can log audit records for each deleted session document. In the delete step, we intentionally use refresh: false to ensure it's as fast as possible in case we have to delete multiple batches of sessions:

https://github.com/elastic/kibana/blob/6627bd8b3ab49415e397495d8e076f9861fcc894/x-pack/plugins/security/server/session_management/session_index.ts#L460-L467

However, we don't appear to be triggering an index refresh after the whole cleanup process is finished. I believe we intended to add that but it got lost in the mix.

kibanamachine commented 2 years ago

New failure: CI Build - main

kibanamachine commented 2 years ago

New failure: CI Build - main

mistic commented 2 years ago

Skipped.

main: 25d6e64

kibanamachine commented 2 years ago

New failure: CI Build - 8.1

azasypkin commented 2 years ago

Investigating the failure...

I don't like the fact that we run tests before Kibana is fully stable, most of the logs look like this:

info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding component template [synthetics-browser.screenshot@custom]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding component template [synthetics-tcp@custom]

NOTE: At this point the test is already running

debg Verifying session cookie for elastic.
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding component template [synthetics-http@custom]
debg Session cookie for elastic is valid.
debg Waiting for cleanup job to run...

NOTE: At the end of this "waiting" the test will fail, but componnents and templates are still being added

info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding component template [synthetics-icmp@custom]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-browser] for index patterns [synthetics-browser-*]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-browser.network] for index patterns [synthetics-browser.network-*]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-browser.screenshot] for index patterns [synthetics-browser.screenshot-*]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-tcp] for index patterns [synthetics-tcp-*]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-http] for index patterns [synthetics-http-*]
info [o.e.c.m.MetadataIndexTemplateService] [ftr] adding index template [synthetics-icmp] for index patterns [synthetics-icmp-*]

I'm pretty sure the tests ES instance is overloaded and Task Manager cannot timely deal with all the tasks. I'll see if I can find any way to decouple tests from this, but I'm concerned that we have to skip critically important tests because of that.

kibanamachine commented 1 year ago

New failure: CI Build - main

kibanamachine commented 1 year ago

New failure: CI Build - main

kibanamachine commented 1 year ago

New failure: CI Build - main

jbudz commented 1 year ago

/skip

kibanamachine commented 1 year ago

Skipped

main: 3dabc61726bbf9f554414904795c92b6ac9fc0a2

kibanamachine commented 1 year ago

New failure: CI Build - main

kibanamachine commented 1 year ago

New failure: CI Build - main

kibanamachine commented 1 year ago

New failure: CI Build - 8.8

kibanamachine commented 1 year ago

New failure: CI Build - 8.8

jeramysoucy commented 1 year ago

Ran another flaky test runner just to be sure, but this looks tied to a series of CI failures on Friday.

kibanamachine commented 1 month ago

New failure: kibana-on-merge - 7.17