elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Add a QA test infrastructure for Security on by default #78462

Open BigPandaToo opened 2 years ago

BigPandaToo commented 2 years ago

For e2e integration testing of Security on by default, we need special infrastructure to support the following scenarios:

mark-vieira commented 2 years ago

I'm pretty sure the first scenario can be covered well by packaging tests. Correct me if I'm wrong, @jkakavas @albertzaharovits, but the packaging tests in PRs like https://github.com/elastic/elasticsearch/pull/77231 essentially verify everything listed above, at least for the single-node scenario.

For the multi-node scenario, I think we can do this similarly to how we handle BWC tests. Essentially, we just need to spin up a single node, grab the enrollment token, spin up a second node using that token, and validate that all is well.

We have no mechanism for multi-host testing of any kind. All of our CI testing runs on a single host. We don't do multi-host testing for any other multi-node scenarios so I'd question whether it's really necessary in this scenario either. I'm apt to say no, unless there is something unique around this scenario vs any other multi-node cluster tests we already run.

jkakavas commented 2 years ago

> I'm pretty sure the first scenario can be covered well by packaging tests.

Yes, this is correct. What we basically need, as you identified, is to do something about what is referred to above as "Multi Node and Single Host installation". We don't need to care about all installation types; that is what packaging tests are for.

> For the multi-node scenario, I think we can do this similarly to how we handle BWC tests. Essentially, we just need to spin up a single node, grab the enrollment token, spin up a second node using that token, and validate that all is well.

Absolutely. The fact that the cluster formed is a test success in this case, and if we can run an ESRestTestCase against this cluster, we can figure out meaningful tests to run against it. If not, successful cluster formation is more than enough. One caveat, though, is that we won't be printing an enrollment token for ES nodes on startup by default, as Elasticsearch still binds to localhost for the transport layer by default and you can't (by default again, due to heap size) run multiple nodes on the same host. That means we'd need to run a CLI tool against that node (bin/elasticsearch-create-enrollment-token -s node) to get a token instead of capturing it from the first node's startup output.
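As a rough illustration of "cluster formation is the test success", here is a minimal Python sketch that inspects a cluster health response body and checks the node count. The helper name `cluster_formed` and the sample body are hypothetical; `number_of_nodes`, `status`, and `timed_out` are fields of the cluster health response. How the body is fetched (TLS, credentials) is left out on purpose.

```python
import json


def cluster_formed(health_body: str, expected_nodes: int = 2) -> bool:
    """Return True if a cluster health response reports the expected node count.

    Hypothetical helper: it only inspects a response body that the caller
    has already fetched, e.g. from GET /_cluster/health.
    """
    health = json.loads(health_body)
    return (
        health.get("number_of_nodes") == expected_nodes
        and health.get("status") in ("green", "yellow")
        and not health.get("timed_out", False)
    )


# Made-up sample bodies for illustration only.
two_nodes = '{"cluster_name": "qa", "status": "green", "timed_out": false, "number_of_nodes": 2}'
one_node = '{"cluster_name": "qa", "status": "green", "timed_out": false, "number_of_nodes": 1}'

print(cluster_formed(two_nodes))
print(cluster_formed(one_node))
```

A check like this would be the minimum bar; tests based on ESRestTestCase could then build on the formed cluster.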

For what it's worth, we will have packaging tests for this too; I have a PR ready to open, waiting for #77231 and #77718 to be merged.

> We have no mechanism for multi-host testing of any kind. All of our CI testing runs on a single host. We don't do multi-host testing for any other multi-node scenarios so I'd question whether it's really necessary in this scenario either. I'm apt to say no, unless there is something unique around this scenario vs any other multi-node cluster tests we already run.

We don't need this. This use case can be reduced to the previous one that we will be testing.

mark-vieira commented 2 years ago

> One caveat, though, is that we won't be printing an enrollment token for ES nodes on startup by default, as Elasticsearch still binds to localhost for the transport layer by default and you can't (by default again, due to heap size) run multiple nodes on the same host. That means we'd need to run a CLI tool against that node (bin/elasticsearch-create-enrollment-token -s node) to get a token instead of capturing it from the first node's startup output.

I'm not sure I understand. We run all sorts of multi-node tests. When we set up test clusters, by default we set the node heap size to 512m. And what's the issue with localhost binding? Again, with test clusters we use ephemeral ports.

jkakavas commented 2 years ago

> I'm not sure I understand. We run all sorts of multi-node tests. When we set up test clusters, by default we set the node heap size to 512m. And what's the issue with localhost binding? Again, with test clusters we use ephemeral ports.

Apologies, I wasn't clear. This was not meant to be a comment on the test infrastructure, but a comment on Elasticsearch's behavior with regard to the enrollment process. We have decided not to print an enrollment token for other nodes by default as:

The effect this would have on our tests is that there will be no node enrollment token to capture from the output of the first node starting; instead, we'd need to run the CLI tool after the node starts to get an enrollment token. Hope this is clearer now.

mark-vieira commented 2 years ago

> The effect this would have on our tests is that there will be no node enrollment token to capture from the output of the first node starting; instead, we'd need to run the CLI tool after the node starts to get an enrollment token. Hope this is clearer now.

Yup, makes sense now. So the flow is just slightly different, in that getting the enrollment token is an explicit act by the user, not something we just dump to the log automatically. Otherwise, we still need to fetch the token (however that's done) and then supply it to another node.
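The flow described here could be sketched roughly as follows in Python. This is not the actual test infrastructure: the function names, the injectable `run` callable, and the `/tmp/node-*` paths are hypothetical. The sketch also assumes that the CLI tool prints only the token on stdout and that the second node is started with `--enrollment-token`.

```python
from typing import Callable, List

# Hypothetical runner type: takes a command line, returns its stdout.
Runner = Callable[[List[str]], str]


def enrollment_token_command(es_home: str) -> List[str]:
    # CLI invocation quoted in the thread; "-s node" requests a token for a node.
    return [f"{es_home}/bin/elasticsearch-create-enrollment-token", "-s", "node"]


def second_node_command(es_home: str, token: str) -> List[str]:
    # Assumption: the joining node receives the token via --enrollment-token.
    return [f"{es_home}/bin/elasticsearch", "--enrollment-token", token]


def prepare_second_node(run: Runner, first_home: str, second_home: str) -> List[str]:
    """Fetch a token from the running first node and build the start command
    for the second node. Starting the process is left to the caller."""
    token = run(enrollment_token_command(first_home)).strip()
    if not token:
        raise RuntimeError("the CLI tool returned no enrollment token")
    return second_node_command(second_home, token)


def fake_run(command: List[str]) -> str:
    # Stand-in for a real process runner, so the sketch works without Elasticsearch.
    return "ZmFrZS10b2tlbg==\n"


start_command = prepare_second_node(fake_run, "/tmp/node-1", "/tmp/node-2")
print(start_command)
```

Passing the runner in as a parameter keeps the token-fetching step explicit and lets the flow be exercised without a real installation; a real harness would swap `fake_run` for something that invokes the process.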

elasticsearchmachine commented 2 years ago

Pinging @elastic/es-security (Team:Security)