Open dgieselaar opened 10 months ago
Pinging @elastic/kibana-core (Team:Core)
Pinging @elastic/kibana-security (Team:Security)
Adding the security team as I think we want them in the loop regarding the security aspect of performing server-to-self requests
Just some unstructured thoughts here
Functionally, for the usage described, I think the feature request makes sense.
Still, it kinda breaks the whole concept of per-contract plugin communication, and, for the same reason as for the "runtime dependency resolving", we would need to be careful about how and why teams use this (especially if the performance cost is way higher than direct contract communication - and it likely will be).
It looks like this is something close to what we would want for a more service-based architecture (performing service-to-service http requests would be more complex, given we can't just blindly hit localhost, but the API/interface could be fairly similar).
When hitting localhost, ideally we would avoid forging a full ("real") request and performing the full round trip, and instead inject the request directly into the HAPI server. Note that HAPI supports request injection, but it's usually meant for testing. We would need to check whether it can really be used in production, and if not, whether there are alternatives for injecting the request without a full http roundtrip.
In k8s environments, we could potentially hit the deployment's service instead of localhost, to leverage the innate load balancing.
For authentication, atm I don't think we would be able to do better than reusing the authentication headers from the inbound request (or from a "fake" request as already done for some other background task work such as reporting)
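A minimal sketch of what reusing the inbound credentials could look like; the header allow-list below is an assumption for illustration, not what Kibana actually forwards:

```typescript
// Copy only credential-bearing headers from the inbound request onto the
// server-initiated one. The exact allow-list here is a guess; Kibana's
// real scoping logic lives in the security plugin.
const FORWARDED_HEADERS = ['authorization', 'cookie'];

function forwardAuthHeaders(
  inbound: Record<string, string | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of FORWARDED_HEADERS) {
    const value = inbound[name];
    if (value !== undefined) out[name] = value;
  }
  return out;
}

// Example: only the recognized credential headers survive.
const headers = forwardAuthHeaders({
  authorization: 'Bearer abc123',
  'user-agent': 'curl/8.0',
  cookie: 'sid=xyz',
});
console.log(headers); // { authorization: 'Bearer abc123', cookie: 'sid=xyz' }
```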
If we can't avoid doing a full http roundtrip even for localhost, are there concerns or problems doing so regarding TLS? (I know that @legrego expressed concerns about this in a prior discussion, but I'm not sure where it ended.)
> If we can't avoid doing a full http roundtrip even for localhost, is there concerns / problems doing so regarding TLS (I know that @legrego expressed concerns about this in a prior discussion but I'm not sure where it ended)
Having the Kibana server make a full-blown request back to itself will not work for installations configured to use the PKI authentication provider. More generally, it will fail anytime an instance is configured with `server.ssl.clientAuthentication: required`, as the Kibana server will not be presenting an appropriate client certificate on its outbound request back to itself.
There is also an issue of session management. The security plugin takes care of transparently refreshing the user session whenever that's required, and that sometimes happens in the middle of an active http request. In this scenario, the server-initiated request may be the one that triggers a session refresh. If this happens, the real (browser) client and the server will have a different representation of the session, which will cause the real (browser) session to be invalid, kicking the user back to the login screen.
There are also environmental considerations. Networks may be configured in such a way that prevents the Kibana server from initiating a connection back to itself.
@legrego thanks for that list.
> Having the Kibana server make a full-blown request back to itself will not work for installations configured to use the PKI authentication provider. More generally, it will fail anytime an instance is configured with `server.ssl.clientAuthentication: required`
Yeah, this was the major issue I had in mind. And I'm not sure injecting requests instead of performing full-blown requests would work around it, given I assume we would still need to fully run the authc workflow for such requests, right @legrego? Or could we have a way to virtually execute http requests "on behalf" of our users without being forced to use the same authc provider (like something similar to the scoped ES client we're using for impersonation)?
> This is currently implemented in a hacky way, which is executing a request server-side that takes some of the request properties.
@dgieselaar Can you point us to the code? It sounds like the scenario @legrego described could already be impacting the hacky implementation.
@rudolf https://github.com/elastic/kibana/blob/main/x-pack/plugins/observability_ai_assistant/server/functions/kibana.ts. I mentioned this on Slack earlier, but for the current workaround, it's totally OK if this fails once in a while - it's also OK if this fails in very specific setups. We're in LLM-land where we have to deal with unexpected errors anyway (for now, hopefully this changes in the future).
> Yeah, this was the major issue I had in mind. And I'm not sure injecting requests instead of performing full-blown requests would work around it, given I assume we would still need to fully run the authc workflow for such requests, right @legrego? Or could we have a way to execute http requests "on behalf" of our users without being forced to use the same authc provider (like, something similar to the scoped ES client we're using for impersonation)?
I think it depends on where/how the request is injected. If it's going through Hapi's request injection feature that you mentioned, then I expect it would fail for the same reason. Other injection techniques might be more successful, but that's beyond my expertise (@azasypkin?).
Sorry if it has been discussed already, but have we considered creating a short-lived API key on behalf of the user under the hood to interact with Kibana and Elasticsearch APIs while the user is interacting with the assistant? The key can be created lazily and disposed of automatically after some idle timeout or once it's expired (as a cleanup measure and because user privileges might change).
It has its own constraints, of course, but at least we won't need to worry about all possible authentication mechanisms and session management.
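To make the API-key idea concrete: Elasticsearch's create API key endpoint (`POST /_security/api_key`, which accepts an `expiration`) returns an `id`/`api_key` pair (and an already-`encoded` value) that can be turned into an `Authorization` header. The request body and key name below are illustrative:

```typescript
// Build the Authorization header from a create-API-key response.
// POST /_security/api_key with e.g. { "name": "ai-assistant", "expiration": "15m" }
// returns { id, name, api_key, encoded }, where `encoded` is base64(`${id}:${api_key}`).
function apiKeyAuthHeader(grant: { id: string; api_key: string }): string {
  const encoded = Buffer.from(`${grant.id}:${grant.api_key}`, 'utf8').toString('base64');
  return `ApiKey ${encoded}`;
}

// Example with made-up credentials:
console.log(apiKeyAuthHeader({ id: 'abc', api_key: 'def' })); // ApiKey YWJjOmRlZg==
```

The header works uniformly for both Elasticsearch and Kibana requests, which is what makes this attractive: it sidesteps cookies, sessions, and the configured authc provider entirely.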
> More generally, it will fail anytime an instance is configured with `server.ssl.clientAuthentication: required`
Yeah, this can be problematic, and it depends on the approach we pick. There is a high chance that "Hapi injection" bypasses this restriction, though. We need to double-check.
> Sorry if it has been discussed already, but have we considered creating a short-lived API key on behalf of the user
cc @dgieselaar
@pgayvallet Maybe I'm misunderstanding, but that is solving a different issue than what is being discussed here no?
> @pgayvallet Maybe I'm misunderstanding, but that is solving a different issue than what is being discussed here no?
Well, I see two distinct yet closely connected issues here, both revolving around calling Kibana APIs from the route handler: how to actually perform the request (a full http roundtrip vs. in-process injection), and how to authenticate that request on behalf of the user.
It seems we have a couple of options for the former, while the question/suggestion about API keys is aimed at addressing the latter.
Describe the feature:
As a Kibana developer, I want to call other Kibana endpoints from my route handler, without having to worry about TLS, cookies, redirects, base paths, spaces and whatnot.
Describe a specific use case for the feature:
The Observability AI Assistant allows users to use natural language to interact with the Elastic Platform. For instance, users can ask the Assistant to create an index for them, or ask what the cluster status is. The Assistant then sends what we call a function request (a function name and a payload) back to the platform, which is executed on behalf of the user. In this case, the assistant would ask for the `elasticsearch` function to be executed, with the payload `{ "method": "GET", "pathname": "_cluster/health" }`. Similarly, we also have a `kibana` function, which interacts with Kibana APIs. This for instance allows the user to create an SLO using natural language. It is currently implemented in a hacky way: we execute a request server-side that takes some of the original request's properties. For various reasons (security, performance, reliability), this is not something we want to support long-term, so we need an alternative solution.
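The function-request flow described above can be sketched as a simple dispatch table. The function names match the description; the handler bodies and types are stand-ins, since the real handlers would use the scoped ES client and whatever self-request facility this issue produces:

```typescript
interface FunctionRequest {
  name: string;
  payload: { method: string; pathname: string };
}

type FunctionHandler = (payload: FunctionRequest['payload']) => Promise<unknown>;

// Stand-in handlers for illustration only.
const handlers: Record<string, FunctionHandler> = {
  elasticsearch: async ({ method, pathname }) => {
    return { called: `${method} ${pathname}` }; // would proxy to Elasticsearch
  },
  kibana: async ({ method, pathname }) => {
    return { called: `${method} ${pathname}` }; // would call back into Kibana
  },
};

async function executeFunctionRequest(req: FunctionRequest): Promise<unknown> {
  const handler = handlers[req.name];
  if (!handler) throw new Error(`Unknown function: ${req.name}`);
  return handler(req.payload);
}

// Example: the cluster-health request from the description.
executeFunctionRequest({
  name: 'elasticsearch',
  payload: { method: 'GET', pathname: '_cluster/health' },
}).then(console.log); // { called: 'GET _cluster/health' }
```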