[SIEM integration] Investigate if agents can capture authentication related information

SergeyKleyman commented 5 years ago

Description of the issue

We have started to investigate possible ways to use APM-collected information for SIEM purposes. One of the proposals is to have APM capture authentication attempts and use the captured data for SIEM analysis. See https://github.com/elastic/apm/issues/128 for more details. For the first milestone we would like to limit the scope to form-based authentication; more advanced authentication protocols such as OpenID Connect or SAML are out of scope at this stage.

In this issue we would like to summarize which authentication related information APM agents can capture and what would be the effort to implement the missing pieces. The data we would like agents to capture is:

1. Detect transactions that are authentication attempts, that is, transactions that were invoked for authentication purposes. For example, a transaction for a POST request to /login with form data containing the user's credentials should be marked as a "transaction for authentication purposes", while subsequent transactions (which probably check a cookie to ensure that the user is already authenticated) should not.
2. Capture the outcome of the authentication attempt. Possible values are success, failure and unknown. unknown covers cases where authentication did not succeed, not because the credentials were checked and found invalid, but because the check itself did not complete for some technical reason, for example because a backend service that the application relies on for authentication was unavailable.
3. Capture the username supplied as part of the user's credentials for the authentication attempt. Please note that we would like to capture the username even for attempts with an outcome other than success. From a SIEM point of view, the usernames used in attempts that did not succeed might be even more interesting.
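To make the three items above concrete, here is a minimal sketch of the record an agent might attach to a transaction. The names `AuthOutcome`, `AuthAttempt` and the `auth` context key are assumptions made up for illustration; they are not an agreed APM schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AuthOutcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"  # credentials were checked and found invalid
    UNKNOWN = "unknown"  # the check itself did not complete


@dataclass
class AuthAttempt:
    """Hypothetical record describing one transaction's authentication data."""

    is_auth_attempt: bool
    outcome: Optional[AuthOutcome] = None
    username: Optional[str] = None

    def to_context(self) -> dict:
        # Field names are illustrative only, not an agreed APM schema.
        if not self.is_auth_attempt:
            return {}
        return {
            "auth": {
                "outcome": self.outcome.value if self.outcome else None,
                "username": self.username,
            }
        }
```

Note that the username is kept regardless of the outcome, so a failed attempt still reports which account was targeted.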

What we ask agent teams to do

@elastic/apm-agent-devs Please leave a comment answering two questions: can your agent capture the authentication-related information described above, namely 1) detection of authentication attempts, 2) authentication outcome and 3) username, and what would be the approximate effort to implement it?

In addition, if you have any other suggestions or comments on SIEM-APM integration around authentication-related information, please leave them in this issue. For example, maybe we should not use transaction events to transfer this information from agents to APM Server, and should instead use span events or introduce a new type of event.

Agents status summary

Agent	Detect authentication attempts	Outcome	Username	Notes
.NET
Go
Java
Node.js
Python
Ruby

beniwohli commented 5 years ago

For Python, we'd probably have to approach this on a framework-by-framework basis, e.g. separate handling for Django, Flask, etc. Even then, it might be somewhat complicated, as e.g. in Django you are not required to use the built-in login form handlers (for instance, the Admin app that comes with Django, which would be of high interest in a SIEM context, uses custom login form handling). In fact, we'd probably try to hook in a few levels deeper into the framework than the HTTP layer. This would probably take 2-3 days of work per framework and, as far as I can tell, would gather all the required information.

An alternative implementation could be based on letting the user configure patterns for login URLs, username fields etc., which could then be applied somewhat framework-independent. But this would be quite brittle and rely on a lot of configuration from the user, which doesn't make for a great out-of-the-box experience.
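A rough sketch of what that configuration-driven alternative could look like, mostly to show how much the user would have to spell out. `LOGIN_URL_PATTERNS` and `USERNAME_FIELDS` are hypothetical settings; no agent has options with these names.

```python
import re
from typing import Optional
from urllib.parse import parse_qs

# Hypothetical user-supplied configuration (not existing agent options).
LOGIN_URL_PATTERNS = [re.compile(r"^/login/?$"), re.compile(r"^/admin/login/?$")]
USERNAME_FIELDS = ["username", "email"]


def is_login_request(method: str, path: str) -> bool:
    """Treat a POST to a configured login path as an authentication attempt."""
    return method.upper() == "POST" and any(p.match(path) for p in LOGIN_URL_PATTERNS)


def extract_username(form_body: str) -> Optional[str]:
    """Return the first configured username field found in a urlencoded body."""
    fields = parse_qs(form_body)
    for name in USERNAME_FIELDS:
        values = fields.get(name)
        if values:
            return values[0]
    return None
```

For example, `extract_username("username=alice&password=s3cret")` returns `"alice"`. Any application whose login path or field names differ from the configured ones is silently missed, which is the brittleness mentioned above.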

SergeyKleyman commented 5 years ago

@beniwohli Thank you for the feedback.

I'm investigating possible approaches to implement this for the .NET agent, and I was wondering if we can find a solution that doesn't depend on the framework. Since we limited the scope to form-based authentication, and we know that as a result of authentication the web front end sets a well-known cookie with a session ID, a JWT, etc., maybe we can use this information to build a framework-independent solution? For example, if the transaction's response contains Set-Cookie for one of the known session ID cookies and the transaction's request was an HTTP POST with Content-Type: application/x-www-form-urlencoded, can we deduce that this transaction is an authentication attempt? Can we extract the username from the request body? Outcome is tricky, since the application can return 200 and redirect to an "Access Denied" page... And it all starts to look like a hack... Do you think there is any way to make the implementation work without a specific piece for each framework?
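To make the question concrete, the heuristic could be sketched roughly like this. The cookie names in `KNOWN_SESSION_COOKIES` are an assumed, incomplete list, and `looks_like_auth_attempt` is a made-up helper, not part of any agent.

```python
from typing import Iterable

# Assumed list of well-known session cookie names; real deployments vary.
KNOWN_SESSION_COOKIES = {"sessionid", "JSESSIONID", "ASP.NET_SessionId", "connect.sid"}


def looks_like_auth_attempt(
    method: str, request_content_type: str, set_cookie_headers: Iterable[str]
) -> bool:
    """Guess whether a transaction is a form-based authentication attempt."""
    if method.upper() != "POST":
        return False
    # Ignore parameters such as "; charset=UTF-8" on the media type.
    media_type = request_content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/x-www-form-urlencoded":
        return False
    for header in set_cookie_headers:
        # A Set-Cookie value has the form "name=value; attr; attr=...".
        cookie_name = header.split("=", 1)[0].strip()
        if cookie_name in KNOWN_SESSION_COOKIES:
            return True
    return False
```

This only flags candidate transactions. It says nothing about the outcome, and it assumes the application sets a session cookie only in response to a login.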

beniwohli commented 5 years ago

@SergeyKleyman unfortunately, that approach wouldn't work in e.g. Django. The default session backend always sets a session cookie when the browser doesn't send one, even for unauthenticated users (this makes it possible, for example, to have a shopping cart on a webshop without being authenticated). Once the user authenticates, the default authentication backend stores that information in the session, but not in a way that could be detected by looking at the cookie.

I'm saying "default session/authentication backend" because all of this can be switched out for custom backends that may work differently. That's why I think we can only detect authentication events on a lower abstraction layer that is shared amongst backends.

This gets even more fun with frameworks like Flask that don't have default implementations for sessions and authentication...
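As a framework-neutral illustration of hooking in at that lower layer, the sketch below wraps a credential-checking callable instead of inspecting HTTP traffic. Everything in it is hypothetical: `capture_auth_attempt`, `BackendUnavailable`, the toy `authenticate` function and the `recorded_attempts` list (a stand-in for attaching data to the current transaction). A real integration would instrument whatever function the framework's backends share.

```python
import functools

recorded_attempts = []  # stand-in for attaching data to the current transaction


class BackendUnavailable(Exception):
    """Hypothetical error raised when the credential store cannot be reached."""


def capture_auth_attempt(authenticate):
    """Wrap an authenticate(username, password) callable and record each attempt."""

    @functools.wraps(authenticate)
    def wrapper(username, password):
        outcome = "unknown"
        try:
            user = authenticate(username, password)
            outcome = "success" if user is not None else "failure"
            return user
        finally:
            # Runs on normal return and on exceptions. An exception leaves the
            # outcome as "unknown" because the check never completed.
            recorded_attempts.append({"username": username, "outcome": outcome})

    return wrapper


@capture_auth_attempt
def authenticate(username, password):
    # Toy credential check used only for this example.
    if username == "ldap-user":
        raise BackendUnavailable("directory server not reachable")
    return {"name": username} if password == "correct" else None
```

Because the wrapper sees the actual result of the credential check, the outcome does not have to be guessed from HTTP status codes or cookies, and the username is recorded for failed attempts as well.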

axw commented 5 years ago

> An alternative implementation could be based on letting the user configure patterns for login URLs, username fields etc., which could then be applied somewhat framework-independent. But this would be quite brittle and rely on a lot of configuration from the user, which doesn't make for a great out-of-the-box experience.

This is my feeling too. I expect detecting a login attempt isn't too complicated - probably you could just look at the URL path in most cases. Extracting the username could get messy quickly though: could be a form field, query param, header, JSON in the body, etc. And as you say, there's no guarantee that HTTP status codes are used sensibly, so determining the outcome requires more config.
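To illustrate how quickly username extraction fans out, here is a sketch that checks a JSON or form body, then query parameters, then a header. The key names in `USERNAME_KEYS` and the `X-Username` header are assumptions for the example, and header lookup is case-sensitive here for brevity.

```python
import json
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlsplit

USERNAME_KEYS = ("username", "user", "email")  # assumed field names


def find_username(url: str, headers: Mapping[str, str], body: str) -> Optional[str]:
    """Look for a username in the body, then the query string, then a header."""
    content_type = headers.get("Content-Type", "").split(";", 1)[0].strip().lower()
    candidates = {}
    if content_type == "application/json":
        try:
            parsed = json.loads(body)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):
            candidates.update({k: v for k, v in parsed.items() if isinstance(v, str)})
    elif content_type == "application/x-www-form-urlencoded":
        candidates.update({k: v[0] for k, v in parse_qs(body).items()})
    # Query parameters are a fallback and never override body fields.
    for key, values in parse_qs(urlsplit(url).query).items():
        candidates.setdefault(key, values[0])
    for key in USERNAME_KEYS:
        if candidates.get(key):
            return candidates[key]
    # Hypothetical custom header, checked last.
    return headers.get("X-Username")
```

Even this small version needs a precedence rule between sources and a list of field names, and it still does not cover nested JSON or multipart forms.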

I feel like it would be better to focus on providing integration with common frameworks/libraries, and utilities for dealing with custom implementations.
