Closed sarayourfriend closed 2 months ago
@sarayourfriend To confirm my understanding of the desired solution, will transforming the response_headers_middleware
middleware into a response_logging_middleware
and replacing the added headers with the following logging do the job?
logger.info(
"application_response_verification_status",
application_id=application.id,
application_name=application.name,
application_verified=application.verified,
)
Or is there a better place for logging this info? I tried extending request log metadata as the django-structlog documentation suggest but I couldn't get the application's information in conf/settings/signals.py
. I thought it could be the placing of the middleware in the base config list but it doesn't seem to be that.
# File: api/conf/settings/signals.py
@receiver(signals.bind_extra_request_metadata)
def bind_application_id(request, logger, **kwargs):
if not (hasattr(request, "auth") and hasattr(request.auth, "application")):
return
# This code is never reached ❌
application = request.auth.application
structlog.contextvars.bind_contextvars(application_id=application.id)
Your overall understanding is correct :+1: I don't think we need to use the middleware, though.
I haven't looked into the implementation of that signal, but I'm going to guess it must happen before authentication occurs, if request.auth isn't available? But it's weird that the documentation says user_id
is included, which would seem to imply it happens after authentication. Maybe there's an issue with how we've ordered the middlewares?
If that approach with the signals isn't possible, then I think it would be fine to add logging somewhere around here, in the authentication class itself: https://github.com/WordPress/openverse/blob/main/api/conf/oauth2_extensions.py#L30-L32
I don't think we necessarily need to add these properties to every log line, which it looks like extending the request metadata would do. Instead, we can have a specific log event client_application_authentication
(or whatever makes sense) that logs these pieces of information. Then, if we need to identify the client app for a request, we can query for that event where request_id=<the request id of the other request/event we're investigating>
.
Problem
https://github.com/WordPress/openverse/blob/2cb8b5eb11f0abb39f79697ae2b8ddc065b7e0f7/api/api/middleware/response_headers_middleware.py#L22-L24
The code linked above adds the client application name and verification status to response headers. This is useful when we scan Nginx request logs. However, it also makes responses that would not otherwise be sensitive to the authorization header suddenly sensitive.
Thumbnails, for example, can be universally cached at the edge without issue. However, when an authorization header is present, the cache (correctly) will bypass and send the request upstream. That is only correct and necessary behaviour due to our inclusion of authorization sensitive materials in responses.
Description
If we remove these headers from responses (and instead log them to our structured API logging), we can use more aggressive caching for thumbnails that does not vary on the authorization header.