awslabs / mountpoint-s3

A simple, high-throughput file client for mounting an Amazon S3 bucket as a local file system.
Apache License 2.0
4.45k stars 151 forks source link

Debug/trace logging that includes the HTTP request headers #878

Open thundergolfer opened 4 months ago

thundergolfer commented 4 months ago

Tell us more about this new feature.

I've encountered a HTTP 501 Not Implemented error using Cloudflare R2 with mount-s3.

2024-05-14T17:30:02.877651Z DEBUG 
open{req=20 ino=3 pid=327824 name="test-marker"}:put_object{id=8 bucket="modal-bandwidth-usage-testing" key="test-marker"}: mountpoint_s3_client::s3_crt_client: 
S3 request failed request_type=CreateMultipartUpload crt_error=Some(Error(14343, "aws-c-s3: 
AWS_ERROR_S3_INVALID_RESPONSE_STATUS, Invalid response status from request")) http_status=501 range=None duration=15.819426ms ttfb=Some(15.499815ms) request_id=<unknown>

Cloudflare has docs on CreateMultipartUpload showing what features it does and does not implement: https://developers.cloudflare.com/r2/api/s3/api/#unimplemented-object-level-operations.

I'm looking for a way to find which of those unimplemented features is being used by mount-s3, causing the HTTP 501.

I've enabled both --debug and --debug-crt but that doesn't provide enough request detail.

dannycjones commented 4 months ago

Hey Jonathon, do you have logs that contain the headers struct? Something like... error_response_headers: Some(Headers { inner: 0x7f6170013c80 })? I'm guessing not, I didn't see any references to headers when creating requests in a quick scan of the code base.

If you did have those and you're building from source, I used this patch before which may be useful.

From 638bc5612671b32c9051ad17e577dc2b6b95aabf Mon Sep 17 00:00:00 2001
From: Daniel Carl Jones <djonesoa@amazon.com>
Date: Mon, 5 Jun 2023 17:38:02 +0100
Subject: [PATCH] Implement debug for Header as list of header tuples

Signed-off-by: Daniel Carl Jones <djonesoa@amazon.com>
---
 mountpoint-s3-crt/src/http/request_response.rs | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/mountpoint-s3-crt/src/http/request_response.rs b/mountpoint-s3-crt/src/http/request_response.rs
index 5f20d1e1..3162275c 100644
--- a/mountpoint-s3-crt/src/http/request_response.rs
+++ b/mountpoint-s3-crt/src/http/request_response.rs
@@ -50,7 +50,6 @@ impl<N: AsRef<OsStr>, V: AsRef<OsStr>> Header<N, V> {
 }

 /// A block of HTTP headers that provides a nice API for getting/setting header names and values
-#[derive(Debug)]
 pub struct Headers {
     inner: NonNull<aws_http_headers>,
 }
@@ -241,6 +240,14 @@ impl<'a> Iterator for HeadersIterator<'a> {
     }
 }

+impl std::fmt::Debug for Headers {
+    fn fmt(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
+        write!(fmt, "Headers(")?;
+        fmt.debug_list().entries(self.iter()).finish()?;
+        write!(fmt, ")")
+    }
+}
+
 /// A single HTTP message, initialized to be empty (i.e., no headers, no body).
 #[derive(Debug)]
 pub struct Message<'a> {
-- 
2.45.1

I don't think there's any reason not to merge that in, so I can chat with the team - I just want to ensure there's no risk of leaking anything we don't want to put in logs. The other half of this would be making sure that we actually log the struct where its useful, such as when constructing requests.

dannycjones commented 4 months ago

The other half of this would be making sure that we actually log the struct where its useful, such as when constructing requests.

This may be quite a bit more challenging than it sounds - in the case you shared (PutObject), this is usually mapped to a multi-part upload with many parts which we don't directly control - the AWS CRT handles this (to optimize upload performance).

dannycjones commented 3 months ago

I was digging into this a bit more. Unfortunately, we don't log the request headers struct already - in fact, we don't see the final request headers struct, so improving the formatting of the struct won't help in this case. I was hoping the request signing logs may include them, but it only includes the headers that will be signed themselves.

To resolve this, we'd need to make CRT changes. One possible approach could be to update the telemetry callbacks we have with the CRT which currently surfaces metrics and response headers, and include request headers there. We would then be free to format them should we wish to.