fsouza / fake-gcs-server

Google Cloud Storage emulator & testing library.
https://pkg.go.dev/github.com/fsouza/fake-gcs-server/fakestorage?tab=doc
BSD 2-Clause "Simplified" License

Content-Type being set incorrectly for resumable uploads. #1098

Open otherguy opened 1 year ago

otherguy commented 1 year ago

Hi!

First of all, thank you for developing this. It's an invaluable tool for local development :)

We are using the latest version (1.44.0) with Ruby/Rails, where the Google SDK always does resumable uploads.

Generally, it's working well, but the content type of all uploaded files is set to application/json. You can verify this by exec-ing into the Docker container and running attr on the file (you might have to install it first with apk --update add attr).

# attr
A filename to operate on is required
Usage: attr [-LRSq] -s attrname [-V attrvalue] pathname  # set value
       attr [-LRSq] -g attrname pathname                 # get value
       attr [-LRSq] -r attrname pathname                 # remove attr
       attr [-LRq]  -l pathname                          # list attrs
      -s reads a value from stdin and -g writes a value to stdout
# attr -g metadata /storage/bucket/6d97b1f7-f3cb-49db-89a3-6039d01f24ba/b17d0638-275e-4864-b868-a13131131f69/original_file_0.jpg
Attribute "metadata" had a 463 byte value for /storage/bucket/6d97b1f7-f3cb-49db-89a3-6039d01f24ba/b17d0638-275e-4864-b868-a13131131f69/original_file_0.jpg:
{"ContentType":"application/json","ContentEncoding":"","Crc32c":"p7Um+Q==","Md5Hash":"r2DdsnVOdPUZweG4crCjog==","Etag":"\"r2DdsnVOdPUZweG4crCjog==\"","ACL":[{"Entity":"projectOwner-test-project","EntityID":"","Role":"OWNER","Domain":"","Email":"","ProjectTeam":null}],"Metadata":null,"Created":"2023-03-16T12:26:21.005992Z","Deleted":"0001-01-01T00:00:00Z","Updated":"2023-03-16T12:26:21.005996Z","CustomTime":"0001-01-01T00:00:00Z","Generation":1678969581005998}
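The same check can be scripted instead of read by eye. Below is a minimal Go sketch that decodes the JSON value of the "metadata" attribute shown above and compares the stored ContentType with the expected one. The type and function names are made up for this example, and the struct covers only the fields that matter here; it is not the emulator's own type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// storedMetadata mirrors only the fields of the xattr JSON that matter here.
// The type name is hypothetical, not taken from fake-gcs-server.
type storedMetadata struct {
	ContentType string            `json:"ContentType"`
	Metadata    map[string]string `json:"Metadata"`
}

// checkContentType decodes the raw attribute value and reports the stored
// ContentType and whether it equals the type the uploader expected.
func checkContentType(raw []byte, want string) (string, bool, error) {
	var m storedMetadata
	if err := json.Unmarshal(raw, &m); err != nil {
		return "", false, fmt.Errorf("decode metadata: %w", err)
	}
	return m.ContentType, m.ContentType == want, nil
}

func main() {
	// Shortened copy of the attribute value from the attr output.
	raw := []byte(`{"ContentType":"application/json","ContentEncoding":"","Metadata":null}`)
	got, ok, err := checkContentType(raw, "image/jpeg")
	if err != nil {
		panic(err)
	}
	fmt.Printf("stored=%q matches=%v\n", got, ok)
}
```

In practice the raw bytes would come from the attr output (or from reading the extended attribute directly), which this sketch leaves out.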

Now this happens regardless of whether content_type is added to the original upload. Here is the output of the SDK with no content_type parameter. As you can see, no content_type is present in the initial upload, but the actual uploaded file ends up with a content type of application/json.

Intiating resumable upload command to http://localhost:4080/upload/storage/v1/b/bucket/o?name=b6e3a967-f145-40e1-89ed-e0b524c67431%2Fdbaa0927-be5f-4993-b314-f4a51ae7aa1c%2Foriginal_file_0.jpg
Success - #<Google::Apis::StorageV1::Object:0x00007f8bcf8f7d90
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007f8bcf8f6d78
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007f8c1efbb7e0>,
    @role="OWNER">],
 @bucket="bucket",
 @generation=0,
 @id=
  "bucket/b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
 @kind="storage#object",
 @name=
  "b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
 @size=0>

Sending upload command to http://[::]:4080/upload/storage/v1/b/bucket/o?uploadType=resumable&name=b6e3a967-f145-40e1-89ed-e0b524c67431%2Fdbaa0927-be5f-4993-b314-f4a51ae7aa1c%2Foriginal_file_0.jpg&upload_id=9b19140d2d0e510b73c61a0fe8947d30
Success - #<Google::Apis::StorageV1::Object:0x00007f8c3fadab88
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007f8c3fad9b70
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007f8c3fbc3950>,
    @role="OWNER">],
 @bucket="bucket",
 @content_type="application/json",
 @crc32c="9YirKw==",
 @etag="\"t9JShAgcze7Jkm4cKJtasQ==\"",
 @generation=1678974975332775,
 @id=
  "bucket/b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
 @kind="storage#object",
 @md5_hash="t9JShAgcze7Jkm4cKJtasQ==",
 @name=
  "b6e3a967-f145-40e1-89ed-e0b524c67431/dbaa0927-be5f-4993-b314-f4a51ae7aa1c/original_file_0.jpg",
 @size=410714,
 @time_created=Thu, 16 Mar 2023 13:56:15 +0000,
 @updated=Thu, 16 Mar 2023 13:56:15 +0000>

Now, as I mentioned, this even happens when you pass the correct content_type to the upload command.

Intiating resumable upload command to http://localhost:4080/upload/storage/v1/b/bucket/o?name=2b708cc5-f130-4be0-8baf-ccf902095b8e%2Fc0033f4b-38a8-423c-b47d-109cbd07003c%2Foriginal_file_0.jpg
Success - #<Google::Apis::StorageV1::Object:0x00007f8544ee73b8
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007f8544ee4320
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007f8535c22650>,
    @role="OWNER">],
 @bucket="bucket",
 @content_type="image/jpeg",
 @generation=0,
 @id=
  "bucket/2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
 @kind="storage#object",
 @name=
  "2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
 @size=0>

Sending upload command to http://[::]:4080/upload/storage/v1/b/bucket/o?uploadType=resumable&name=2b708cc5-f130-4be0-8baf-ccf902095b8e%2Fc0033f4b-38a8-423c-b47d-109cbd07003c%2Foriginal_file_0.jpg&upload_id=0f02645f3d2c2029c9a1c05425ae1c09
Success - #<Google::Apis::StorageV1::Object:0x00007f8574cc6658
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007f8524d1faf0
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007f8524d1c4b8>,
    @role="OWNER">],
 @bucket="bucket",
 @content_type="application/json",
 @crc32c="rSseRg==",
 @etag="\"4g3RHUNbHQr3TfVps1zKJw==\"",
 @generation=1678973375548755,
 @id=
  "bucket/2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
 @kind="storage#object",
 @md5_hash="4g3RHUNbHQr3TfVps1zKJw==",
 @name=
  "2b708cc5-f130-4be0-8baf-ccf902095b8e/c0033f4b-38a8-423c-b47d-109cbd07003c/original_file_0.jpg",
 @size=237839,
 @time_created=Thu, 16 Mar 2023 13:29:35 +0000,
 @updated=Thu, 16 Mar 2023 13:29:35 +0000>

If we restart the GCS server and it reloads the files from disk, the content type is set correctly (again, can be verified by running attr on the file as well). I assume this is because of https://github.com/fsouza/fake-gcs-server/issues/531.

It would, however, be great if the content type were also correct for newly uploaded files when using resumable uploads!
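For context, in the documented JSON API resumable flow the request that opens the upload session carries two different content types: a Content-Type header that describes the JSON metadata body, and an X-Upload-Content-Type header (plus the contentType field in the body) that describes the object itself. The Go sketch below builds such a request without sending it. The helper name and its parameters are hypothetical, and the Ruby SDK may use a different wire variant than the one shown. Whether the emulator picks up the wrong one of these values is only a guess; nothing in this thread confirms it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// newResumableInitRequest builds the request that opens a resumable upload
// session. The helper name and parameters are hypothetical.
func newResumableInitRequest(baseURL, bucket, object, contentType string) (*http.Request, error) {
	// The JSON body carries the object metadata, including contentType.
	body, err := json.Marshal(map[string]string{"contentType": contentType})
	if err != nil {
		return nil, err
	}
	q := url.Values{"uploadType": {"resumable"}, "name": {object}}
	u := fmt.Sprintf("%s/upload/storage/v1/b/%s/o?%s", baseURL, url.PathEscape(bucket), q.Encode())
	req, err := http.NewRequest(http.MethodPost, u, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// This header describes the JSON metadata body, not the object.
	req.Header.Set("Content-Type", "application/json; charset=UTF-8")
	// This header declares the type of the data sent in the later requests.
	req.Header.Set("X-Upload-Content-Type", contentType)
	return req, nil
}

func main() {
	req, err := newResumableInitRequest("http://localhost:4080", "bucket", "dir/original_file_0.jpg", "image/jpeg")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("Content-Type:", req.Header.Get("Content-Type"))
	fmt.Println("X-Upload-Content-Type:", req.Header.Get("X-Upload-Content-Type"))
}
```

Sending this request to the emulator, following the returned session URL with the file data, and then reading back the object metadata would give a client-independent reproduction.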

It looks like this was addressed in https://github.com/fsouza/fake-gcs-server/issues/532 and supposedly fixed in https://github.com/fsouza/fake-gcs-server/pull/924 but it does not seem to be working.

I also tried passing in the ContentType via metadata, according to the fix mentioned here: #924.

Intiating resumable upload command to http://localhost:4080/upload/storage/v1/b/bucket/o?name=4dd472d2-9668-44a8-acb0-bb7126a2b0cf%2F1db2bb8b-73aa-410a-9d74-007bf111325d%2Foriginal_file_0.jpg
Success - #<Google::Apis::StorageV1::Object:0x00007fd4afa404a8
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007fd4b800ef08
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007fd4afc2b678>,
    @role="OWNER">],
 @bucket="bucket",
 @content_type="image/jpeg",
 @generation=0,
 @id=
  "bucket/4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
 @kind="storage#object",
 @metadata={"ContentType"=>"image/jpeg"},
 @name=
  "4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
 @size=0>

Sending upload command to http://[::]:4080/upload/storage/v1/b/bucket/o?uploadType=resumable&name=4dd472d2-9668-44a8-acb0-bb7126a2b0cf%2F1db2bb8b-73aa-410a-9d74-007bf111325d%2Foriginal_file_0.jpg&upload_id=3abe67bc7326fca3d517d792e42f07ad
Success - #<Google::Apis::StorageV1::Object:0x00007fd4afa208b0
 @acl=
  [#<Google::Apis::StorageV1::ObjectAccessControl:0x00007fd49862f178
    @bucket="bucket",
    @entity="projectOwner-test-project",
    @object=
     "4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
    @project_team=
     #<Google::Apis::StorageV1::ObjectAccessControl::ProjectTeam:0x00007fd4afabb1a8>,
    @role="OWNER">],
 @bucket="bucket",
 @content_type="application/json",
 @crc32c="jnk9vw==",
 @etag="\"GwotjEH5otIJKB7HArHxLw==\"",
 @generation=1678978075907447,
 @id=
  "bucket/4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
 @kind="storage#object",
 @md5_hash="GwotjEH5otIJKB7HArHxLw==",
 @metadata={"ContentType"=>"image/jpeg"},
 @name=
  "4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg",
 @size=46951,
 @time_created=Thu, 16 Mar 2023 14:47:55 +0000,
 @updated=Thu, 16 Mar 2023 14:47:55 +0000>

As you can see, the metadata is even saved correctly, but the actual ContentType is still set to application/json.

# attr -g metadata /storage/danni-dev/4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg
Attribute "metadata" had a 486 byte value for /storage/danni-dev/4dd472d2-9668-44a8-acb0-bb7126a2b0cf/1db2bb8b-73aa-410a-9d74-007bf111325d/original_file_0.jpg:
{"ContentType":"application/json","ContentEncoding":"","Crc32c":"jnk9vw==","Md5Hash":"GwotjEH5otIJKB7HArHxLw==","Etag":"\"GwotjEH5otIJKB7HArHxLw==\"","ACL":[{"Entity":"projectOwner-test-project","EntityID":"","Role":"OWNER","Domain":"","Email":"","ProjectTeam":null}],"Metadata":{"ContentType":"image/jpeg"},"Created":"2023-03-16T14:47:55.90744Z","Deleted":"0001-01-01T00:00:00Z","Updated":"2023-03-16T14:47:55.907445Z","CustomTime":"0001-01-01T00:00:00Z","Generation":1678978075907447}

otherguy commented 1 year ago

Hey @fsouza! I saw you marked it as closed but then reopened it. Did #1141 not fix the issue?

fsouza commented 1 year ago

> Hey @fsouza! I saw you marked it as closed but then reopened it. Did #1141 not fix the issue?

I'm not sure whether that PR fixes the issue; looking at it, I don't see how it would. The issue ended up getting closed automatically because of a comment in the PR body. I still plan to have a deeper look at this issue some time in the next couple of weeks.

otherguy commented 1 year ago

Awesome, thank you! Looking forward to the fix 😄

larsivi commented 2 months ago

Thank you for this fantastic test tool!

I just ran into this issue while using the Node client libraries.

When creating a file with the "blob.createWriteStream" API, tests with an explicit contentType failed because the content type always came back as "application/octet-stream", which appears to be the default in upload.go when the Content-Type header is missing.

I have verified that resumable is also the cause here, and if I put { resumable: false } into the createWriteStream-options, it works as it should. Setting this flag is probably ok as a workaround, especially as Google says it should be set to false for files smaller than 10MB, but presumably this means that it should be on for larger files.

I'm testing with 1.49.3.

mike-marcacci commented 1 month ago

Thanks for the workaround @larsivi. I can confirm that this fixed the issue in our case, also on 1.49.3.