Long time delay associated with head requests to Amazon S3

bartosz-michalak commented 4 years ago

Hello. We use carrierwave to handle attachments stored by Amazon S3. Files in bucket are private. Our API sometimes has to return data about many attachments (100 for instance). In this endpoints we're observing a big delay in response (the more attachments, the longer response time). We've used New Relic to investigate what affects this time, and turns out that HEAD requests are performing one by one (not simultaneously):

Moreover, this causes high CPU utilization (sometimes over 50%) - the more S3 requests, the more CPU utilization. Has anyone encountered a similar problem?

alexshk commented 4 years ago

Hi. Yes, I have similar issue with carrierwave and fog-aws.

alexshk commented 4 years ago

Solved issue with replacing fog-was with carrierwave-aws gem.

felipefava commented 3 years ago

Had the same issue. I used carrierwave-aws as @alexshk mentioned, and is faster now. Thanks

marlonjf commented 2 years ago

@alexshk @felipefava do you still have the HEAD requests withcarrierwave-aws?

For me it's too slow with fog-aws or carrierwave-aws, both have the requests

mshibuya commented 1 year ago

Anyone still having the issue? I looked at the code but both fog-aws and carrierwave-aws are implemented to avoid head requests. So my guess is something in the user code is triggering the head request... If you can make the head request intentionally fail and get the stack trace, it will greatly help tracking down what triggered the request.

rajyan commented 1 year ago

@mshibuya

Hi, we had a similar problem and looked into this problem deeper with stack profiling. What we found was, conditional processing is always evaluated even we are not using the specific conditional version. For condition that needs the file content, it caused unexpected requests like this issue.

This is the minimum reproducible example using the examples from https://github.com/carrierwaveuploader/carrierwave#getting-started https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Do-conditional-processing

class AvatarUploader < CarrierWave::Uploader::Base
  version :jpg_version, if: :webp? do
    process convert: :jpg
  end

  protected

  def webp?(new_file)
    new_file.content_type == 'image/webp'
  end
end

class User < ApplicationRecord
  mount_uploader :avatar, AvatarUploader
end

# This evaluates the condition of :jpg_version even :jpg_version is not used
# which requests to S3 to get the content_type if storage is remote
User.first.avatar.url

I think that the condition should not be evaluated until the actual version is called on retrieving, but should always be called on storing, ended up with this fix https://github.com/carrierwaveuploader/carrierwave/pull/2669

mshibuya commented 1 year ago

Thank you for the report. It's very helpful to understand the actual use-case that causes this issue.

But before proceeding I need to confirm one thing. Why do you need to check if the file is webp or not? Isn't it simpler to convert every image file into jpg, regardless of the source format? You know, checking content type is costly as it will invoke the head request. I'm wondering if doing so is worth enough.

rajyan commented 1 year ago

Thank you for looking into this issue. For our very specific use case, we needed to check the content type because

The API we are referring mostly returns jpg images
It sometimes returns webp content with .jpg extension...

We were using conditional processing, because we don't need to convert the image in most cases, and we thought that it would be much more cost efficient (don't need to process nor uploading duplicated jpg files).

The example in above might be a specific usage, but I think that lot of people are doing similar conditional processings, for example, this wiki example needs remote request too https://github.com/carrierwaveuploader/carrierwave/wiki/How-to:-Do-conditional-processing.

Also, we, and I guess most users that are reporting similar issues too, were not aware that the all versions' conditions are checked every time we called the mounted column (https://github.com/carrierwaveuploader/carrierwave/issues/2132/). Much more, I believe it is difficult to assume that the HEAD request is called even when we are not using the actual version.

rajyan commented 1 year ago

@mshibuya Do you have any thoughts about the above comment? I still think there should be a way to avoid HEAD requests, especially when only reading and not accessing the version.

carrierwaveuploader / carrierwave

Long time delay associated with head requests to Amazon S3 #2461