Closed 10io closed 3 years ago
Is a missing HEAD request the actual source of the duplicate downloads? https://github.com/fog/fog-google/issues/512
To answer my own question, @10io looks correct. #512 does not appear to be the issue, and I also see a double request here. This implementation of body
seems confusing.
I don't fully grasp that line of code: I don't get why we need collection.get(identity). I assume that it has something to do with a Fog::Storage::GoogleJSON::File coming from a metadata request vs a Fog::Storage::GoogleJSON::File coming from a "full" get (with the alt=media query parameter).
I think this happens because you can get a Fog::Storage::GoogleJSON::File
without the body because it can come from the list bucket API (https://cloud.google.com/storage/docs/listing-objects#rest-list-objects). This appears to happen if you call directories.files.to_a
. The collection.get(identity)
is a lazy load for the full attributes of the object.
Take a look at https://github.com/fog/fog-google/pull/545.
Thanks @stanhu for the PR! 🙏
I had the exact same question as you: why do we need the last_modified
field?
To me, either the body
is loaded or not depending if the current file is a lightweight one (metadata only) or not. If not, simply load the body. I felt that I was missing a piece with that last_modified
field.
From what I see, fog-aws
is using a similar logic: https://github.com/fog/fog-aws/blob/d7ec997c7b02b2f433828ebf33679671c2e3005c/lib/fog/aws/models/storage/file.rb#L107-L117. The last_modified
field is used. Was it copied over here? 🤔
@10io It's probably copied from there. Could you file an issue/PR for fog-aws?
@10io It's probably copied from there. Could you file an issue/PR for fog-aws?
@stanhu I apologize for the late response.
I definitely will, I'm planning to do it this week.
Thanks! 🙏
Hello! 👋
I've been using this adapter to access files on GCP Storage and hit an issue. If I head the file and then I access
#body
, the file seems to be downloaded twice. This is ok for small files but can quickly become a problem for bigger files.Here is how to reproduce this:
Here is the log (truncated) from
google-api-client
whenf.body
is executed:I see two temporary files being created. I put some additional logs around those files and I can confirm that both have the same size and the same content. To me, the file really seems to be downloaded twice. 😿 In addition, those
Sending HTTP get
lines seems to indicate that the API is hit twice.From my code inspection, the problem seems to come from https://github.com/fog/fog-google/blob/98a00e1bbace13a65b42f1b83c1911ac01ce71c0/lib/fog/storage/google_json/models/file.rb#L48 If I get this correctly,
collection.get(identity)
andfile.body
both download the file. Is this expected?I don't fully grasp that line of code: I don't get why we need
collection.get(identity)
. I assume that it has something to do with aFog::Storage::GoogleJSON::File
coming from a metadata request vs aFog::Storage::GoogleJSON::File
coming from a "full" get (with thealt=media
query parameter).