jacobsa closed this issue 9 years ago.
Don't forget: `.gz` files.

This is made more difficult by Google-internal bug 24347854 (which I just discovered): if you upload invalid gzip content and then go to read it back, you always get HTTP 503 no matter what you set for `Accept-Encoding`.
Filed Google-internal bug 24347482 for the underspecified documentation on what GCS is expected to do in a bunch of cases.
I've come to the conclusion that `contentEncoding` shouldn't/can't be supported by gcsfuse in any specific way. Rather, we should treat this like versioned buckets and explicitly say the behavior is undefined when you use such objects with gcsfuse, and advise against doing so.
Brain dump about how the `contentEncoding` feature is problematic:
- You can set `contentEncoding` to any string you want, but the documentation only specifies what will happen for gzip. In Google bug 24347482 it was clarified to me that other encodings are simply ignored. But this is hardly confidence-inspiring: who's to say that GCS won't suddenly start supporting bzip2, changing the behavior of a whole class of requests? Even if that never happens, you may be behind an intermediate proxy that groks bzip2.
- Because you can't see the pre-gzip length of the data, gcsfuse would have no choice but to surface the post-gzip data as the content of files, so that the file metadata matches the contents. Okay, that's fine, we would just read that data and return it to the user. Except the documentation doesn't make it clear that there is any reliable way to opt out of GCS's magic behavior around encodings.
- If I set `Accept-Encoding: gzip` on my read requests, it appears to return the original content. But given the usual use of this header, I worry that some internal system will decode the content and some other will later re-encode it, yielding different bytes. Worse, I worry that this will cause objects without any `contentEncoding` property set to be gzip-encoded before being sent to me, in the mistaken thought that I'm setting this header to save bandwidth rather than to opt out of the feature. The documentation is less than helpful in making me confident this won't happen.
- `Accept-Encoding: gzip`, especially for a read of an object that is not already encoded. Again, this feature appears to be intended only for the "user staring at content in a browser" case; otherwise the designers of the GCS API made a mistake by overloading `Accept-Encoding` and `Content-Encoding` for this feature.
- GCS does not honor `Range` headers in requests in several cases (see Google bug 24347482), which means we can't efficiently read only a portion of a very large object.
GCS objects have a `contentEncoding` property, sort of but not really documented here. That page implies that maybe it is always echoed as `Content-Encoding` when serving a read for the object, but it's not clear. This page says that it's intended to work with a value of `gzip`, and sort of implies by omission that it's not intended to work with other encodings. This page has slightly more detail about motivations and behavior.

Throw into the mix the fact that Go's `http.Transport` automatically sets `Accept-Encoding: gzip` on requests if no other `Accept-Encoding` is set (cf. `Transport.DisableCompression`), then transparently decompresses if it gets `Content-Encoding: gzip` back, and this starts to get confusing.

To do:
- `contentEncoding` set, for values `gzip` and otherwise.
- `semantics.md`.

(Thanks to Jurek Papiorek for raising this issue.)