apache / opendal

Apache OpenDAL: access data freely.
https://opendal.apache.org
Apache License 2.0

bug: Can't read GCS files due to Decompressive Transcoding #5070

Open amos-osmos opened 3 weeks ago

amos-osmos commented 3 weeks ago

Describe the bug

Some of our files are stored gzip-encoded in GCS. Because of decompressive transcoding, the object metadata reports the compressed byte count, but the object itself is served uncompressed, so the number of bytes actually read is higher.

This interacts poorly with https://github.com/apache/opendal/pull/4690, which compares those two values, so we keep hitting the `reader got too much data` error.
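To make the mismatch concrete, here is a minimal sketch of that kind of length comparison. It is not OpenDAL's actual code: `check_length` and `LengthError` are hypothetical names chosen for illustration, and the "too little" case is an assumption added for symmetry.

```rust
/// Hypothetical result of comparing the advertised size with the bytes actually read.
#[derive(Debug, PartialEq)]
enum LengthError {
    /// More bytes arrived than the metadata promised (the case in this report).
    TooMuch { expect: u64, actual: u64 },
    /// Fewer bytes arrived than the metadata promised (assumed, for symmetry).
    TooLittle { expect: u64, actual: u64 },
}

/// Compare the size reported by object metadata (`expect`) with the number
/// of bytes read from the response body (`actual`).
fn check_length(expect: u64, actual: u64) -> Result<(), LengthError> {
    use std::cmp::Ordering;

    match actual.cmp(&expect) {
        Ordering::Equal => Ok(()),
        Ordering::Greater => Err(LengthError::TooMuch { expect, actual }),
        Ordering::Less => Err(LengthError::TooLittle { expect, actual }),
    }
}
```

With a gzip-encoded object, `expect` would be the compressed size from the metadata and `actual` the larger decompressed size, so a check of this shape fails even though the download itself succeeded.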

Steps to Reproduce

I'm working on providing an actual repro but running into difficulties with permissions.

  1. Have a gzip-encoded file on GCS whose object metadata sets Content-Encoding to gzip.
  2. Build an operator of type opendal::types::operator::operator::Operator and call operator.read().
  3. The call currently returns an opendal::Error as the Result, which prints as:
    
    Unexpected (permanent) at  => reader got too much data

Context:

expect:

actual:



### Expected Behavior

An `Ok` `Result` with the contents of the file.

### Additional Context

_No response_

### Are you willing to submit a PR to fix this bug?

- [ ] Yes, I would like to submit a PR.

Xuanwo commented 3 weeks ago

We bypass the content length check if the response includes a `Content-Encoding` header or has no `Content-Length` header:

https://github.com/apache/opendal/blob/309d3eb5af42a0dc670e0a0aed084996574c58ea/core/src/raw/http_util/client.rs#L126-L131
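
The rule described above can be sketched as follows. This is an illustration only, not the code at the linked lines: `expected_body_length` is a hypothetical helper, and it assumes header names have already been lowercased into a plain map.

```rust
use std::collections::HashMap;

/// Return the body length to verify against, or `None` when the length
/// check should be skipped. Header names are assumed to be lowercase.
fn expected_body_length(headers: &HashMap<String, String>) -> Option<u64> {
    // An encoded body may not match Content-Length after decoding, so skip the check.
    if headers.contains_key("content-encoding") {
        return None;
    }
    // A missing or unparsable Content-Length leaves nothing to compare against.
    headers.get("content-length")?.trim().parse::<u64>().ok()
}
```

Under this rule the check only runs when the response itself advertises a plain `Content-Length`. Which headers a transcoded GCS response actually carries is not established in this thread.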

I will try to reproduce this.