application-research / estuary

A custom IPFS/Filecoin node that makes it easy to pin IPFS content and make Filecoin deals.
https://docs.estuary.tech

Estuary dropped block from blockstore #465

Open 10d9e opened 1 year ago

10d9e commented 1 year ago

Describe the bug

Pulling gzip archives from the main node or a shuttle through the gateways is currently not working. This is a problem for third-party applications that expect to pull raw binary data from the faster Estuary gateways, including the native Docker client pulling OCI container layers from IPCR.

To Reproduce

Steps to reproduce the behavior:

Working Control Test

Pulling the content from dweb.link works fine:

curl -Lv 'https://bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra.ipfs.dweb.link' \
  -H 'Authorization: XXX' --output test.gz

> Host: bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra.ipfs.dweb.link
> user-agent: curl/7.79.1
> accept: */*
> authorization: XXX
> 
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 200 
< server: openresty
< date: Sat, 15 Oct 2022 18:15:59 GMT
< content-type: application/gzip
< content-length: 827161
< access-control-allow-methods: GET
< cache-control: public, max-age=29030400, immutable
< etag: "bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra"
< x-ipfs-gateway-host: ipfs-bank18-ny5
< x-ipfs-path: /ipfs/bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra/
< x-ipfs-roots: bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra
< x-ipfs-pop: ipfs-bank18-ny5
< timing-allow-origin: *
< access-control-allow-origin: *
< access-control-allow-methods: GET, POST, OPTIONS
< access-control-allow-headers: X-Requested-With, Range, Content-Range, X-Chunked-Output, X-Stream-Output
< access-control-expose-headers: Content-Range, X-Chunked-Output, X-Stream-Output
< x-ipfs-lb-pop: gateway-bank2-ny5
< x-proxy-cache: MISS
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< accept-ranges: bytes
< 
{ [3025 bytes data]

This extracts the archive successfully:

gunzip test.gz

Failing

The same request through the Estuary gateway fails with HTTP/1.1 500 Internal Server Error:

curl -Lv 'https://api.estuary.tech/gw/ipfs/bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra' \
  -H 'Authorization: XXX' --output test-fail.gz

> Host: shuttle-6.estuary.tech
> User-Agent: curl/7.79.1
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 Internal Server Error
< Server: nginx/1.18.0 (Ubuntu)
< Date: Sat, 15 Oct 2022 18:11:29 GMT
< Content-Type: text/plain; charset=utf-8
< Content-Length: 88
< Connection: keep-alive
< Vary: Origin
< X-Appversion: v0.1.9
< X-Content-Type-Options: nosniff

Extracting the archive fails:

gunzip test-fail.gz 
gunzip: test-fail.gz: not in gzip format
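The failing response above is a 500 with an 88-byte text/plain body, and with --output curl saves whatever body the server sends, so test-fail.gz most likely holds the gateway's error text rather than a damaged archive. The following is a minimal sketch for telling the two apart before extracting, assuming a POSIX shell with od, head and gzip available; is_gzip and check_download are hypothetical helper names. Adding curl's --fail (-f) flag to the request also makes curl exit non-zero on HTTP errors instead of treating the error body as a normal download.

```shell
#!/bin/sh
# Hypothetical helper: is_gzip FILE succeeds only if FILE begins with the
# two gzip magic bytes 0x1f 0x8b (RFC 1952).
is_gzip() {
  magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Hypothetical helper: test-extract FILE if it really is gzip; otherwise
# print the start of the body, which may be the gateway's error message.
check_download() {
  if is_gzip "$1"; then
    gunzip -t "$1" && echo "$1: valid gzip"
  else
    echo "$1: not gzip; body starts with:"
    head -c 200 "$1"; echo
  fi
}
```

Running check_download test-fail.gz after the failing request would print the first bytes of whatever the gateway actually returned.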

Expected behavior

See the working control test above.

Actual behavior

Pulls down a corrupted archive.
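Because the CID in this issue is a CIDv1 with the raw codec and a sha2-256 multihash (the bafkrei prefix), the bytes a gateway returns for it should hash to the 32-byte digest embedded in the CID. That gives a direct check of whether a download is the exact block or something else. A minimal sketch, assuming GNU coreutils (base32, sha256sum) and a base32 CIDv1 of exactly that shape; cid_digest_hex and matches_cid are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print the sha2-256 digest (hex) embedded in a base32
# CIDv1 such as "bafkrei...". Assumes version, codec, hash code and digest
# length are each one byte (true for raw + sha2-256).
cid_digest_hex() {
  body=$(printf '%s' "${1#b}" | tr 'a-z' 'A-Z')   # drop multibase prefix 'b'
  while [ $(( ${#body} % 8 )) -ne 0 ]; do body="$body="; done  # RFC 4648 padding
  # Skip the 4 header bytes and print the 32 digest bytes; -v disables
  # od's duplicate-line suppression.
  printf '%s' "$body" | base32 -d | od -An -tx1 -v -j4 -N32 | tr -d ' \n'
}

# Hypothetical helper: matches_cid FILE CID succeeds if sha256(FILE) equals
# the digest inside CID.
matches_cid() {
  want=$(cid_digest_hex "$2")
  got=$(sha256sum < "$1" | cut -d' ' -f1)
  [ "$want" = "$got" ]
}
```

For example, matches_cid test-fail.gz bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra || echo "not the requested block" flags a response that is not the block the CID names.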

Additional context

This affects anyone using the Estuary gateways.

en0ma commented 1 year ago

@jlogelin I have looked into this; this issue is a case of an inline CID. The pin does exist on shuttle-6, but our blockstore lookup does not currently handle inline CIDs properly: it checks for the CID in the blockstore, where it will never be found, because the data is inline in the CID itself.
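An inline CID uses the identity multihash (code 0x00), so the block's bytes are carried inside the CID and never stored in the blockstore, while a hashed CID such as sha2-256 (code 0x12) refers to a block that must be stored somewhere. The following is a minimal sketch for telling the two cases apart from the CID string alone, assuming a base32 CIDv1 (leading b), GNU coreutils base32, and single-byte varint fields (true for raw 0x55, dag-pb 0x70, identity 0x00 and sha2-256 0x12); multihash_code is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the multihash function code (hex) of a base32
# CIDv1. "00" means identity (inline data); "12" means sha2-256.
multihash_code() {
  body=$(printf '%s' "${1#b}" | tr 'a-z' 'A-Z')   # drop multibase prefix 'b'
  while [ $(( ${#body} % 8 )) -ne 0 ]; do body="$body="; done  # RFC 4648 padding
  # Decoded bytes: [0]=CID version, [1]=codec, [2]=multihash code.
  printf '%s' "$body" | base32 -d | od -An -tx1 -j2 -N1 | tr -d ' \n'
}
```

By this check, multihash_code bafkqaaa (the empty inline CID) prints 00, while the CID in this issue, bafkreiejjboj3lqlj5cqgawuggh3uiwu2rsahmipdxg7zrbquhaylwe2ra, prints 12, i.e. it names a sha2-256 block rather than inline data.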

I would like to work on this; I have put it off for too long.

This is also related to https://github.com/application-research/estuary/issues/330

10d9e commented 1 year ago

@en0ma This issue is more likely a dropped shuttle block, which is still worth checking. Per our conversation today, I have created another issue here about the x-gzip / gzip gateway content delta.