And this is what it does to the retrieval metrics. Yes, I'm asking for a lot of chunks (walking the OSM manifest reference), but there shouldn't be any failures as all of the chunks are pinned locally. (Which may be a separate problem depending on where the metric is incremented. This sure looks like it's going directly to the swarm and not checking the local pinned store)
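For watching those counters, here's a minimal sketch, assuming the node's Prometheus metrics are served at http://localhost:1635/metrics (adjust the address to wherever your node exposes them); it simply prints the metric lines containing "retrieval":

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// Dump retrieval-related metric lines from a bee node.
// The metrics URL is an assumption; point it at wherever your node
// exposes its Prometheus metrics.
func main() {
	resp, err := http.Get("http://localhost:1635/metrics")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024) // metric dumps can have long lines
	for scanner.Scan() {
		line := scanner.Text()
		if strings.Contains(line, "retrieval") && !strings.HasPrefix(line, "#") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```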
Late-breaking news: apparently the redundancy level isn't reaching the file joiner intact. I added the following log: https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/file/joiner/joiner.go#L112 and now get:
"time"="2024-05-24 15:37:46.922256" "level"="debug" "logger"="node/api" "msg"="downloadHandler" "reference"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "Strategy"=0 "Fallback"=false
"time"="2024-05-24 15:37:46.922271" "level"="debug" "logger"="node/api" "msg"="joiner.New" "address"="6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd" "rLevel"=4
"time"="2024-05-24 15:37:46.922370" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[123,184,37,121,71,219,129,189,23,34,145,138,123,189,60,30,189,132,131,204,86,26,14,44,117,33,198,39,161,28,126,204]
"time"="2024-05-24 15:37:46.922404" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[171,44,252,54,113,147,84,129,36,109,217,159,126,56,41,192,239,103,141,105,23,30,104,22,239,80,185,75,90,22,208,17]
"time"="2024-05-24 15:37:46.922442" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="7bb8257947db81bd1722918a7bbd3c1ebd8483cc561a0e2c7521c627a11c7ecc" "error"="context canceled"
"time"="2024-05-24 15:37:46.922456" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[123,184,37,121,71,219,129,189,23,34,145,138,123,189,60,30,189,132,131,204,86,26,14,44,117,33,198,39,161,28,126,204] "err"="context canceled"
"time"="2024-05-24 15:37:46.922472" "level"="debug" "logger"="node/retrieval" "msg"="retrieval failed" "chunk_address"="ab2cfc3671935481246dd99f7e3829c0ef678d69171e6816ef50b94b5a16d011" "error"="context canceled"
"time"="2024-05-24 15:37:46.922485" "level"="debug" "logger"="node/api" "msg"="replicas.getter.Get: so<-next" "addr"=[171,44,252,54,113,147,84,129,36,109,217,159,126,56,41,192,239,103,141,105,23,30,104,22,239,80,185,75,90,22,208,17] "err"="context canceled"
Note that the joiner sees an rLevel of 4! It should be 0. The other logs are coming from: https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/replicas/getter.go#L125 and https://github.com/ldeffenb/bee/blob/b50dc2780587e8a9c76bcfe9fcb0656dd8c37142/pkg/replicas/getter.go#L131
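For reference, the byte-array addresses in the replicas.getter.Get lines decode to exactly the chunk addresses that later fail with "context canceled", so the cancelled retrievals are the replica fetches dispatched by that getter. A quick standalone check (plain Go, no bee imports):

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Decode the []byte addresses printed by the replicas.getter.Get log lines
// and compare them with the chunk_address values from the retrieval logs.
func main() {
	addrs := [][]byte{
		{123, 184, 37, 121, 71, 219, 129, 189, 23, 34, 145, 138, 123, 189, 60, 30,
			189, 132, 131, 204, 86, 26, 14, 44, 117, 33, 198, 39, 161, 28, 126, 204},
		{171, 44, 252, 54, 113, 147, 84, 129, 36, 109, 217, 159, 126, 56, 41, 192,
			239, 103, 141, 105, 23, 30, 104, 22, 239, 80, 185, 75, 90, 22, 208, 17},
	}
	for _, a := range addrs {
		fmt.Println(hex.EncodeToString(a))
	}
	// Output:
	// 7bb8257947db81bd1722918a7bbd3c1ebd8483cc561a0e2c7521c627a11c7ecc
	// ab2cfc3671935481246dd99f7e3829c0ef678d69171e6816ef50b94b5a16d011
}
```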
Context
2.1.0-rc2 on the Sepolia testnet
Summary
When doing a single-chunk /bytes retrieval, the redundancy headers appear to be ignored: chunks at several other references are fetched and then cancelled anyway.
Expected behavior
If I specify Swarm-Redundancy-Strategy: 0 and Swarm-Redundancy-Fallback-Mode: false, I would expect only the referenced chunk to be retrieved.
Actual behavior
For every request, multiple chunks are fetched and cancelled, per the logs in this report. I have confirmed that the headers are visible in the API based on the first log line in my hacked version: https://github.com/ldeffenb/bee/blob/2ebe25af0c8238a37f9526584f31b0bd00b6296a/pkg/api/bzz.go#L538
Steps to reproduce
Find a single-chunk reference (like 6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd on the Sepolia testnet) and do a /bytes retrieval on it. Watch your --verbosity 5 (Debug) logs for other chunk_address references failing with context canceled.
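A minimal reproduction sketch, assuming the node's API listens on http://localhost:1633 (adjust the address and reference to your setup); it issues the single request with the two headers from this report:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// Issue a single /bytes retrieval with redundancy strategy 0 and fallback
// disabled. The API address is an assumption; change it to match your node.
func main() {
	ref := "6f6aae5e8bda728b1d50594204376c53d73453b8d195ab7b602a4df8ccaa2ecd"
	req, err := http.NewRequest(http.MethodGet, "http://localhost:1633/bytes/"+ref, nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.Header.Set("Swarm-Redundancy-Strategy", "0")
	req.Header.Set("Swarm-Redundancy-Fallback-Mode", "false")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
	// While this runs, watch the node's --verbosity 5 logs for other
	// chunk_address values failing with "context canceled".
}
```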
Possible solution
Not sure yet. I haven't figured out how to enable the redundancy logger and haven't had time to provide a known-enabled logger to see what's going on in there. https://github.com/ethersphere/bee/blob/8645aca8600408f9d9066460b756bc1c7e8b5c81/pkg/file/redundancy/getter/getter.go#L68