I will look into the segfault, but please keep in mind that targets specifies the leading address bytes that the node will try to mine. If you specify 1 byte (two hex characters), it means the node will mine a chunk address with proximity order 16 to the specified target prefix; with 2 bytes you're already at 32, and with 3 you're trying to mine an address at PO 48, i.e. you're going to have to go through 281474976710656 chunks and hash them in order to get an address at PO 48 (if you're lucky!)
Sorry, I actually made a mistake. Each byte is 8 PO gains, so 3 bytes is PO 24. That's 16777216 hashing operations.
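To make the cost concrete, here's a minimal Go sketch of that mining loop. sha256 over a nonce stands in for the real trojan chunk hashing, so this is an illustration of the work factor, not bee code:

```go
// Sketch (not bee source): brute-force "mining" of a hash whose leading
// bytes match a target prefix. Each extra target byte multiplies the
// expected work by 256, i.e. 8 proximity-order steps per byte.
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

func mine(target []byte) uint64 {
	var nonce uint64
	buf := make([]byte, 8)
	for {
		binary.BigEndian.PutUint64(buf, nonce)
		sum := sha256.Sum256(buf) // stand-in for the real trojan chunk hashing
		if bytes.HasPrefix(sum[:], target) {
			return nonce // expected ~256^len(target) iterations
		}
		nonce++
	}
}

func main() {
	// A 1-byte prefix takes ~256 hashes on average; a 3-byte prefix like
	// f9 9d b6 takes ~16777216 (256^3).
	fmt.Println("1-byte prefix found after", mine([]byte{0xf9}), "hashes")
}
```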
@acud As @ldeffenb mentioned, the recovery.NewCallback function that the /chunks API uses sends a PSS message with the RECOVERY topic to retrieve the chunk from the network. It needs to stamp the payload before doing so, but a nil stamper is currently passed in. In order to stamp, we need a valid batch ID. Is the correct implementation to read the batch ID from the request header?
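Something along these lines, as a rough sketch — the header name and the batch-to-stamper lookup here are assumptions, not settled design:

```go
// Hypothetical sketch of the suggested fix: resolve a postage batch ID from
// the request header and fail fast, instead of letting nil reach pss.Send.
package api

import (
	"encoding/hex"
	"errors"
	"net/http"
)

// Stamper is a stand-in for bee's postage stamper type.
type Stamper interface{}

// PostageService is an assumed lookup from batch ID to a ready stamper.
type PostageService interface {
	StamperFor(batchID []byte) (Stamper, error)
}

// stamperFromRequest reads the batch ID from a request header (header name
// assumed) and resolves it to a stamper, returning an error when absent.
func stamperFromRequest(post PostageService, r *http.Request) (Stamper, error) {
	batchHex := r.Header.Get("Swarm-Postage-Batch-Id")
	if batchHex == "" {
		return nil, errors.New("postage batch ID required for chunk recovery")
	}
	batchID, err := hex.DecodeString(batchHex)
	if err != nil {
		return nil, errors.New("invalid postage batch ID")
	}
	return post.StamperFor(batchID)
}
```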
Any idea when this one will get attention? I agree with @istae that a request specifying ?targets= should be required to include a postage batch for use in sending any required PSS messages for the actual recoveries.
@ldeffenb this is not so high up on our priority list. There are some usability problems with the current implementation of global pinning that are difficult to solve right now, and after an in-depth look they might even require protocol changes to some degree so that good UX can ensue.
Disappointed, but understood. I'm still working to characterize the storage failures of the swarm (some small percentage of freshly pushed chunks becoming non-retrievable within a day or two), and global pinning would allow me to develop a way to have the missing chunks automatically supplied when they're discovered.
I'll keep a watch on this and see what direction it takes when it bubbles to the top of the priority list.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
If this is not solved, @acud, please reopen it.
So... this is a multi-tiered problem relating to global pinning's bad UX. The problem here is that you need to provide a postage stamp on a GET call, which I personally find to be *shrug* at best. This is not on our priority list right now, and global pinning needs a serious overhaul, with a potential dedicated protocol, if it is to be used by anyone.
So, until an eventual protocol upgrade, global pinning is effectively a "not-maintained" feature? I'm asking so I can understand whether to consider it in app design.
But it seems that maybe it should be disabled, or at least patched so that using it doesn't crash the node. Better to return an error, or silently do nothing, than to crash on a hard-coded nil stamper reference.
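For illustration, a minimal sketch of such a guard. The types and the signature are stand-ins loosely modeled on pss.Send, not an actual patch:

```go
// Sketch of the defensive fix suggested above: fail fast with an error
// when no stamper is supplied, rather than panicking on first use of it.
package pss

import (
	"context"
	"errors"
)

// ErrNoStamper is returned instead of letting a nil stamper panic later.
var ErrNoStamper = errors.New("pss: nil stamper, cannot stamp trojan chunk")

// Stamper is a stand-in for bee's postage stamper.
type Stamper interface {
	Stamp(addr []byte) ([]byte, error)
}

type Topic [32]byte
type Targets [][]byte

// Send refuses early instead of crashing the node mid-send.
func Send(ctx context.Context, t Topic, payload []byte, stamper Stamper, targets Targets) error {
	if stamper == nil {
		return ErrNoStamper
	}
	// ... wrap payload into a trojan chunk, stamp it, push it to the network ...
	return nil
}
```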
It has significant UX problems and we don't have the bandwidth to develop it further at this stage. We're also looking into ways to offload this to L2 services that could do the message interaction for you. Obviously, you can write an application that handles this correctly and attaches all the necessary information to the request, so that you can indeed use global pinning. However, for an application that does not employ a custom gateway (one that attaches the necessary postage stamp to the request), my assumption at this stage is that requiring users to have a postage stamp ready and attached to every HTTP GET is unfriendly and counterintuitive for users and developers alike. So, TL;DR: you can keep using it in its current form; it will probably get revamped at some point (in which case you'll have ample time to react); we're not going to yank it out of the codebase for now; and indeed the panic should be resolved.
Thank you for the explanation. Luckily it isn't a blocking feature for us; now I know it's better to look for alternative solutions if we need them.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Summary
Running a v1.0.0 bee node with global-pinning-enable: true. Requested a chunk with targets that the node did not have:
curl http://192.168.10.185:8080/chunks/5e7132318b6f5482fb02e2ed48c293800f0774470387412fbd3db1a3366736bd?targets=f99db6
This returned the expected:
{"message":"chunk recovery initiated. retry after sometime.","code":202}
At this point, the bee node consumes copious amounts of CPU and then abruptly terminates with a panic.
Steps to reproduce
Run a bee node with global-pinning-enable: true and request /chunks/xxxxxx?targets=yyyyyy, where yyyyyy must be an even count of hex digits. Wait for the response, then wait even longer for the crash to occur. I believe the additional delay is caused by pss.Wrap, inside pss.Send, attempting to mine an address into the appropriate neighborhood.
Expected behavior
I would expect the recovery request to be issued successfully via pss.
Actual behavior
The application crashed.
I have tracked it down through the source code and it seems that pss.Send now requires a stamper, but the recovery repair callback passes nil. As soon as the stamper is used in pss.Send, the application terminates.
https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/pss/pss.go#L98
https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/recovery/repair.go#L38
https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/netstore/netstore.go#L57
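The relevant lines are linked above. For illustration, here is a simplified, self-contained reconstruction of the failure path (stand-in types, paraphrased rather than copied from the bee source); running it panics the same way the node does:

```go
// Simplified reconstruction of the crash: the recovery callback hands a nil
// stamper to the send path, and the first use of it inside send panics with
// a nil pointer dereference.
package main

import "fmt"

// Stamper is a stand-in for bee's postage stamper.
type Stamper interface {
	Stamp(addr []byte) ([]byte, error)
}

func send(payload []byte, stamper Stamper) error {
	// pss.Send eventually stamps the mined trojan chunk:
	stamp, err := stamper.Stamp(payload) // panics when stamper is nil
	if err != nil {
		return err
	}
	fmt.Println("stamped with", stamp)
	return nil
}

func main() {
	// recovery.NewCallback effectively does this today:
	_ = send([]byte("chunk address"), nil) // nil stamper -> runtime panic
}
```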