codex-storage / nim-codex

Decentralized Durability Engine
https://codex.storage
Apache License 2.0
69 stars 25 forks source link

Node crash when Request's expiry overflow #637

Open AuHau opened 1 year ago

AuHau commented 1 year ago

If way big expiry timestamp is provided (I managed that because of mistake), then it crashes the nodes in the network and this is a persistent during restarts as the big expiry timestamp is persisted in blockchain.

Logs:

TRC 2023-11-27 11:08:29.294+01:00 Received a request for storage!            topics="codex node" tid=922905 cid=zdj*zf6vra duration=3c nodes=1 tolerance=0 reward=400 proofProbability=1 collateral=32 expiry=18*6924fd
TRC 2023-11-27 11:08:29.295+01:00 Retrieving manifest for cid                topics="codex node" tid=922905 cid=zdj*zf6vra
TRC 2023-11-27 11:08:29.297+01:00 Decoding manifest for cid                  topics="codex node" tid=922905 cid=zdj*zf6vra
TRC 2023-11-27 11:08:29.297+01:00 Decoded manifest                           topics="codex node" tid=922905 cid=zdj*zf6vra
WRN 2023-11-27 11:08:29.313+01:00 Attempting to resolve block that's not currently a pending block topics="codex pendingblocks" tid=922905 address="cid: zdj7WaHcduWCosa8rYiRdRn9Dg9oe9kspM8uCmd4uq7rc6Cti"
INF 2023-11-27 11:08:29.630+01:00 Request submitted, waiting for slots to be filled topics="marketplace purchases submitted" tid=922905 requestId=2a8d78519c1df8220d9d7bed10da4df355c722e88b6001dff3af000971b36d40
/Users/adam/Projects/status/nim-codex/vendor/nimbus-build-system/vendor/Nim/lib/system/fatal.nim(54) sysFatal
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
???                      run
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
???                      run
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
???                      scheduler
/Users/adam/Projects/status/nim-codex/vendor/nim-chronos/chronos/asyncfutures2.nim(355) futureContinue
Error: unhandled exception: over- or underflow [OverflowDefect]
tbekas commented 1 year ago

Thanks for the report. Probably it's this arithmetic that is causing this. I think we should: use datatypes of the same capacity (length) both in contract and in the client and use safe arithmetic (one that doesn't crashes the node when there's overflow).

Blockchain contract uses uint256 for ttl?

AuHau commented 1 year ago

Thanks for the report. Probably it's this arithmetic that is causing this.

I don't think that is it. I have my eyes on: https://github.com/codex-storage/nim-codex/blob/8681a40ee79b260923d49f30dced865172c53b60/codex/clock.nim#L44

Do you know if truncate() call could lead to this? Underneath, it calls cast[]() which I am now not sure if it would do something?

I think we should: use datatypes of the same capacity (length) both in contract and in the client and use safe arithmetic (one that doesn't crashes the node when there's overflow). Blockchain contract uses uint256 for ttl?

Yeaaah, it is bit "complicated". I mean expiry is meant to be a timestamp and it is also often compared to "now" which is from the definition in Nim int64 and because of these comparisons, we are doing these conversions. So if we want to "align the datatypes to Uint256" it would make it quite cumbersome.

Yes, in blockchain contract we use uint256 because it is exposed by the solidity as such (the block.timestamp property). It does not really make sense to have it uint256 but unfortunately it is the "native datatype" for blockchain which costs the least to use. Any addition "trimming" actually leads to more gas used :-(

Honestly, I would prefer more "align the datatypes to int64" and only do the conversion when data are coming from/to the blockchain (I don't think we will change the contract type for this though). I have started the work on it, but paused it as it was taking me too much time šŸ˜… It is here, but it is nowhere usable yet: https://github.com/codex-storage/nim-codex/pull/638

AuHau commented 1 year ago

Weird the value for expiry that crashes the node is actually fitting very easily even into int64: 1700813939965 I originally thought that it is "too big for int64" but that is not the case šŸ˜³

It seems that problem is somewhere around https://github.com/codex-storage/nim-codex/blob/8681a40ee79b260923d49f30dced865172c53b60/codex/contracts/clock.nim#L47-L46

tbekas commented 1 year ago

Thanks for more context on this! If it makes most sense to use uint256 in blockchain, I think we should stay with it and try to change the client - so either change client to the also use uint256 or continue with uint64/int64 but handle values out of range gracefully.

Weird the value for expiry that crashes the node is actually fitting very easily even into int64: 1700813939965 I originally thought that it is "too big for int64" but that is not the case šŸ˜³

That's a bit suspicious šŸ¤” I guess this issue requires a bit more investigation.

dryajov commented 1 year ago

If the times module returns an int64 is should be safe to convert to a uint, but not the other way around? A cast isn't doing any sort of conversion, it's simply doing a blunt bit copy and in general it isn't safe, we should probably simply not use truncate and rely on explicit conversions, if there are overflows, they should be caught way before we get to calling the contracts... I would start by removing that and see where things are getting messed up.

dryajov commented 1 year ago

Err, bit copy is wrong, it's simply forcing the bits to be interpreted differently, without doing checks or any actual conversions.