teor2345 closed this issue 4 years ago
Hashes can be obtained from zcashd using `zcash-cli getblock $i` and using `jq` or something to parse the JSON. Note that the output will have to be run through `zebrad revhex`, since zcashd displays hashes in reversed byte order and we don't.
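As an aside, the byte-order flip that `zebrad revhex` performs can be sketched in a few lines of Rust. This is a hypothetical standalone helper for illustration, not Zebra's actual implementation:

```rust
// Hypothetical sketch of a `revhex`-style helper: zcashd prints hashes in
// reversed (display) byte order, so we decode the hex, reverse the bytes,
// and re-encode to get the internal byte order.
fn revhex(display_hex: &str) -> String {
    let bytes: Vec<u8> = (0..display_hex.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&display_hex[i..i + 2], 16).expect("valid hex"))
        .collect();
    bytes.iter().rev().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    // Example: display-order "00ff" becomes internal-order "ff00".
    println!("{}", revhex("00ff"));
}
```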
I'll do a small script, so we can rebuild the list consistently.
@hdevalence do we have a preferred scripting language?
No, I don't think so, we've never had to write scripts before.
My personal rule of thumb is bash for things that are near enough to oneliners for me not to worry about the maintainability and rust for all other cases.
A cool thing cargo does is automatically create a target for each file in `src/bin`, so you could for example do a `cargo new --lib zebra-utils` (or whatever you want to call the crate that holds all the "scripts"), then create a `bin` dir in it and add a file for each new binary. This is what I've spent most of my last two years with Rust doing, so if you need recommendations for libraries to use when writing small Rust CLI scripts, let me know.
I think the script will look like:

```sh
i=0
# keep a safety margin of 300 blocks below the current tip
max=$(( $(zcash-cli getblockcount) - 300 ))
while [ "$i" -lt "$max" ]; do
  # jq filter is assumed; extract the hash field from the JSON
  zcash-cli getblock "$i" | jq -r '.hash' | zebrad revhex >> checkpoints.txt
  i=$(( i + 500 ))
done
```
Whatever we choose, it will become a release-time dependency. I'll make an initial script using generic unix shell. Then we can convert it to Rust later, if we decide that's important.
Also, maybe it would be faster to use `getblockhashes`, and discard the hashes we don't want: https://zcash-rpc.github.io/getblockhashes.html

I'll do some testing.
I've been thinking about this, and calling the RPC multiple times in the script seems slow and error-prone. We're going to have to parse JSON, call `jq` and `zebrad revhex`, and generate the checkpoint heights. I'd rather do all that work in Rust, using serde-json:
`zcash-cli getblockhashes INT_MAX 0 | zebrad checkpoints`

We'll need a `zebrad checkpoints --testnet` option, to generate the testnet list.
Once that's done, we could even add modes for generating our own checkpoints, or listing our hard-coded checkpoints:
```sh
zebrad checkpoints --from-zcashd
zebrad checkpoints --list
zebrad checkpoints --generate
```
(But I'll just do the zcashd mode for now.)
Could we have `zebrad checkpoints` try to call `zcash-cli` internally, rather than having to pipe the output into zebrad?
Maybe?
For the testnet, the command line would have to be `zcash-cli -testnet getblockhashes INT_MAX 0 | zebrad checkpoints`. And there's no guarantee that we'd get the RPC authentication options right, so we'd have to allow the pipe anyway.
> For the testnet, the command-line would have to be `zcash-cli -testnet getblockhashes INT_MAX 0 | zebrad checkpoints`

We could expose a `--testnet` arg that adds the `-testnet` arg to the internal `zcash-cli` invocation we call.

> And there's no guarantee that we'd get the RPC authentication options right

I don't know what this means.
> For the testnet, the command-line would have to be `zcash-cli -testnet getblockhashes INT_MAX 0 | zebrad checkpoints`
>
> We could expose a `--testnet` arg that adds the `-testnet` arg to the internal `zcash-cli` invocation we call.

You're right, we'll definitely need a `--testnet` option, to supply the right hard-coded genesis block hash.

> And there's no guarantee that we'd get the RPC authentication options right
>
> I don't know what this means

For example, the data directory, and RPC IP, port, username and password. (None of these options are required, but we can't guarantee the user is using the defaults.)
> For example, the data directory, and RPC IP, port, username and password. (None of these options are required, but we can't guarantee the user is using the defaults.)

Can we just re-export all of these knobs as CLI flags, like we do with `-testnet`?
We could pass through all unrecognised arguments.
There are only 10 options, so it should be easy to list them all. (Hopefully they don't add any extra options.)
I tried deleting the comment before you got to it, but the more I thought about it, the more I realized it should be pretty easy to just forward the args like you said, and it has the advantage of dealing with the extra options. I'll have to check out gumdrop in more detail, but I know exactly how to do this with structopt, so worst-case scenario we could just manually re-parse the args ourselves via structopt in that specific subcommand.
We'll want to parse and forward `-testnet` and `-regtest`, so we can select the right genesis block. And just forward everything else.
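A rough sketch of that pass-through idea: notice `-testnet`/`-regtest` so we can pick the right hard-coded genesis hash, and forward every argument unchanged to an internal `zcash-cli` call. The `network` helper and flag handling here are illustrative assumptions, not the real zebrad CLI:

```rust
use std::process::Command;

// Pick the network from the flags we recognise; everything else is forwarded.
fn network(args: &[String]) -> &'static str {
    if args.iter().any(|a| a == "-regtest") {
        "regtest"
    } else if args.iter().any(|a| a == "-testnet") {
        "testnet"
    } else {
        "mainnet"
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    eprintln!("selecting {} genesis hash", network(&args));
    // Forward all arguments (including -testnet/-regtest), then append the
    // getblockhashes call itself.
    match Command::new("zcash-cli")
        .args(&args)
        .args(["getblockhashes", &i32::MAX.to_string(), "0"])
        .output()
    {
        Ok(out) => print!("{}", String::from_utf8_lossy(&out.stdout)),
        Err(e) => eprintln!("could not run zcash-cli: {}", e),
    }
}
```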
Hmm, I'm not sure that adding a new command to `zebrad` is preferable to having a standalone script, because it means that the checkpoint generation is shipped as part of the main functionality of the node software, rather than being a step used to create an artifact that's part of the build. I don't see us needing to generate checkpoint lists very often -- I think that it's enough to keep a file in git and periodically run a script to regenerate it and check in the changes.
Another thought (branching off of here) is that since checkpoints are generated retrospectively, not prospectively, we can tune the checkpoint heights according to the block sizes.
Since we know the sizes of all of the blocks we're checkpointing, we can estimate how much memory will be used for queued blocks in the honest case (assuming that the in-memory parsed representation has the same size as the serialized representation, which isn't exactly true but is true within a small constant factor). The worst-case analysis of memory usage assumes that we queue a whole bunch of dishonestly generated blocks that are too big. I don't think this is important to defend against, because (a) we already have mechanisms that make it unlikely to chase a false chain and (b) the failure is a transient OOM error, from which we can just restart and talk to other peers. (If we are unable to talk to honest peers, we have bigger problems).
We might want to target a lower number rather than a higher one, say, not accumulating more than 256MB of block data at a time. Since most Zcash blocks are smaller than the maximum block size, I don't think this will be super constraining in practice.
> Hmm, I'm not sure that adding a new command to zebrad is preferable to having a standalone script, because it means that the checkpoint generation is shipped as part of the main functionality of the node software, rather than being a step used to create an artifact that's part of the build.
I think you're right, let's use a script for this release. If the script gets too complex, we can rewrite it in another language. (And if that language is Rust, we can do that in a separate repository that depends on zebra-chain and maybe zebra-consensus.)
> I don't see us needing to generate checkpoint lists very often -- I think that it's enough to keep a file in git and periodically run a script to regenerate it and check in the changes.
In tor we refreshed a similar list (long-lived node addresses) every 6-12 months.
It was a python script :-)
> Another thought (branching off of here) is that since checkpoints are generated retrospectively, not prospectively, we can tune the checkpoint heights according to the block sizes.
>
> Since we know the sizes of all of the blocks we're checkpointing, we can estimate how much memory will be used for queued blocks in the honest case (assuming that the in-memory parsed representation has the same size as the serialized representation, which isn't exactly true but is true within a small constant factor). …
>
> We might want to target a lower number rather than a higher one, say, not accumulating more than 256MB of block data at a time. Since most Zcash blocks are smaller than the maximum block size, I don't think this will be super constraining in practice.
I'll try to implement this design in the checkpoint script.
> The worst-case analysis of memory usage assumes that we queue a whole bunch of dishonestly generated blocks that are too big. I don't think this is important to defend against, because (a) we already have mechanisms that make it unlikely to chase a false chain and (b) the failure is a transient OOM error, from which we can just restart and talk to other peers. (If we are unable to talk to honest peers, we have bigger problems).
My goal was to make quick changes that defended against some worst case attacks, and honest-case overload behaviour. But it looks like that's more complicated than I expected, so I won't be able to do it quickly. And I should move on to more important tasks.
In general, I agree with your conclusions, so I'm going to avoid spending time on the worst case, and update the design doc to explain why the worst case is out of scope.
But I think we're talking about slightly different worst-case attacks, so I'd like to check your analysis of the following attack:
A single bad peer answers any block hash request with a large block with a height near the end of the checkpoint list. We store those blocks, but can't verify them yet, so we run out of memory.
To defend against this attack, we need to avoid downloading a large number of blocks (or a large amount of data) from a single peer. For example, if our memory limit is 2 GB, we need to avoid downloading more than 1000 blocks from a bad peer.
There are 900,000 blocks in the chain, and only a few hundred peers. So if we choose peers uniformly at random, we're almost certain to download a few thousand blocks from each peer.
As an alternative, we need to reject blocks that are a long way ahead of the current checkpoint. Or reject blocks that are larger than expected.
There are also operating systems that hang or terminate other processes when out of memory (macOS default config, some Linux configs, maybe Windows). I don't think that's a big issue, but it should definitely be documented somewhere.
The average block size is around 30 GB / 900,000 ≈ 35 kB. So if we limit our checkpoints to 256 MB in the honest case, most checkpoints will be 5,000 to 10,000 blocks apart: 256 MB / 35 kB ≈ 7,300 blocks.
So we might want to pick a maximum interval size. And then use smaller checkpoints as needed. I'll check when I write the script.
We could also find the maximum block size in the historic chain, and refuse larger blocks while checkpointing. We could efficiently reject blocks by recording the block size at parse time. (And if there are a few large blocks in the chain, we could record the largest block in each checkpoint range.) I'll add this to the possible designs.
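The size-aware spacing idea above could look something like this sketch. The 256 MB target matches the discussion, but the 10,000-block maximum gap and the `checkpoint_heights` helper are illustrative assumptions:

```rust
const MEMORY_TARGET: usize = 256 * 1024 * 1024; // honest-case queued-block budget
const MAX_GAP: usize = 10_000;                  // illustrative cap on checkpoint spacing

// Walk the chain, accumulating serialized block sizes, and emit a checkpoint
// whenever the running total would exceed the memory target, or the height
// gap since the last checkpoint reaches the cap.
fn checkpoint_heights(block_sizes: &[usize]) -> Vec<usize> {
    let mut checkpoints = vec![0]; // genesis is always a checkpoint
    let mut queued = 0;
    let mut last = 0;
    for (height, &size) in block_sizes.iter().enumerate().skip(1) {
        if queued + size > MEMORY_TARGET || height - last >= MAX_GAP {
            checkpoints.push(height);
            queued = 0;
            last = height;
        }
        queued += size;
    }
    checkpoints
}

fn main() {
    // With every block at the ~35 kB average, checkpoints land roughly
    // 256 MB / 35 kB blocks apart.
    let sizes = vec![35_000usize; 20_000];
    let cps = checkpoint_heights(&sizes);
    println!("first few checkpoints: {:?}", &cps[..3.min(cps.len())]);
}
```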
FWIW @teor2345 I already added a `zebra-utils` crate to the workspace and a `zebra-checkpoints.rs` binary file to use to write this in Rust.
> But I think we're talking about slightly different worst-case attacks, so I'd like to check your analysis of the following attack:
>
> A single bad peer answers any block hash request with a large block, that has a height near the end of the checkpoint list. We store those blocks, but can't verify them yet, so we run out of memory.
If I understand the attack correctly, this shouldn't be possible, because the network layer ensures that a request for a block with a particular hash resolves either to a block with that hash or to an error.
This actually happens not for a security reason but for a practical one. Because Bitcoin doesn't actually have a request/response concept, the network layer has to statefully examine the messages that come in to determine whether they could be a response to the pending request, and to determine when enough messages have been received to "finish" the response. In the case of a block request, the `Handler` maintains a `HashSet` of the still-outstanding block hashes and a `Vec<Arc<Block>>` of the already-received blocks. Each time a `block` message is received, the `Handler` computes the block hash and either removes the hash or discards the block. This allows it to determine when it has received all of the blocks required to complete the request. Discarding blocks with the wrong hash is necessary because `zcashd` will sometimes respond with the wrong block. I don't actually know why this happens. Maybe `zcashd` gets confused sometimes.
But the net of all of this is that a bad peer that answers any block hash request with an unrelated large block will cause the block request to time out, which causes us to fail the connection and stop talking to that peer.
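A toy model of that bookkeeping, with simplified stand-ins for the real zebra-network types (strings play the role of hashes and `Arc<Block>` values):

```rust
use std::collections::HashSet;

// Toy model of the response-matching logic: track which requested hashes are
// still outstanding, keep only blocks whose hash was requested, and report
// when the whole request is complete.
struct Handler {
    pending: HashSet<String>, // hashes of still-outstanding blocks
    blocks: Vec<String>,      // already-received blocks
}

impl Handler {
    fn new(requested: &[&str]) -> Self {
        Handler {
            pending: requested.iter().map(|h| h.to_string()).collect(),
            blocks: Vec::new(),
        }
    }

    /// Keep the block only if its hash is still outstanding; otherwise
    /// discard it. Returns true once every requested block has arrived.
    fn process_block(&mut self, hash: &str, block: &str) -> bool {
        if self.pending.remove(hash) {
            self.blocks.push(block.to_string());
        }
        self.pending.is_empty()
    }
}

fn main() {
    let mut handler = Handler::new(&["aa", "bb"]);
    // An unrelated (possibly huge) block from a bad peer is discarded, not stored:
    assert!(!handler.process_block("ff", "unrelated block"));
    assert_eq!(handler.blocks.len(), 0);
    assert!(!handler.process_block("aa", "block aa"));
    assert!(handler.process_block("bb", "block bb"));
    println!("request complete with {} blocks", handler.blocks.len());
}
```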
> To defend against this attack, we need to avoid downloading a large number of blocks (or a large amount of data) from a single peer. For example, if our memory limit is 2 GB, we need to avoid downloading more than 1000 blocks from a bad peer.
>
> There are 900,000 blocks in the chain, and only a few hundred peers. So if we choose peers uniformly at random, we're almost certain to download a few thousand blocks from each peer.
Although we're certain to download a few thousand blocks from each peer, we're certain to do so only over the course of the entire sync process -- we'd never ask a single peer for more than a small chunk of contiguous blocks (I think this is currently 10, the exact number doesn't really matter so much as the fact that it's quite small). And because we know that any request for specific block hashes either resolves to those blocks or to an error, either each chunk succeeds, in which case we have the blocks we want, or it fails, in which case we fail that connection and retry (a limited number of times) with a different peer.
So I don't think that there's a possibility for a single peer to significantly interrupt the checkpointing procedure.
Thanks, that's a much stronger security guarantee than I expected. I'll make sure I document that in #636.
(And close the PRs that are obsoleted by the new design.)
For the checkpoint verifier, we need to create checkpoint height and hash lists for:
Since the sapling checkpoint is mandatory, we want to compile the lists into Zebra.
Eventually, we also want to:
Open questions:
The block spacing range for 256 MB checkpoints is:
Heartwood activated at:
Our initial draft list of checkpoints was created before Heartwood on mainnet, but after Heartwood on testnet. The testnet checkpoints include a post-Heartwood block.