SSLMate / certspotter

Certificate Transparency Log Monitor
https://sslmate.com/certspotter
Mozilla Public License 2.0
983 stars 84 forks source link

Download entries in parallel #34

Open AGWA opened 5 years ago

AGWA commented 5 years ago

It's possible to obtain significantly higher throughput by downloading entries in parallel. There are several challenges with this approach, however:

  1. We don't know how many entries a log will return, so it's hard to know what the optimal job size is.
  2. We'll need to gracefully handle rate limiting.
  3. It's much easier to update the Collapsed Merkle Tree serially. I see a few ways to handle this:
    1. We could force the entries to be processed in serial by using a buffered channel of buffered channels. However, I think it will be difficult to reason about buffers.
    2. We could split up jobs on subtree boundaries, making it easy to merge the Collapsed Merkle Trees of all the jobs at the end.
    3. We could devise a sparse Collapsed Merkle Tree data structure which allows nodes to be appended in any order.
AGWA commented 9 months ago

A data structure like this might be useful:

type DiscontiguousCollapsedTree []CollapsedTreeSegment
type CollapsedTreeSegment struct {
    Index uint64
    Tree  CollapsedTree
}
func (t *DiscontiguousCollapsedTree) Add(index uint64, tree CollapsedTree) {
    maxSize := uint64(1) << bits.TrailingZeros64(index)
    if tree.size > maxSize {
        // error: tree too large to be placed at this index
    }
    // find segment prior to index in *t. append tree to this segment if tree is contiguous (and also see if we can merge in the following segment), otherwise insert a new segment in *t at this location
}