cloudflare / gortr

The RPKI-to-Router server used at Cloudflare
https://rpki.cloudflare.com
BSD 3-Clause "New" or "Revised" License
309 stars 39 forks source link

Fix pathological time it takes to apply large diff #32

Closed ytti closed 5 years ago

ytti commented 5 years ago

Consider scenario where I have

1) roas-big.json (1M entries) 2) roas-small.json (10 entries) 3) symlink roas.json pointing to roas-big.json 4) go run cmd/gortr/gortr.go --cache roas.json -verify=false -checktime=false 5) wait for it to load the ROAS 6) ln -sf roas-small.json roas.json; kill -1 $(pidof gortr) 7) it will not finish in an hour, unsure if ever

The problem is ApplyDiff in lib/server.go, your prevRoas and your diff are both 1M entries, so you're doing 1M*1M loop, because golang doesn't have idiomatic built-in tools for intersection or even testing if slice includes an member people are forced to implement really poor performance options.

The only reason, it seems, that ApplyDiff even does what it does, because AddROAs removes roalock before it calls AddROAsDiff, which then immediately does readd the roalock, but there is delta time, which may change our diff. As AddROAsDiff is not called from anywhere lse but AddROAs we can just keep the lock and rely on the work we already did in ComputeDiff

This change works for me against cXR, eXR, MX and PTX and is sufficient to continue my own testing work of the platforms, instead of testing the RTRd. The change does not reflect how I'd prefer to do this, but preferred solution would require significant rafactoring which I'm skeptical would be accepted anyhow.

I used following script to generate the ROA JSON:

require "ipaddr"
class RoaCacheGen
  def gen(network, filename)
    rnd, file = Random.new(42), File.open(filename, "w")
    file.puts '{\n  "roas": ['
    IPAddr.new(network).to_range.each do |net|
      file.puts '    { "asn": "AS%d", "prefix": "%s/32", "maxLength": 32, "ta": "ripe" },' % [ rnd.rand(64495)+1, net.to_s ]
    end
    file.seek(-2, IO::SEEK_END)
    file.puts "\n  ]\n}"
    file.close
  end
end
RoaCacheGen.new.gen(ARGV[0], ARGV[1])
lspgn commented 5 years ago

Thanks @ytti ! Yes an optimization in the diff was overdue, thanks for doing it, I'm taking a look at the commits right now.

Could you expand on "The change does not reflect how I'd prefer to do this, but preferred solution would require significant rafactoring which I'm skeptical would be accepted anyhow" ?

ytti commented 5 years ago

The preferred way I would have done this, would have been to create new container type for ROAs, which would be Set type with some ROA specific methods as well as of course intersection and difference methods. Probably something along the lines of https://godoc.org/github.com/deckarep/golang-set if not verbatim.

I am also bit confused why do we need to iterate the ROAs in ComputeDiff and again in ApplyDiff, to me it looks like the work as already done, we already know what is the new set for ROAs, so do the work again in ApplyDiff seems unnecessary. I may be missing something, but even if it is necessary, it seems wet to implement it twice, instead of just call same method on the container.

lspgn commented 5 years ago

I think something along the lines of: type []ROA ROASet and func (s *ROAset) Diff(s2 *ROASet) could be a solution. The apply diff is for keeping previous serials for clients. While the compute diff is just the current diff iteration.

ytti commented 5 years ago

That makes sense (diff roles), I should have spent more time on it.

lspgn commented 5 years ago

@ytti If that's okay, I will merge my PR (#33) which solves the issue in a similar way. I've been testing it on rtr.rpki.cloudflare.com and parts of our network: it showed good results. Thank you!

ytti commented 5 years ago

No complaints, thanks!