bgp / stayrtr

RPKI-To-Router server implementation in Go
BSD 3-Clause "New" or "Revised" License

Optimize VRP struct sizes. Use netip.Prefix instead of net.IPNet. #103

Closed: floatingstatic closed this 6 months ago

floatingstatic commented 7 months ago

I'm evaluating stayrtr in a memory-constrained environment, and its memory footprint could be reduced somewhat. The bulk of the memory appears to be allocated during updates, but some steady-state usage can be trimmed as well. To do this I'm proposing the following changes (a sketch of the struct change follows the layouts below):

Current:

$ structlayout . VRP
VRP.Prefix.IP net.IP: 0-24 (size 24, align 8)
VRP.Prefix.Mask net.IPMask: 24-48 (size 24, align 8)
VRP.MaxLen uint8: 48-49 (size 1, align 1)
padding: 49-52 (size 3, align 0)
VRP.ASN uint32: 52-56 (size 4, align 4)
VRP.Flags uint8: 56-57 (size 1, align 1)
padding: 57-64 (size 7, align 0)

New:

$ structlayout . VRP
VRP.Prefix.ip.addr.hi uint64: 0-8 (size 8, align 8)
VRP.Prefix.ip.addr.lo uint64: 8-16 (size 8, align 8)
VRP.Prefix.ip.z *internal/intern.Value: 16-24 (size 8, align 8)
VRP.Prefix.bitsPlusOne uint8: 24-25 (size 1, align 1)
padding: 25-32 (size 7, align 0)
VRP.ASN uint32: 32-36 (size 4, align 4)
VRP.MaxLen uint8: 36-37 (size 1, align 1)
VRP.Flags uint8: 37-38 (size 1, align 1)
padding: 38-40 (size 2, align 0)
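
Concretely, the VRP change these layouts imply looks roughly like the following sketch (field order taken from the structlayout output above; the actual code in the PR may differ):

package rtr // hypothetical package name, for this sketch only

import (
	"net"
	"net/netip"
)

// Before: net.IPNet embeds two 24-byte slice headers (IP and Mask),
// each pointing at a separate heap allocation, and the field order
// costs 10 bytes of padding: 64 bytes per VRP plus the slice data.
type VRPBefore struct {
	Prefix net.IPNet
	MaxLen uint8
	ASN    uint32
	Flags  uint8
}

// After: netip.Prefix is an immutable 32-byte value with no extra
// heap storage, and grouping the small fields behind it leaves only
// 2 bytes of tail padding: 40 bytes per VRP, all stored inline.
type VRPAfter struct {
	Prefix netip.Prefix
	ASN    uint32
	MaxLen uint8
	Flags  uint8
}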

Current:

$ structlayout . PDUIPv4Prefix
PDUIPv4Prefix.Version uint8: 0-1 (size 1, align 1)
padding: 1-8 (size 7, align 0)
PDUIPv4Prefix.Prefix.IP net.IP: 8-32 (size 24, align 8)
PDUIPv4Prefix.Prefix.Mask net.IPMask: 32-56 (size 24, align 8)
PDUIPv4Prefix.MaxLen uint8: 56-57 (size 1, align 1)
padding: 57-60 (size 3, align 0)
PDUIPv4Prefix.ASN uint32: 60-64 (size 4, align 4)
PDUIPv4Prefix.Flags uint8: 64-65 (size 1, align 1)
padding: 65-72 (size 7, align 0)

$ structlayout . PDUIPv6Prefix
PDUIPv6Prefix.Version uint8: 0-1 (size 1, align 1)
padding: 1-8 (size 7, align 0)
PDUIPv6Prefix.Prefix.IP net.IP: 8-32 (size 24, align 8)
PDUIPv6Prefix.Prefix.Mask net.IPMask: 32-56 (size 24, align 8)
PDUIPv6Prefix.MaxLen uint8: 56-57 (size 1, align 1)
padding: 57-60 (size 3, align 0)
PDUIPv6Prefix.ASN uint32: 60-64 (size 4, align 4)
PDUIPv6Prefix.Flags uint8: 64-65 (size 1, align 1)
padding: 65-72 (size 7, align 0)

New:

$ structlayout . PDUIPv4Prefix
PDUIPv4Prefix.Prefix.ip.addr.hi uint64: 0-8 (size 8, align 8)
PDUIPv4Prefix.Prefix.ip.addr.lo uint64: 8-16 (size 8, align 8)
PDUIPv4Prefix.Prefix.ip.z *internal/intern.Value: 16-24 (size 8, align 8)
PDUIPv4Prefix.Prefix.bitsPlusOne uint8: 24-25 (size 1, align 1)
padding: 25-32 (size 7, align 0)
PDUIPv4Prefix.ASN uint32: 32-36 (size 4, align 4)
PDUIPv4Prefix.Version uint8: 36-37 (size 1, align 1)
PDUIPv4Prefix.MaxLen uint8: 37-38 (size 1, align 1)
PDUIPv4Prefix.Flags uint8: 38-39 (size 1, align 1)
padding: 39-40 (size 1, align 0)

$ structlayout . PDUIPv6Prefix
PDUIPv6Prefix.Prefix.ip.addr.hi uint64: 0-8 (size 8, align 8)
PDUIPv6Prefix.Prefix.ip.addr.lo uint64: 8-16 (size 8, align 8)
PDUIPv6Prefix.Prefix.ip.z *internal/intern.Value: 16-24 (size 8, align 8)
PDUIPv6Prefix.Prefix.bitsPlusOne uint8: 24-25 (size 1, align 1)
padding: 25-32 (size 7, align 0)
PDUIPv6Prefix.ASN uint32: 32-36 (size 4, align 4)
PDUIPv6Prefix.Version uint8: 36-37 (size 1, align 1)
PDUIPv6Prefix.MaxLen uint8: 37-38 (size 1, align 1)
PDUIPv6Prefix.Flags uint8: 38-39 (size 1, align 1)
padding: 39-40 (size 1, align 0)
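
The same trick applies to the PDU structs: Version moves out of the first slot, where it forced 7 bytes of padding ahead of the 48-byte net.IPNet, and joins the other uint8 fields, taking both PDU types from 72 down to 40 bytes. A sketch under the same caveat as above, with a quick way to verify the size:

package main

import (
	"fmt"
	"net/netip"
	"unsafe"
)

// Field order matching the new layout above; the actual PR code may
// differ. Prefix (32 bytes) first, then the uint32, then the three
// uint8 fields packed together.
type PDUIPv4Prefix struct {
	Prefix  netip.Prefix
	ASN     uint32
	Version uint8
	MaxLen  uint8
	Flags   uint8
}

func main() {
	// Prints 40 on 64-bit platforms; the old layout was 72.
	fmt.Println(unsafe.Sizeof(PDUIPv4Prefix{}))
}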

In a simple side-by-side test, with stayrtr started at roughly the same time on two hosts, we can compare the memory RSS of the current master branch (yellow) against my forked branch (green):

[screenshot: memory RSS over time, master branch (yellow) vs. forked branch (green)]

For what it's worth, most of the memory allocation occurs when updating VRPs. The bulk of this is related to copying data, which I haven't found a clean workaround for, but that may be something to address in a future PR. A pprof view shows much of this coming from VRP.Copy():

[screenshot: pprof allocation graph dominated by VRP.Copy()]
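
This is plausible: with net.IPNet, a deep copy has to allocate and fill fresh IP and Mask slices for every single VRP. netip.Prefix is an immutable value type, so those per-prefix allocations disappear and a copy collapses into a struct assignment. Building on the VRPAfter sketch above (the real VRP.Copy() signature may differ):

// The only allocation left is the new VRP itself, which is
// consistent with the profile still being dominated by copying.
func (v *VRPAfter) Copy() *VRPAfter {
	c := *v // copies Prefix, ASN, MaxLen, and Flags in one assignment
	return &c
}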

Beyond ensuring all test cases pass, I have also run integration tests with BIRD 2.14 and confirmed that RTR functionality behaves as before with these changes.

benjojo commented 7 months ago

On the surface this looks fine. I would, however, want to see two stayrtrs (old and this PR) run head to head for a few days with rtrmon comparing the two, just to make sure :)

ties commented 7 months ago

Very nice analysis and a clear change!

> On the surface this looks fine. I would, however, want to see two stayrtrs (old and this PR) run head to head for a few days with rtrmon comparing the two, just to make sure :)

That is probably the best way to be sure it is functionally identical. This way you can check that the VRPs converge after a set delay. The relevant metric is vrp_diff (the number of VRPs seen by one source but not the other, bucketed by how long they have been visible), with a threshold high enough for both stayrtr-current and stayrtr-thispr to have updated.

floatingstatic commented 7 months ago

For clarification, is this an experiment you need me to run, or are you doing this?

ties commented 7 months ago

> For clarification, is this an experiment you need me to run, or are you doing this?

I would prefer it if you ran it (I likely won't have time until 2024). Do you have a working setup with prometheus?

If we agree that convergence should be within 256s, I would check the history for violations of that property:

max_over_time(vrp_diff{visibility_seconds="256"}[1h]) > 0

floatingstatic commented 7 months ago

OK, hopefully I did this right. I ran something like this on a Linux host with binaries built from my branch:

./stayrtr -bind 127.0.0.1:8282 -cache https://console.rpki-client.org/vrps.json -metrics.addr 127.0.0.1:9847
./rtrmon -addr 0.0.0.0:9866 -primary.host tcp://127.0.0.1:8282 -secondary.host https://console.rpki-client.org/vrps.json

The output in Prometheus looks like the below. I do not see anything with the label visibility_seconds="256"; not sure if that is unexpected:

[screenshot: rtrmon vrp_diff output in Prometheus]

For what it's worth, a brief test using binaries built from upstream looks the same. Anyway, I will leave this running for a couple of days and report back. Thanks!

ties commented 7 months ago

> OK, hopefully I did this right. I ran something like this on a Linux host with binaries built from my branch:
>
> ./stayrtr -bind 127.0.0.1:8282 -cache https://console.rpki-client.org/vrps.json -metrics.addr 127.0.0.1:9847
> ./rtrmon -addr 0.0.0.0:9866 -primary.host tcp://127.0.0.1:8282 -secondary.host https://console.rpki-client.org/vrps.json
>
> The output in Prometheus looks like the below. I do not see anything with the label visibility_seconds="256"; not sure if that is unexpected:

Those labels should be there eventually. Let's see!

cjeker commented 7 months ago

This not only blocks #105 but also additional work to make the delta handling for ASPA correct. It would be nice if this could get some priority to unblock this work.

floatingstatic commented 7 months ago

@ties I have had two instances of stayrtr running along with two instances of rtrmon in my local dev environment for a couple of days. There are two jobs, rtrmon and rtrmon-upstream, which cover binaries from my forked branch and current upstream, respectively. They were not started at exactly the same time, so there are some minor deltas. The output looks as follows:

[screenshot: vrp_diff for the rtrmon and rtrmon-upstream jobs]

I'm not sure exactly what you are looking for here; happy to share any additional views.

ties commented 7 months ago

> @ties I have had two instances of stayrtr running along with two instances of rtrmon in my local dev environment for a couple of days. There are two jobs, rtrmon and rtrmon-upstream, which cover binaries from my forked branch and current upstream, respectively. They were not started at exactly the same time, so there are some minor deltas. The output looks as follows:

Thanks for doing this!

> I'm not sure exactly what you are looking for here; happy to share any additional views.

I hoped to see a flat line, as in "the worst-case divergence converges within 256s". Can you try the following?

Alternatively, rtrmon can compare two stayrtrs, but reading the same JSON should also work if the JSON is from one source of truth.

ties commented 7 months ago

I just realised the default refresh interval is 600s; since a difference can persist for a full refresh cycle, the 851s and 1024s visibility buckets are the first that are likely to converge.

I am also collecting data myself now, I should be able to give an update tomorrow.

floatingstatic commented 7 months ago

I'm starting to wonder whether this is the best way to check this. I am using default timers for everything, but both upstream and my fork show a non-zero delta, so either the testing methodology is off or my setup is:

[screenshot: vrp_diff, forked branch]

[screenshot: vrp_diff, upstream]

For what it's worth, the upstream deltas look "worse" to my eye.

So it sounds like you suggest running two stayrtrs (fork and master branch) and configuring rtrmon to diff those two instead? I can give that a try and report back.

job commented 7 months ago

> So it sounds like you suggest running two stayrtrs (fork and master branch) and configuring rtrmon to diff those two instead? I can give that a try and report back.

That seems helpful indeed, before merging.

ties commented 6 months ago

> I'm starting to wonder whether this is the best way to check this. I am using default timers for everything, but both upstream and my fork show a non-zero delta, so either the testing methodology is off or my setup is:

Or the system does not converge as fast as I thought it would. I mostly run these tools in a monitoring role with (much) lower timers than you would want to run in real life.

> So it sounds like you suggest running two stayrtrs (fork and master branch) and configuring rtrmon to diff those two instead? I can give that a try and report back.

That would be the comparative test that I think will work (I'm writing this before looking at my data). For data collection I used this setup:

# build the main branch + this PR and tag them, i.e.
# docker build . -t stayrtr-structoptimize --target stayrtr
services:
  stayrtr-master:
    image: stayrtr-master
    command:
      - -refresh
      - "60"
      - -cache
      - https://rpki-validator.ripe.net/json
  stayrtr-structoptimize:
    image: stayrtr-structoptimize
    command:
      - -refresh
      - "60"
      - -cache
      - https://rpki-validator.ripe.net/json
      # - https://console.rpki-client.org/rpki.json
  rtrmon:
    image: rpki/rtrmon
    restart: unless-stopped
    ports:
      - 9866:9866
    command:
      - -primary.host
      - tcp://stayrtr-master:8282
      - -secondary.host
      - tcp://stayrtr-structoptimize:8282

My reason for picking rpki-validator.ripe.net/json is that it has a very frequent update interval.

Now, looking at the data, we see that the 1024s bucket stayed converged between the two instances at all times:

[screenshot: vrp_diff at the 1024s visibility bucket, flat at zero]

If we look at all the points in time, there were tiny differences that recovered (more aggressive timers might help there).

[screenshot: vrp_diff across all visibility buckets, showing small transient differences]

I forgot to capture the metrics directly from stayrtr so I do not know the comparative memory usage. But I trust your statistics on that.

It looks good to me. LGTM!