farinacci / lispers.net

lispers.net code for the world's most feature-rich implementation of the Locator/ID Separation Protocol (LISP)
Apache License 2.0

Approximate relative performance of the Python and Go data planes #28

Open Slime90 opened 10 months ago

Slime90 commented 10 months ago

Hello,

First off, thanks so much for open sourcing this awesome implementation!

Can you provide some general idea of the relative performance difference for encap/forwarding while leveraging the Python and Go data planes respectively? Is the Go data plane still lacking support for NAT-T? I think it would be helpful to have a doc showing a feature-support matrix comparing the two, as well as some basic comparisons of data-plane performance. I do realize that this is highly dependent on hardware, kernel version, and other factors, and I will be doing my own comparisons in the future, but having some baseline context before testing would be hugely helpful!

farinacci commented 10 months ago

Hello,

> First off, thanks so much for open sourcing this awesome implementation!

Thank you.

> Can you provide some general idea of the relative performance difference for encap/forwarding while leveraging the Python

I can't since I have not measured it.

> and Go Data planes respectively? Is the Go data-plane still lacking support for nat-t? I think it would be helpful to have a

The Go data plane does do NAT traversal: an xTR behind a NAT can run the Go data plane, as can the RTR in public address space, which encaps to an ephemeral translated port so packets get through the NAT to the xTR.

> doc showing a feature support matrix comparing the two as well as some basic comparisons of data-plane performance. I do

I'll add this to my todo list. But if you are perusing the code and want to start one, I can review it and we can put it in the repo.

> realize that this is highly dependent on hardware, kernel version, and other factors, and I will be doing my own comparisons in the future, but having some baseline context before testing would be hugely helpful!

Right, but all of the lispers.net forwarding planes run in user space. Also note, there is a fast Python data plane that doesn't use any Python libraries and manipulates fields in the data header with brute-force byte operations.

Dino
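For readers unfamiliar with what "brute-force byte operations" on a data header look like, here is a minimal, hedged sketch. It is not the lispers.net code; it just packs and unpacks the 8-byte LISP data-encapsulation header from RFC 6830 (flag bits, 24-bit nonce, 24-bit instance ID) with nothing but `struct` and bit shifts:

```python
import struct

# Illustrative only, NOT lispers.net source.  The LISP data
# encapsulation header (RFC 6830) is 8 bytes:
#   byte 0   : flag bits (N, L, E, V, I, ...)
#   bytes 1-3: 24-bit nonce (valid when the N bit is set)
#   bytes 4-7: 24-bit instance ID + 8 locator-status bits (when I is set)
N_BIT = 0x80
I_BIT = 0x08

def build_lisp_header(nonce, instance_id):
    """Pack an 8-byte LISP data header with the N and I bits set."""
    first_word = ((N_BIT | I_BIT) << 24) | (nonce & 0xFFFFFF)
    second_word = (instance_id & 0xFFFFFF) << 8  # low byte carries LSBs
    return struct.pack("!II", first_word, second_word)

def parse_lisp_header(header):
    """Unpack flags, nonce, and instance ID from the 8-byte header."""
    first_word, second_word = struct.unpack("!II", header[:8])
    flags = first_word >> 24
    nonce = first_word & 0xFFFFFF
    instance_id = second_word >> 8
    return flags, nonce, instance_id

hdr = build_lisp_header(nonce=0xABCDEF, instance_id=1000)
flags, nonce, iid = parse_lisp_header(hdr)
```

The point of the technique is that no packet library is imported at all; encap/decap is a couple of integer shifts and one `struct.pack`/`unpack` per packet.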

Slime90 commented 10 months ago

> I can't since I have not measured it.

I will put a project on my list to test the various data planes on the same hardware, hopefully in the next month or two.

> Right, but all of the lispers.net forwarding planes run in user space. Also note, there is a fast Python data plane that doesn't use any Python libraries and manipulates fields in the data header with brute-force byte operations.

ok, is that

# Faster python data-plane.
#
#setenv LISP_RTR_FAST_DATA_PLANE
#setenv LISP_RTR_LATENCY_DEBUG

From RUN-LISP?

What are the downsides of this "faster python data-plane" vs the original?

Thanks!
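As an aside on those RUN-LISP `setenv` lines: presence-style flags like these are commonly checked by name only, so an empty value is enough to enable them. A generic sketch of that pattern (illustrative, not the lispers.net source; only the variable names come from the thread):

```python
import os

def flag_enabled(name, environ=None):
    """Return True if the flag variable is set at all, even to ''."""
    env = os.environ if environ is None else environ
    return name in env

# Simulated environment, as if `setenv LISP_RTR_FAST_DATA_PLANE` ran:
env = {"LISP_RTR_FAST_DATA_PLANE": ""}
assert flag_enabled("LISP_RTR_FAST_DATA_PLANE", env)
assert not flag_enabled("LISP_RTR_LATENCY_DEBUG", env)
```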

farinacci commented 10 months ago

> I can't since I have not measured it.
>
> I will put a project on my list to test the various data planes on the same hardware, hopefully in the next month or two.

That would be great. I am here to support you, and if we need to make tweaks to make things go faster, I can code them up for you.

> Right, but all of the lispers.net forwarding planes run in user space. Also note, there is a fast Python data plane that doesn't use any Python libraries and manipulates fields in the data header with brute-force byte operations.
>
> ok, is that
>
> # Faster python data-plane.
> #
> #setenv LISP_RTR_FAST_DATA_PLANE
> #setenv LISP_RTR_LATENCY_DEBUG
>
> From RUN-LISP?

Yes.

> What are the downsides of this "faster python data-plane" vs the original?

Fewer features.

Dino

Slime90 commented 6 months ago

I finally got around to starting to put together a docker-compose setup to do some relative iperf benchmarking of the different data planes. I am running into a few issues right off the bat with the default and fast Python data planes:

  1. Latency is consistently VERY high. RLOCs are on the same Linux bridge on the same host, and yet there is on average well over 100 ms of RTT for ICMP. Below is a ping between connected EID addresses on two Docker xTRs with RLOCs on the same host bridge, with the source set in the default route as recommended in the RL script comments.
root@a17f6bd0e459:/lispers.net# ping 10.2.0.254
PING 10.2.0.254 (10.2.0.254) 56(84) bytes of data.
64 bytes from 10.2.0.254: icmp_seq=4 ttl=62 time=193 ms
64 bytes from 10.2.0.254: icmp_seq=5 ttl=62 time=231 ms
64 bytes from 10.2.0.254: icmp_seq=6 ttl=62 time=167 ms
64 bytes from 10.2.0.254: icmp_seq=7 ttl=62 time=205 ms
64 bytes from 10.2.0.254: icmp_seq=8 ttl=62 time=243 ms
64 bytes from 10.2.0.254: icmp_seq=9 ttl=62 time=179 ms
64 bytes from 10.2.0.254: icmp_seq=10 ttl=62 time=217 ms
64 bytes from 10.2.0.254: icmp_seq=11 ttl=62 time=152 ms
64 bytes from 10.2.0.254: icmp_seq=12 ttl=62 time=191 ms
64 bytes from 10.2.0.254: icmp_seq=13 ttl=62 time=231 ms
64 bytes from 10.2.0.254: icmp_seq=14 ttl=62 time=165 ms
64 bytes from 10.2.0.254: icmp_seq=15 ttl=62 time=204 ms
  2. I enabled RLOC probing to try to get some internal stats on what LISP thinks the latency between systems is in the control plane vs the latency observed with ICMP. After 10 or 15 seconds the RLOC gets marked unreachable:
root@a17f6bd0e459:/lispers.net# ./mc

LISP Map-Cache for localhost:8080, hostname a17f6bd0e459, release 0.606

EID [0]10.2.0.0/24, uptime 0:00:19, ttl 1440m
  RLOC 192.168.168.2, state unreach-state since 0:00:03
    packet-count: 13, packet-rate: 0 pps, byte-count: 1092, bit-rate: 0.0 mbps
    rtts [-1, -1, -1], hops [?/?, ?/?, ?/?], latencies [?/?, ?/?, ?/?]

The logs are also spammed with the following exception:

---------- Exception occurred: 03/20/24 03:39:12.374 ----------
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 315, in 'calling callback function'
  File "/usr/local/lib/python2.7/dist-packages/pcappy/__init__.py", line 549, in _loop_callback
    callback(user.contents.value, ph, pd)
  File "lisp-itr.py", line 940, in O0oOo
  File "lisp-itr.py", line 590, in ii
  File "lisp.py", line 7170, in lisp_parse_packet
  File "lisp.py", line 9037, in lisp_process_map_reply
  File "lisp.py", line 17969, in lisp_process_rloc_probe_reply
AttributeError: 'NoneType' object has no attribute 'find'
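A traceback ending in `AttributeError: 'NoneType' object has no attribute 'find'` generally means a lookup inside `lisp_process_rloc_probe_reply` returned `None` and the result had `.find()` called on it unguarded. Purely as an illustration of that failure mode and its fix (the function and variable names below are hypothetical, not the lispers.net internals):

```python
# Hypothetical reconstruction of the crash pattern, NOT lispers.net code.
def lookup_probe_source(cache, key):
    return cache.get(key)  # can legitimately return None on a cache miss

def process_probe_reply(cache, key):
    addr = lookup_probe_source(cache, key)
    if addr is None:            # the guard whose absence would raise
        return None             # AttributeError: 'NoneType' ... 'find'
    return addr.find(":") != -1 # safe: addr is known to be a str here

assert process_probe_reply({}, "192.168.168.2") is None
assert process_probe_reply({"192.168.168.2": "fe80::1"}, "192.168.168.2") is True
```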

Any guidance would be appreciated!

Thanks.
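For context when comparing future runs, the twelve ICMP samples in the transcript above average roughly 198 ms. A throwaway script to pull the numbers out of ping output like that (written for this thread, not part of lispers.net):

```python
import re
from statistics import mean

# The ping transcript from the comment above, verbatim.
ping_output = """\
64 bytes from 10.2.0.254: icmp_seq=4 ttl=62 time=193 ms
64 bytes from 10.2.0.254: icmp_seq=5 ttl=62 time=231 ms
64 bytes from 10.2.0.254: icmp_seq=6 ttl=62 time=167 ms
64 bytes from 10.2.0.254: icmp_seq=7 ttl=62 time=205 ms
64 bytes from 10.2.0.254: icmp_seq=8 ttl=62 time=243 ms
64 bytes from 10.2.0.254: icmp_seq=9 ttl=62 time=179 ms
64 bytes from 10.2.0.254: icmp_seq=10 ttl=62 time=217 ms
64 bytes from 10.2.0.254: icmp_seq=11 ttl=62 time=152 ms
64 bytes from 10.2.0.254: icmp_seq=12 ttl=62 time=191 ms
64 bytes from 10.2.0.254: icmp_seq=13 ttl=62 time=231 ms
64 bytes from 10.2.0.254: icmp_seq=14 ttl=62 time=165 ms
64 bytes from 10.2.0.254: icmp_seq=15 ttl=62 time=204 ms
"""

# Extract every RTT sample and summarize.
rtts = [float(v) for v in re.findall(r"time=([\d.]+) ms", ping_output)]
print(f"{len(rtts)} samples: mean {mean(rtts):.1f} ms, "
      f"min {min(rtts):.0f} ms, max {max(rtts):.0f} ms")
# 12 samples: mean 198.2 ms, min 152 ms, max 243 ms
```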

Slime90 commented 5 months ago

@farinacci I know you are likely extremely busy, but I wanted to follow up on this and get your thoughts on whether the latency reported above is expected, and whether the probing issue is a bug or user error.

Thanks!

farinacci commented 5 months ago

I don't have an update. I have not done any measurements. If you would like to set up a test bed to try your own measurements, I can help you configure the boxes and suggest some test cases. We have enough logging to print latency numbers. If there are any code changes required to give you what you need, I can do that for you.

Thanks,
Dino

Slime90 commented 5 months ago

Thank you.

Perhaps we can discuss this via email; I have some other questions for you that go beyond the scope of this specific issue. I will shoot you a message at the address listed in your GitHub profile.

Thanks!