eclipse-iceoryx / iceoryx

Eclipse iceoryx™ - true zero-copy inter-process-communication
https://iceoryx.io
Apache License 2.0

Slow performance on yocto builds #549

Closed: Indra5196 closed this issue 3 years ago

Indra5196 commented 3 years ago

Required information

Operating system: Linux version 4.18.33-yocto-standard

Compiler version: GCC 8.2.0

iceoryx version 0.17.0.2

Observed result or behaviour: I ran the iceperf application on my Yocto build. iceoryx is very slow (~3 ms) compared to MQ or UDS (~0.2 ms for max size data packets).

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                2e+03 |
|                 2 |              1.9e+03 |
|                 4 |              1.9e+03 |
|                 8 |              1.9e+03 |
|                16 |                2e+03 |

Yocto image specs:

BB_VERSION = "1.40.0"
BUILD_SYS = "x86_64-linux"
NATIVELSBSTRING = "universal"
TARGET_SYS = "i586-poky-linux"
MACHINE = "qemux86"
DISTRO = "poky"
DISTRO_VERSION = "2.6.4"
TUNE_FEATURES = "m32 i586"
TARGET_FPU = ""

Expected result or behaviour: On my Ubuntu 18.04 LTS machine (Intel Core i3, 7th gen) iceoryx latency is approximately 5 microseconds, and I expected it to be at least faster than MQ/UDS on Yocto.

Does anyone know the cause and a possible resolution?

mossmaurice commented 3 years ago

Thanks for opening the issue @Indra5196

You wrote MACHINE = "qemux86". Does this mean the Yocto image and your setup were running virtualised inside QEMU?

Indra5196 commented 3 years ago

Yes

elfenpiff commented 3 years ago

@Indra5196 this could have multiple reasons. First of all please be aware that we support 32 bit system but do not optimize for them!

  1. When we acquire shared memory it is usually kept in main memory, but if main memory is too small and a swap partition is enabled, the shared memory can end up on your hard drive/SD card (wherever the swap partition is stored). Do you have swap enabled? Could you share the output of df -h and cat /proc/meminfo, once before RouDi is started and once while RouDi is running and the communication is ongoing? Also see: https://www.halolinux.us/kernel-reference/ipc-shared-memory.html

  2. Could you please provide the complete benchmark results? I am interested in whether the performance is also low for small payloads. If the performance drops at a certain size, that would support the swapping theory.

  3. How much main memory does the board have?
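To check the swap question from point 1, the relevant fields of /proc/meminfo can be pulled out with a few lines of Python. This is only a sketch under assumptions: the function names parse_meminfo and swap_summary are made up for illustration, and it assumes a kernel that reports the SwapTotal, SwapFree and Shmem fields.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_summary(meminfo):
    """Return (swap_enabled, swap_used_kb, shmem_kb) from a parsed meminfo dict."""
    total = meminfo.get("SwapTotal", 0)
    free = meminfo.get("SwapFree", 0)
    return total > 0, total - free, meminfo.get("Shmem", 0)
```

On the target this could be called as swap_summary(parse_meminfo(open("/proc/meminfo").read())), once before RouDi is started and once while the benchmark is running. If swap usage grows between the two runs, that would point towards the swapping theory.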

Indra5196 commented 3 years ago

Here is the output of df -h and cat /proc/meminfo on my QEMU instance BEFORE running RouDi: cat_proc_meminfo_before_roudi df_h_before_roudi

Here is the output of df -h and cat /proc/meminfo on my QEMU instance AFTER running RouDi: cat_proc_meminfo_after_roudi df_h_after_roudi

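MQ Performance:

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |              1.1e+02 |
|                 2 |              1.1e+02 |
|                 4 |              1.2e+02 |
|                 8 |              1.7e+02 |
|                16 |              2.8e+02 |
|                32 |              4.8e+02 |
|                64 |              9.6e+02 |
|               128 |              1.5e+03 |
|               256 |              3.1e+03 |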

UDS Performance:

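| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                2e+02 |
|                 2 |                2e+02 |
|                 4 |              2.1e+02 |
|                 8 |              3.8e+02 |
|                16 |              6.8e+02 |
|                32 |              1.3e+03 |
|                64 |              2.9e+03 |
|               128 |              3.6e+03 |
|               256 |              5.9e+03 |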

Iceoryx Performance:

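| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |              3.7e+03 |
|                 2 |              3.7e+03 |
|                 4 |              3.7e+03 |
|                 8 |              3.7e+03 |
|                16 |              3.7e+03 |
|                32 |              3.7e+03 |
|                64 |              3.7e+03 |
|               128 |              3.8e+03 |
|               256 |              3.7e+03 |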

Due to the memory limitations of the QEMU instance and to save some time, I only tested up to 256 KB packets with 1000 iterations. I am using one shared memory segment with 100 chunks of size 256 KB for this test.
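One way to read the three result sets is to compare how much the latency grows between the smallest and the largest payload. The sketch below uses only the 1 kB and 256 kB values reported above; the name growth_factor is made up for illustration.

```python
# Average latency in µs at 1 kB and 256 kB, copied from the results above.
RESULTS = {
    "MQ": {1: 1.1e2, 256: 3.1e3},
    "UDS": {1: 2e2, 256: 5.9e3},
    "iceoryx": {1: 3.7e3, 256: 3.7e3},
}


def growth_factor(latencies):
    """Latency at the largest payload divided by latency at the smallest."""
    sizes = sorted(latencies)
    return latencies[sizes[-1]] / latencies[sizes[0]]


for name, latencies in RESULTS.items():
    print(f"{name}: x{growth_factor(latencies):.1f} from 1 kB to 256 kB")
```

MQ and UDS latencies grow by a factor of roughly 28 to 30 over this range, while the iceoryx latency stays flat. A flat curve looks more like a fixed per-message cost than like a size-dependent effect such as swapping setting in above a certain payload size, although this alone does not identify the cause.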

mossmaurice commented 3 years ago

Thanks for providing the numbers. I'd like to understand your use case better. Is the QEMU environment your target system, or do you want to use it for development? Have you tried booting the Yocto image natively on your board and re-running iceperf?

Indra5196 commented 3 years ago

@mossmaurice We want to run it on both QEMU and Raspberry Pi 3. Once it's running fine on QEMU, we will try to run it on the Raspberry Pi.

elfenpiff commented 3 years ago

@Indra5196 Could you please first build everything with cmake -Bbuild -Hiceoryx_meta -DCMAKE_BUILD_TYPE=Release? Otherwise the debug flags are enabled, which would explain why it is so much slower.

Additionally, I recall there was a bug where the examples were still built with debug flags even when -DCMAKE_BUILD_TYPE=Release was set. Therefore, could you please check out our current 0.9 release or master and run the benchmarks again with -DCMAKE_BUILD_TYPE=Release?
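To double-check which build type a build directory was configured with, one option is to look at the CMAKE_BUILD_TYPE entry in its CMakeCache.txt (build/CMakeCache.txt with the command above). A minimal sketch, with the helper name cmake_build_type made up for illustration:

```python
def cmake_build_type(cache_text):
    """Return the CMAKE_BUILD_TYPE value from CMakeCache.txt contents.

    Returns None if the entry is missing and "" if it is present but empty.
    """
    for line in cache_text.splitlines():
        line = line.strip()
        # Cache entries have the form NAME:TYPE=VALUE
        if line.startswith("CMAKE_BUILD_TYPE:") and "=" in line:
            return line.split("=", 1)[1]
    return None
```

For example, cmake_build_type(open("build/CMakeCache.txt").read()) should return "Release" after configuring as described. Note that this only shows what was requested at configure time; it would not reveal a bug where individual targets are still compiled with debug flags.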

But maybe it is a QEMU issue?! Could you please try the following: Run the performance example on your target hardware (Raspberry Pi 3 as far as I understand) with a current Raspberry Pi OS and with your Yocto image.

Indra5196 commented 3 years ago

Hi @elfenpiff,

Just FYI, I tried building a 64-bit image in release mode (previously I was using debug mode), but I saw no performance improvement. Since you've already tested it on the R-Pi, I hope it's a QEMU-only issue. I will test on the R-Pi soon as well.

pabloEnzes2 commented 3 years ago

master, debug build, Pi (latest updates installed):

./iceoryx_examples/iceperf/iceperf-laurel

******   MESSAGE QUEUE    ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   36 |
|                 2 |                   37 |
|                 4 |                   45 |
|                 8 |                   60 |
|                16 |                   89 |
|                32 |              1.4e+02 |
|                64 |              2.5e+02 |
|               128 |              4.7e+02 |
|               256 |              9.1e+02 |
|               512 |              1.8e+03 |
|              1024 |              3.6e+03 |
|              2048 |              7.1e+03 |
|              4096 |              1.4e+04 |

Finished!

****** UNIX DOMAIN SOCKET ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   60 |
|                 2 |                   60 |
|                 4 |                   64 |
|                 8 |                   83 |
|                16 |              1.2e+02 |
|                32 |              1.9e+02 |
|                64 |              3.5e+02 |
|               128 |              6.7e+02 |
|               256 |              1.3e+03 |
|               512 |              2.6e+03 |
|              1024 |              5.3e+03 |
|              2048 |                1e+04 |
|              4096 |              1.9e+04 |

Finished!
2021-02-13 20:13:31.236 [ Debug ]: Application registered management segment 0x72d51000 with size 64440704 to id 1
2021-02-13 20:13:31.241 [ Info  ]: Application registered payload segment 0x69f17000 with size 149134400 to id 2

******      ICEORYX       ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2021-02-13 20:13:31.324 [ Error ]: ICEORYX error! POSH__MEMPOOL_POSSIBLE_DOUBLE_FREE
iceperf-laurel: /home/pi/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:56: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted

debug PI tried again:

pi@raspberrypi:~/taps $ ./iceoryx_examples/iceperf/iceperf-hardy

******   MESSAGE QUEUE    ********
registering with the leader, if no leader this will crash with a message queue error now

****** UNIX DOMAIN SOCKET ********
registering with the leader, if no leader this will crash with a socket error now
2021-02-13 19:33:01.382 [ Debug ]: Application registered management segment 0x72dc6000 with size 64440704 to id 1
2021-02-13 19:33:01.387 [ Info  ]: Application registered payload segment 0x69f8c000 with size 149134400 to id 2

******      ICEORYX       ********
Waiting for: subscription, subscriber [ success ]
2021-02-13 19:33:01.467 [ Error ]: ICEORYX error! POSH__MEMPOOL_POSSIBLE_DOUBLE_FREE
iceperf-hardy: /home/pi/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:56: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted

PI info:

pi@raspberrypi:~/Downloads $ cat /etc/debian_version
10.8
pi@raspberrypi:~/Downloads $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@raspberrypi:~/Downloads $ uname -a
Linux raspberrypi 5.10.11-v7+ #1399 SMP Thu Jan 28 12:06:05 GMT 2021 armv7l GNU/Linux
pi@raspberrypi:~/Downloads $ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 38.40
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 1
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 38.40
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 2
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 38.40
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 3
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 38.40
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

Hardware        : BCM2835
Revision        : a01041
Serial          : 00000000c13b61ba
Model           : Raspberry Pi 2 Model B Rev 1.1
pi@raspberrypi:~/Downloads $ gcc --version
gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

master debug cubox:

pi@cubox-i:~/taps$ ./iceoryx_examples/iceperf/iceperf-laurel

******   MESSAGE QUEUE    ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   60 |
|                 2 |                   69 |
|                 4 |                   80 |
|                 8 |                1e+02 |
|                16 |              1.1e+02 |
|                32 |              1.9e+02 |
|                64 |              2.9e+02 |
|               128 |              5.4e+02 |
|               256 |                1e+03 |
|               512 |              2.1e+03 |
|              1024 |              4.1e+03 |
|              2048 |                8e+03 |
|              4096 |              1.6e+04 |

Finished!

****** UNIX DOMAIN SOCKET ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   95 |
|                 2 |              1.1e+02 |
|                 4 |              1.1e+02 |
|                 8 |              1.8e+02 |
|                16 |              2.9e+02 |
|                32 |              4.1e+02 |
|                64 |              7.6e+02 |
|               128 |              1.5e+03 |
|               256 |              2.9e+03 |
|               512 |              5.8e+03 |
|              1024 |              1.1e+04 |
|              2048 |              2.3e+04 |
|              4096 |              4.5e+04 |

Finished!
2021-02-14 01:15:01.089 [ Debug ]: Application registered management segment 0xffffffffb2e99000 with size 64440704 to id 1
2021-02-14 01:15:01.093 [ Info  ]: Application registered payload segment 0xffffffffaa05f000 with size 149134400 to id 2

******      ICEORYX       ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
Waiting for: unsubscribe  [ finished ]

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   45 |
|                 2 |                   43 |
|                 4 |                   43 |
|                 8 |                   43 |
|                16 |                   44 |
|                32 |                   43 |
|                64 |                   43 |
|               128 |                   43 |
|               256 |                   43 |
|               512 |                   43 |
|              1024 |                   43 |
|              2048 |                   43 |
|              4096 |                   43 |

Finished!

******   ICEORYX C API    ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
Waiting for: unsubscribe  [ finished ]

#### Measurement Result ####
10000 round trips for each payload.

| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
|                 1 |                   37 |
|                 2 |                   37 |
|                 4 |                   37 |
|                 8 |                   37 |
|                16 |                   37 |
|                32 |                   37 |
|                64 |                   37 |
|               128 |                   37 |
|               256 |                   37 |
|               512 |                   37 |
|              1024 |                   37 |
|              2048 |                   37 |
|              4096 |                   37 |

Finished!

cubox info:

pi@cubox-i:~$ cat /etc/debian_version
bullseye/sid
pi@cubox-i:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Armbian 21.02.1 Focal"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
pi@cubox-i:~$ uname -a
Linux cubox-i 5.10.12-imx6 #21.02.1 SMP Wed Feb 3 21:02:35 CET 2021 armv7l armv7l armv7l GNU/Linux
pi@cubox-i:~$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

processor       : 1
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

processor       : 2
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

processor       : 3
model name      : ARMv7 Processor rev 10 (v7l)
BogoMIPS        : 7.54
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x2
CPU part        : 0xc09
CPU revision    : 10

Hardware        : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision        : 0000
Serial          : 0000000000000000
pi@cubox-i:~$ arch
armv7l
pi@cubox-i:~$ file /sbin/init
/sbin/init: symbolic link to /lib/systemd/systemd
pi@cubox-i:~$ file /lib/systemd/systemd
/lib/systemd/systemd: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=fb6445f8823882b0fa14a41dcad258ebf7b7555f, for GNU/Linux 3.2.0, stripped
pi@cubox-i:~$ lscpu
Architecture:        armv7l
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               10
Model name:          Cortex-A9
Stepping:            r2p10
CPU max MHz:         996.0000
CPU min MHz:         396.0000
BogoMIPS:            7.54
Flags:               half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
pi@cubox-i:~$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

elfenpiff commented 3 years ago

@pabloEnzes2 @Indra5196

It seems that the iceperf benchmark is not running on Raspberry Pi OS January 11th 2021 (32-bit). I verified this with a Raspberry Pi 3B+. To track this issue I created #562, but at the moment we do not support 32-bit systems and it looks like we will not support them in the near future.

Nevertheless, the iceoryx examples run, although the code compiles with a lot of warnings.

If one of you would like to take on the challenge of getting this fully running again, we would support you in that endeavor via https://gitter.im/eclipse/iceoryx.

pabloEnzes2 commented 3 years ago

@elfenpiff What happened to:

> First of all please be aware that we support 32 bit system but do not optimize for them!

elfenpiff commented 3 years ago

> @elfenpiff What happened to:
>
> First of all please be aware that we support 32 bit system but do not optimize for them!

@pabloEnzes2 I was mistaken and a colleague corrected me. I am sorry for the confusion! One year ago I implemented the 32-bit support, and at the moment it seems like it's working. If you encounter any problems, please create an issue and we will try to support you, but we will not actively work on those issues.

mossmaurice commented 3 years ago

@Indra5196 I've documented the 64-bit requirement and added a warning on 32-bit systems. Have you tried to run iceperf inside a 64-bit Linux image on QEMU? Feel free to re-open this issue.
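As a quick way to tell whether a given image has a 32-bit or 64-bit userland before running iceperf, the pointer width can be queried from a script. This is a sketch only and not the warning added to iceoryx; the helper names pointer_bits and describe are made up, and the result reflects the Python interpreter that runs it, which normally matches the rest of the userland.

```python
import platform
import struct


def pointer_bits():
    """Pointer width of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def describe():
    """One-line summary of the machine type and pointer width."""
    return f"machine={platform.machine()}, userland pointer width={pointer_bits()} bit"


print(describe())
```

The machine name alone can be misleading, since a 64-bit kernel can host a 32-bit userland, which is why the sketch reports the pointer width separately.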