Closed Indra5196 closed 3 years ago
Thanks for opening the issue @Indra5196
You are writing MACHINE = "qemux86". Does this mean the Yocto image and your setup were running virtualised inside QEMU?
Yes
@Indra5196 this could have multiple reasons. First of all please be aware that we support 32 bit system but do not optimize for them!
When we acquire shared memory it is usually stored in main memory, but if main memory is too small and a swap partition is enabled, the shared memory is allocated on your hard drive/SD card (wherever the swap partition is stored). Do you have swap enabled? Could you also share the output of `df -h` and `cat /proc/meminfo`, once before your system is started and once after RouDi is started and the communication is ongoing?
Also see this here: https://www.halolinux.us/kernel-reference/ipc-shared-memory.html
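The before/after comparison requested above can be automated. The following is a minimal Python sketch, not part of iceoryx: it parses two `/proc/meminfo` snapshots and reports how the memory- and swap-related fields changed. The helper names `parse_meminfo` and `meminfo_delta` and all sample values are hypothetical; on a real system the two texts would be read from `/proc/meminfo` before and after starting RouDi.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue
        # Values look like "  1024 kB"; a few fields carry no unit.
        fields[name.strip()] = int(rest.split()[0])
    return fields


def meminfo_delta(before, after, keys=("MemFree", "Shmem", "SwapTotal", "SwapFree")):
    """Return after-minus-before for the selected fields."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}


# Hypothetical sample snapshots, for illustration only (not taken from the issue).
BEFORE = """MemTotal:  250000 kB
MemFree:   200000 kB
Shmem:       1000 kB
SwapTotal:      0 kB
SwapFree:       0 kB
"""
AFTER = """MemTotal:  250000 kB
MemFree:   150000 kB
Shmem:      51000 kB
SwapTotal:      0 kB
SwapFree:       0 kB
"""

delta = meminfo_delta(parse_meminfo(BEFORE), parse_meminfo(AFTER))
for key, value in delta.items():
    print(f"{key}: {value:+d} kB")
```

A `SwapFree` value that drops while the benchmark runs would support the swapping theory; a `SwapTotal` of 0 means no swap is configured at all.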
Could you please provide the whole benchmark results. I am interested if the performance is also low for tiny data packages. If the performance drops at a certain size it would support the swapping theory.
How much main memory does the board have?
Here are the stats of `df -h` and `cat /proc/meminfo` of my QEMU instance BEFORE running RouDi:
Here are the stats of `df -h` and `cat /proc/meminfo` of my QEMU instance AFTER running RouDi:
MQ Performance:
Payload Size [kB] | Average Latency [µs] |
---|---|
1 | 1.1e+02 |
2 | 1.1e+02 |
4 | 1.2e+02 |
8 | 1.7e+02 |
16 | 2.8e+02 |
32 | 4.8e+02 |
64 | 9.6e+02 |
128 | 1.5e+03 |
256 | 3.1e+03 |
UDS Performance:
Payload Size [kB] | Average Latency [µs] |
---|---|
1 | 2e+02 |
2 | 2e+02 |
4 | 2.1e+02 |
8 | 3.8e+02 |
16 | 6.8e+02 |
32 | 1.3e+03 |
64 | 2.9e+03 |
128 | 3.6e+03 |
256 | 5.9e+03 |
Iceoryx Performance:
Payload Size [kB] | Average Latency [µs] |
---|---|
1 | 3.7e+03 |
2 | 3.7e+03 |
4 | 3.7e+03 |
8 | 3.7e+03 |
16 | 3.7e+03 |
32 | 3.7e+03 |
64 | 3.7e+03 |
128 | 3.8e+03 |
256 | 3.7e+03 |
Due to the memory limitations of the QEMU instance and to save some time, I only tested packets up to 256 kB, with 1000 iterations. I am using one shared memory segment with 100 chunks of 256 kB for this test.
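To check whether the numbers above show a size-dependent drop, as the swapping theory would predict, the three tables can be compared programmatically. This is a minimal Python sketch: the latency values are copied from the tables above, while the helper names `spread` and `first_jump` and the threshold factor of 3 are assumptions chosen for illustration (doubling the payload should roughly double the latency of a copy-based transport, so a much larger step would stand out).

```python
# Average latencies in µs, copied from the three tables above.
SIZES_KB = [1, 2, 4, 8, 16, 32, 64, 128, 256]
LATENCY_US = {
    "MQ": [1.1e2, 1.1e2, 1.2e2, 1.7e2, 2.8e2, 4.8e2, 9.6e2, 1.5e3, 3.1e3],
    "UDS": [2e2, 2e2, 2.1e2, 3.8e2, 6.8e2, 1.3e3, 2.9e3, 3.6e3, 5.9e3],
    "iceoryx": [3.7e3, 3.7e3, 3.7e3, 3.7e3, 3.7e3, 3.7e3, 3.7e3, 3.8e3, 3.7e3],
}


def spread(latencies):
    """Ratio of the largest to the smallest average latency."""
    return max(latencies) / min(latencies)


def first_jump(sizes, latencies, factor=3.0):
    """Return the first payload size whose latency exceeds `factor` times
    the latency of the previous size, or None if there is no such step."""
    for prev, cur, size in zip(latencies, latencies[1:], sizes[1:]):
        if cur > factor * prev:
            return size
    return None


for name, lat in LATENCY_US.items():
    jump = first_jump(SIZES_KB, lat)
    where = "none" if jump is None else f"{jump} kB"
    print(f"{name}: spread x{spread(lat):.1f}, first jump: {where}")
```

With these numbers the iceoryx latency stays at about 3.7 ms across all payload sizes, while MQ and UDS grow steadily with the payload. No series shows a sudden step at a particular size, which suggests a fixed per-message overhead rather than a size-triggered effect.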
Thanks for providing the numbers. I'd like to understand your use case better. Is the QEMU environment your target system, or do you want to use it for development? Have you tried booting the Yocto image natively on your board and re-running iceperf?
@mossmaurice We want to run it on both QEMU and a Raspberry Pi 3. Once it's running fine on QEMU, we will try to run it on the Raspberry Pi.
@Indra5196 Could you please build everything first with
cmake -Bbuild -Hiceoryx_meta -DCMAKE_BUILD_TYPE=Release
Otherwise the debug flags are enabled, which would explain why it is so much slower.
Additionally, I think there was a bug somewhere that caused the examples to still be built with debug flags even when `-DCMAKE_BUILD_TYPE=Release` was set. Therefore, could you please check out our current 0.9 release or master and perform the benchmarks again with `-DCMAKE_BUILD_TYPE=Release`?
If `-DCMAKE_BUILD_TYPE=Release` solves your issue, then the debug flags caused the problem. But maybe it is a QEMU issue. Could you please try the following: run the performance example on your target hardware (Raspberry Pi 3, as far as I understand) with a current Raspberry Pi OS and with your Yocto image.
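One way to confirm which build type a build directory was actually configured with is to look at the `CMAKE_BUILD_TYPE` entry in its `CMakeCache.txt`, where cache entries are stored as `KEY:TYPE=VALUE`. The Python sketch below does that; the function name `cached_build_type` and the sample text are hypothetical, and the check only reports the top-level configuration, not the flags each example target ends up with.

```python
def cached_build_type(cache_text):
    """Return the CMAKE_BUILD_TYPE value from CMakeCache.txt contents, or None."""
    for line in cache_text.splitlines():
        # Cache entries have the form KEY:TYPE=VALUE.
        if line.startswith("CMAKE_BUILD_TYPE:"):
            return line.partition("=")[2].strip()
    return None


# Hypothetical cache excerpt, for illustration only.
SAMPLE = """// Choose the type of build.
CMAKE_BUILD_TYPE:STRING=Release
"""

print(cached_build_type(SAMPLE))
```

For the build directory used in the command above, this would be called as `cached_build_type(open("build/CMakeCache.txt").read())`.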
Hi @elfenpiff,
Just FYI, I tried building a 64-bit image in release mode (previously I was using debug mode), but I saw no performance improvement. Since you've already tested it on the R-Pi, I hope it's a QEMU-only issue. I will test on the R-Pi soon as well.
master debug PI (last updates installed):
./iceoryx_examples/iceperf/iceperf-laurel
****** MESSAGE QUEUE ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 36 |
| 2 | 37 |
| 4 | 45 |
| 8 | 60 |
| 16 | 89 |
| 32 | 1.4e+02 |
| 64 | 2.5e+02 |
| 128 | 4.7e+02 |
| 256 | 9.1e+02 |
| 512 | 1.8e+03 |
| 1024 | 3.6e+03 |
| 2048 | 7.1e+03 |
| 4096 | 1.4e+04 |
Finished!
****** UNIX DOMAIN SOCKET ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 60 |
| 2 | 60 |
| 4 | 64 |
| 8 | 83 |
| 16 | 1.2e+02 |
| 32 | 1.9e+02 |
| 64 | 3.5e+02 |
| 128 | 6.7e+02 |
| 256 | 1.3e+03 |
| 512 | 2.6e+03 |
| 1024 | 5.3e+03 |
| 2048 | 1e+04 |
| 4096 | 1.9e+04 |
Finished!
2021-02-13 20:13:31.236 [ Debug ]: Application registered management segment 0x72d51000 with size 64440704 to id 1
2021-02-13 20:13:31.241 [ Info ]: Application registered payload segment 0x69f17000 with size 149134400 to id 2
****** ICEORYX ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2021-02-13 20:13:31.324 [ Error ]: ICEORYX error! POSH__MEMPOOL_POSSIBLE_DOUBLE_FREE
iceperf-laurel: /home/pi/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:56: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted
debug PI tried again:
pi@raspberrypi:~/taps $ ./iceoryx_examples/iceperf/iceperf-hardy
****** MESSAGE QUEUE ********
registering with the leader, if no leader this will crash with a message queue error now
****** UNIX DOMAIN SOCKET ********
registering with the leader, if no leader this will crash with a socket error now
2021-02-13 19:33:01.382 [ Debug ]: Application registered management segment 0x72dc6000 with size 64440704 to id 1
2021-02-13 19:33:01.387 [ Info ]: Application registered payload segment 0x69f8c000 with size 149134400 to id 2
****** ICEORYX ********
Waiting for: subscription, subscriber [ success ]
2021-02-13 19:33:01.467 [ Error ]: ICEORYX error! POSH__MEMPOOL_POSSIBLE_DOUBLE_FREE
iceperf-hardy: /home/pi/iceoryx/iceoryx_utils/source/error_handling/error_handling.cpp:56: static void iox::ErrorHandler::ReactOnErrorLevel(iox::ErrorLevel, const char*): Assertion `false' failed.
Aborted
PI info:
pi@raspberrypi:~/Downloads $ cat /etc/debian_version
10.8
pi@raspberrypi:~/Downloads $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi@raspberrypi:~/Downloads $ uname -a
Linux raspberrypi 5.10.11-v7+ #1399 SMP Thu Jan 28 12:06:05 GMT 2021 armv7l GNU/Linux
pi@raspberrypi:~/Downloads $ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 5
processor : 1
model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 5
processor : 2
model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 5
processor : 3
model name : ARMv7 Processor rev 5 (v7l)
BogoMIPS : 38.40
Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xc07
CPU revision : 5
Hardware : BCM2835
Revision : a01041
Serial : 00000000c13b61ba
Model : Raspberry Pi 2 Model B Rev 1.1
pi@raspberrypi:~/Downloads $ gcc --version
gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
master debug cubox:
pi@cubox-i:~/taps$ ./iceoryx_examples/iceperf/iceperf-laurel
****** MESSAGE QUEUE ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 60 |
| 2 | 69 |
| 4 | 80 |
| 8 | 1e+02 |
| 16 | 1.1e+02 |
| 32 | 1.9e+02 |
| 64 | 2.9e+02 |
| 128 | 5.4e+02 |
| 256 | 1e+03 |
| 512 | 2.1e+03 |
| 1024 | 4.1e+03 |
| 2048 | 8e+03 |
| 4096 | 1.6e+04 |
Finished!
****** UNIX DOMAIN SOCKET ********
waiting for follower
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 95 |
| 2 | 1.1e+02 |
| 4 | 1.1e+02 |
| 8 | 1.8e+02 |
| 16 | 2.9e+02 |
| 32 | 4.1e+02 |
| 64 | 7.6e+02 |
| 128 | 1.5e+03 |
| 256 | 2.9e+03 |
| 512 | 5.8e+03 |
| 1024 | 1.1e+04 |
| 2048 | 2.3e+04 |
| 4096 | 4.5e+04 |
Finished!
2021-02-14 01:15:01.089 [ Debug ]: Application registered management segment 0xffffffffb2e99000 with size 64440704 to id 1
2021-02-14 01:15:01.093 [ Info ]: Application registered payload segment 0xffffffffaa05f000 with size 149134400 to id 2
****** ICEORYX ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
Waiting for: unsubscribe [ finished ]
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 45 |
| 2 | 43 |
| 4 | 43 |
| 8 | 43 |
| 16 | 44 |
| 32 | 43 |
| 64 | 43 |
| 128 | 43 |
| 256 | 43 |
| 512 | 43 |
| 1024 | 43 |
| 2048 | 43 |
| 4096 | 43 |
Finished!
****** ICEORYX C API ********
Waiting for: subscription, subscriber [ success ]
Measurement for: 1 kB, 2 kB, 4 kB, 8 kB, 16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, 2048 kB, 4096 kB,
Waiting for: unsubscribe [ finished ]
#### Measurement Result ####
10000 round trips for each payload.
| Payload Size [kB] | Average Latency [µs] |
|------------------:|---------------------:|
| 1 | 37 |
| 2 | 37 |
| 4 | 37 |
| 8 | 37 |
| 16 | 37 |
| 32 | 37 |
| 64 | 37 |
| 128 | 37 |
| 256 | 37 |
| 512 | 37 |
| 1024 | 37 |
| 2048 | 37 |
| 4096 | 37 |
Finished!
cubox info:
pi@cubox-i:~$ cat /etc/debian_version
bullseye/sid
pi@cubox-i:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Armbian 21.02.1 Focal"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
pi@cubox-i:~$ uname -a
Linux cubox-i 5.10.12-imx6 #21.02.1 SMP Wed Feb 3 21:02:35 CET 2021 armv7l armv7l armv7l GNU/Linux
pi@cubox-i:~$ cat /proc/cpuinfo
processor : 0
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 7.54
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
processor : 1
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 7.54
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
processor : 2
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 7.54
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
processor : 3
model name : ARMv7 Processor rev 10 (v7l)
BogoMIPS : 7.54
Features : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x2
CPU part : 0xc09
CPU revision : 10
Hardware : Freescale i.MX6 Quad/DualLite (Device Tree)
Revision : 0000
Serial : 0000000000000000
pi@cubox-i:~$ arch
armv7l
pi@cubox-i:~$ file /sbin/init
/sbin/init: symbolic link to /lib/systemd/systemd
pi@cubox-i:~$ file /lib/systemd/systemd
/lib/systemd/systemd: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=fb6445f8823882b0fa14a41dcad258ebf7b7555f, for GNU/Linux 3.2.0, stripped
pi@cubox-i:~$ lscpu
Architecture: armv7l
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 10
Model name: Cortex-A9
Stepping: r2p10
CPU max MHz: 996.0000
CPU min MHz: 396.0000
BogoMIPS: 7.54
Flags: half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
pi@cubox-i:~$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
@pabloEnzes2 @Indra5196
It seems that the iceperf benchmark is not running on Raspberry Pi OS January 11th 2021 (32-bit). I verified this with a Raspberry Pi 3B+. To track this issue I created #562, but at the moment we do not support 32-bit systems and it looks like we will not support them in the near future.
Nevertheless, the iceoryx examples are running and the code compiles, albeit with a lot of warnings.
If one of you would like to take on the challenge of getting this completely running again, we would support you in the endeavor via https://gitter.im/eclipse/iceoryx.
@elfenpiff What happened to:
First of all please be aware that we support 32 bit system but do not optimize for them!
@pabloEnzes2 I was mistaken and a colleague corrected me. I am sorry for the confusion! One year ago I implemented the 32-bit support and at the moment it seems like it's working. If you encounter any problems please create an issue and we will try to support you, but we will not actively work on those issues.
@Indra5196 I've documented the 64-bit requirement and added a warning on 32-bit systems. Have you tried to run iceperf inside a 64-bit Linux image on QEMU? Feel free to re-open this issue.
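As a rough illustration of what a 32-bit check can look like, the Python sketch below derives the pointer width of the running process and produces a warning for anything below 64 bits. It is not the warning that was added to iceoryx; the function names and the message text are hypothetical.

```python
import platform
import struct


def pointer_width_bits():
    """Pointer size of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


def warn_if_32_bit(bits):
    """Return a warning string for builds narrower than 64 bits, else None."""
    if bits < 64:
        return f"warning: {bits}-bit build detected; 64-bit is required"
    return None


bits = pointer_width_bits()
print(platform.machine(), bits, warn_if_32_bit(bits))
```

The pointer width reflects how the userland binary was built, which can differ from what the CPU itself is capable of, so it is the more relevant value for a check like this.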
Required information
Operating system: Linux version 4.18.33-yocto-standard
Compiler version: GCC 8.2.0
iceoryx version 0.17.0.2
Yocto image specs: BB_VERSION = "1.40.0" BUILD_SYS = "x86_64-linux" NATIVELSBSTRING = "universal" TARGET_SYS = "i586-poky-linux" MACHINE = "qemux86" DISTRO = "poky" DISTRO_VERSION = "2.6.4" TUNE_FEATURES = "m32 i586" TARGET_FPU = ""
Expected result or behaviour: Performance of iceoryx on my Ubuntu 18.04 LTS machine (Intel Core i3 7th gen) is approximately 5 microseconds, and I expected it to be at least faster than MQ/UDS on Yocto.
Does anyone know the cause and a possible resolution?