COVESA / vsomeip

An implementation of Scalable service-Oriented MiddlewarE over IP
Mozilla Public License 2.0
1.12k stars 697 forks source link

[error] netlink_connector::receive_cbk received error message: 2 seq 3 #492

Closed dprogm closed 6 days ago

dprogm commented 1 year ago

Hello everybody :wink:

We are getting the following error on an ARM device which is running the service part. The following error is thrown on startup:

[info] Network interface "bridge0" state changed: up
[error] netlink_connector::receive_cbk received error message: 2 seq 3

Normally we are expecting logging output regarding the routing/multicast.

This issue is actually a duplicate of https://github.com/COVESA/vsomeip/issues/214, but it is slightly different and there is still no answer available.

We are thinking that this is the reason why the client is not able to connect to the service. Is there any requirement regarding IP addresses (range) or netmask? We are using the 10/8 address space.

Thanks in advance

hugojacob commented 1 year ago

I'm seeing the same when using kernel 6.x.

Is it a known bug?

dprogm commented 1 year ago

What version of boost do you use? Maybe a too up to date version might cause troubles.

I will give an update tomorrow regarding the linux kernel versions used in our setup. But we aren't using 6.x at least.

hugojacob commented 1 year ago

The issue seems to be related to vsomeip >= 3.3.0. Downgrading to vsomeip 3.1.37.1 works fine for me.

Here is a bit more of information about my environment

vsomeip >= 3.3.0 never joins the broadcast group and thus never gets the SD messages:

[INFO] AbstractTester::init(): initializing vsomeip app.
2023-08-30 17:07:26.463308 [info] Using configuration file: "etc/vsomeip_client.json".
2023-08-30 17:07:26.463309 [info] Parsed vsomeip configuration in 0ms
2023-08-30 17:07:26.463309 [info] Configuration module loaded.
2023-08-30 17:07:26.463309 [info] Initializing vsomeip (3.3.8) application "hello_tester".
2023-08-30 17:07:26.463310 [info] Instantiating routing manager [Host].
2023-08-30 17:07:26.463311 [info] create_routing_root: Routing root @ /tmp/vsomeip-0
2023-08-30 17:07:26.463311 [info] Service Discovery enabled. Trying to load module.
2023-08-30 17:07:26.463315 [info] Service Discovery module loaded.
2023-08-30 17:07:26.463315 [info] Application(hello_tester, 3500) is initialized (11, 100).
2023-08-30 17:07:26.463315 [info] Starting vsomeip application "hello_tester" (3500) using 2 threads I/O nice 255
2023-08-30 17:07:26.463316 [info] Client [3500] routes unicast:192.168.98.1, netmask:255.255.255.0
2023-08-30 17:07:26.463316 [info] main dispatch thread id from application: 3500 (hello_tester) is: 7f72e35116c0 TID: 3360556
2023-08-30 17:07:26.463317 [info] shutdown thread id from application: 3500 (hello_tester) is: 7f72e2d106c0 TID: 3360557
2023-08-30 17:07:26.463318 [info] Watchdog is disabled!
[INFO] AbstractTester::onVsomeipStateChange -> 0
[INFO] AbstractTester::onVsomeipStateChange: registering handlers
[INFO] Requesting service 0x3500.0x1000
2023-08-30 17:07:26.463320 [info] REQUEST(3500): [3500.1000:255.4294967295]
2023-08-30 17:07:26.463320 [info] io thread id from application: 3500 (hello_tester) is: 7f72e3d13740 TID: 3360554
2023-08-30 17:07:26.463320 [info] io thread id from application: 3500 (hello_tester) is: 7f72e1d0e6c0 TID: 3360559
2023-08-30 17:07:26.463321 [info] create_local_server: Listening @ /tmp/vsomeip-3500
[INFO] HelloTester::onVsomeipAvailabilityChange: service availability changed 0x3500.0x1000 -> NOT available
2023-08-30 17:07:26.463327 [info] Network interface "cvd-ebr" state changed: up
2023-08-30 17:07:26.463329 [error] netlink_connector::receive_cbk received error message: 2 seq 3
2023-08-30 17:07:26.464323 [warning] BLOCKING CALL AVAILABILITY(3500): [3500.1000]
2023-08-30 17:07:26.464325 [info] dispatch thread id from application: 3500 (hello_tester) is: 7f72e150d6c0 TID: 3360560
[INFO] Trying again...
2023-08-30 17:07:31.513323 [info] REQUEST(3500): [3500.1000:255.4294967295]

But also note that with vsomeip 3.1.37.1 I also get the error message mentioned in this thread:

2023-07-30 17:56:04.643669 [info] Network interface "cvd-ebr" state changed: up
2023-07-30 17:56:04.643672 [error] netlink_connector::receive_cbk received error message: Unknown error -16 type 26 seq 3
2023-07-30 17:56:04.643673 [info] Retrying netlink route request
2023-07-30 17:56:04.643674 [info] Route "224.0.0.0/4 if: cvd-ebr gw: n/a" state changed: up
2023-07-30 17:56:04.643674 [info] udp_server_endpoint_impl: SO_RCVBUF is: 1703936 (1703936) local port:30491
2023-07-30 17:56:04.643675 [debug] Joining to multicast group 224.0.0.0 from 192.168.98.1
dprogm commented 1 year ago

Okay, thanks for the hint. I will check the concrete vsomeip version.

hugojacob commented 1 year ago

Hey @dprogm,

just a friendly ping to check whether have you had the time to look into this.

dprogm commented 1 year ago

Hi @hugojacob Sorry for not getting back to you quickly. I had no access to my workstation the last few weeks. I will check the SomeIP version tomorrow and give an update here. But I am pretty sure that we are using the dev state (HEAD of master) of the SomeIP library in conjunction with CommonAPI (see https://github.com/COVESA/capicxx-someip-tools/wiki/CommonAPI-C---SomeIP-in-10-minutes). I will pin the version and check again.

dprogm commented 1 year ago

@hugojacob Small update. I am using Version 3.3.8 and there is no error. I will check the concrete Version from my co-worker too.

goncaloalmeida commented 1 year ago

@dprogm the problem is still occurring? from your last comment, it seems not.

cc @fcmonteiro

elektronenhirn commented 10 months ago

The problem is still there, I see it, too.

My setup:

I digged a bit deeper and found out some interesting things:

  1. as @hugojacob already noticed, the error messages differs between different vsomeip versions. that is because the formatting of the message got changed. but the error is actually the same
  2. we both receive an EBUSY return code from the kernel
  3. there was a patch with a retry mechanism submitted by @phiwer here: https://github.com/COVESA/vsomeip/commit/a268e2d68d30ea9b09834dbaca60530b4314774c
  4. but the patch got meanwhile removed here: https://github.com/COVESA/vsomeip/commit/826ebb8d352245a36ecaec32b6af61e7abf4696e
  5. I re-applied the patch locally and it fixed my issue: the 2nd re-try usually works for me

Log output with some custom trace messages:

//Netlink get route fails 1. time
2024-01-05 16:02:57.778813 [error] netlink_connector::receive_cbk received error message: Unknown error -16 type 26 seq 3

//1. Netlink retry
2024-01-05 16:02:57.778814 [warning] send_rt_request
2024-01-05 16:02:57.778814 [info] Retrying netlink route request
2024-01-05 16:02:57.778814 [warning] receive_cbk
2024-01-05 16:02:57.778814 [info] offer_event: Event [1236.567a.ea68] uses configured cycle time 0ms
2024-01-05 16:02:57.778814 [warning] receive_cbk
2024-01-05 16:02:57.778814 [info] REGISTER EVENT(1212): [1236.567a.ea68:is_provider=true]
2024-01-05 16:02:57.778814 [warning] receive_cbk
2024-01-05 16:02:57.778814 [warning] receive_cbk

//Netlink get route fails 2. time
2024-01-05 16:02:57.778814 [error] netlink_connector::receive_cbk received error message: Unknown error -16 type 26 seq 3

//2. Netlink retry
2024-01-05 16:02:57.778814 [warning] send_rt_request
2024-01-05 16:02:57.778814 [info] OFFER(1212): [1236.567a:1.0] (true)
2024-01-05 16:02:57.778815 [info] Retrying netlink route request
2024-01-05 16:02:57.778815 [warning] receive_cbk
2024-01-05 16:02:57.778815 [warning] receive_cbk

//now it seems to work :-)
2024-01-05 16:02:57.778817 [info] Route "224.224.224.245/32 if: cvd-ebr gw: n/a" state changed: up
2024-01-05 16:02:57.778818 [info] udp_server_endpoint_impl: SO_RCVBUF is: 212992 (1703936) local port:30490
2024-01-05 16:02:57.778818 [debug] Joining to multicast group 224.224.224.245 from 192.168.98.1
2024-01-05 16:02:57.778818 [info] SOME/IP routing ready.
2024-01-05 16:02:57.778818 [info] udp_server_endpoint_impl<multicast>: SO_RCVBUF is: 212992 (1703936) local port:30490
2024-01-05 16:02:57.778821 [warning] receive_cbk

Is it possible to re-open the issue?

Seems like a retry mechanism would actually make sense. At least on environments dealing with Android Trout / virtualized network environments. However I have no idea how this causes the EBUSY. But it seems like the similarity between @hugojacob ' setup and mine.

hugojacob commented 9 months ago

It seems to be the same as issue #607. There, the two hosts are a Linux-based machine and not AOSP 14 (as @elektronenhirn 's and my setup).

hugojacob commented 8 months ago

The issue I linked above (https://github.com/COVESA/vsomeip/issues/607) is fixed with PR https://github.com/COVESA/vsomeip/pull/591.

I'll give it a try and come back to you.

duartenfonseca commented 1 month ago

@elektronenhirn did you try with the fix from https://github.com/COVESA/vsomeip/pull/591 ?

duartenfonseca commented 6 days ago

no response, will close