calimero-project / calimero-core

Core library for KNX network access and management

Tunneling connection not possible if working with multiple network interfaces #37

Closed tuxedo0801 closed 6 years ago

tuxedo0801 commented 7 years ago

I may have found another issue, possibly caused by the same trap as issue #36:

One of our users has configured our application to use a KNX IP interface, i.e. KNX IP tunneling.

The application tries to connect, but fails with:

Caused by: tuwien.auto.calimero.exception.KNXTimeoutException: timeout connecting to control endpoint KNX-IPIF-800653.fritz.box/192.168.188.56:3671
    at tuwien.auto.calimero.knxnetip.ClientConnection.connect(ClientConnection.java:188)
    at tuwien.auto.calimero.knxnetip.KNXnetIPTunnel.<init>(KNXnetIPTunnel.java:131)
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.<init>(KNXNetworkLinkIP.java:142)
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.<init>(KNXNetworkLinkIP.java:180)
    at de.root1.slicknx.Knx.<init>(Knx.java:301)
    ... 5 more

I had a look at the code and found this:

The first UDP packet for connecting to the KNX IP tunneling interface is sent here:

https://github.com/calimero-project/calimero-core/blob/master/src/tuwien/auto/calimero/knxnetip/ClientConnection.java#L158

But the "answer-receiver" is started a few lines later:

https://github.com/calimero-project/calimero-core/blob/master/src/tuwien/auto/calimero/knxnetip/ClientConnection.java#L177

So, I guess there's also a gap where an answer might get lost, due to the not-yet started receiver.
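Whether such a gap can actually lose a reply depends on when the socket is bound, not on when the receiver thread starts: a bound UDP socket normally has incoming datagrams queued in the OS receive buffer until something calls receive(). The following is a minimal loopback sketch of that behaviour, not Calimero code; the class name `BufferedReplyDemo`, the method `sendThenReceive`, and the 200 ms delay are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Calimero.
class BufferedReplyDemo {

    /** Sends a datagram to an already bound socket and only afterwards calls receive(). */
    static String sendThenReceive(String payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(new InetSocketAddress(loopback, 0));
             DatagramSocket sender = new DatagramSocket()) {

            byte[] out = payload.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(out, out.length, loopback, receiver.getLocalPort()));

            // Simulate the "gap": nobody is waiting in receive() yet.
            Thread.sleep(200);

            receiver.setSoTimeout(2000); // fail instead of blocking forever
            byte[] buf = new byte[512];
            DatagramPacket in = new DatagramPacket(buf, buf.length);
            receiver.receive(in); // the datagram was queued by the OS in the meantime
            return new String(in.getData(), in.getOffset(), in.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThenReceive("connect-response"));
    }
}
```

If the same holds for the socket used in ClientConnection, a late-started receiver alone would not explain the timeout, which would point towards the outgoing request never reaching the network.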

I asked the user to capture with Wireshark to see whether there is a response to the connect datagram.

The weird thing is: if he changes the network interface metric from "auto" to "1", everything works...?!

After changing the metric to 1, the routing table looks like this:

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    192.168.188.1   192.168.188.28      1     <--------- CHANGED METRIC TO 1
        127.0.0.0        255.0.0.0          On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255          On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255          On-link         127.0.0.1    331
      169.254.0.0      255.255.0.0          On-link   169.254.152.163    281
  169.254.152.163  255.255.255.255          On-link   169.254.152.163    281
  169.254.255.255  255.255.255.255          On-link   169.254.152.163    281
      192.168.2.0    255.255.255.0          On-link       192.168.2.1    291
      192.168.2.1  255.255.255.255          On-link       192.168.2.1    291
    192.168.2.255  255.255.255.255          On-link       192.168.2.1    291
     192.168.56.0    255.255.255.0          On-link      192.168.56.1    281
     192.168.56.1  255.255.255.255          On-link      192.168.56.1    281
   192.168.56.255  255.255.255.255          On-link      192.168.56.1    281
     192.168.78.0    255.255.255.0          On-link      192.168.78.1    291
     192.168.78.1  255.255.255.255          On-link      192.168.78.1    291
   192.168.78.255  255.255.255.255          On-link      192.168.78.1    291
    192.168.188.0    255.255.255.0          On-link    192.168.188.28    257    <----- DEFAULT ROUTE TO LAN
   192.168.188.28  255.255.255.255          On-link    192.168.188.28    257
  192.168.188.255  255.255.255.255          On-link    192.168.188.28    257
        224.0.0.0        240.0.0.0          On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0          On-link    192.168.188.28    257
        224.0.0.0        240.0.0.0          On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0          On-link      192.168.78.1    291
        224.0.0.0        240.0.0.0          On-link       192.168.2.1    291
        224.0.0.0        240.0.0.0          On-link   169.254.152.163    281
  255.255.255.255  255.255.255.255          On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255          On-link    192.168.188.28    257
  255.255.255.255  255.255.255.255          On-link      192.168.56.1    281
  255.255.255.255  255.255.255.255          On-link      192.168.78.1    291
  255.255.255.255  255.255.255.255          On-link       192.168.2.1    291
  255.255.255.255  255.255.255.255          On-link   169.254.152.163    281
===========================================================================

And this works quite well. Calimero is able to connect and KNX telegrams can be sent and received.

With the original "auto" metric, it does not work and fails with the timeout error:

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    192.168.188.1   192.168.188.28     25        <--------- USING WINDOWS AUTO METRIC
        127.0.0.0        255.0.0.0          On-link         127.0.0.1    331
        127.0.0.1  255.255.255.255          On-link         127.0.0.1    331
  127.255.255.255  255.255.255.255          On-link         127.0.0.1    331
      169.254.0.0      255.255.0.0          On-link   169.254.152.163    281
  169.254.152.163  255.255.255.255          On-link   169.254.152.163    281
  169.254.255.255  255.255.255.255          On-link   169.254.152.163    281
      192.168.2.0    255.255.255.0          On-link       192.168.2.1    291
      192.168.2.1  255.255.255.255          On-link       192.168.2.1    291
    192.168.2.255  255.255.255.255          On-link       192.168.2.1    291
     192.168.56.0    255.255.255.0          On-link      192.168.56.1    281
     192.168.56.1  255.255.255.255          On-link      192.168.56.1    281
   192.168.56.255  255.255.255.255          On-link      192.168.56.1    281
     192.168.78.0    255.255.255.0          On-link      192.168.78.1    291
     192.168.78.1  255.255.255.255          On-link      192.168.78.1    291
   192.168.78.255  255.255.255.255          On-link      192.168.78.1    291
    192.168.188.0    255.255.255.0          On-link    192.168.188.28    281     <----- DEFAULT ROUTE TO LAN
   192.168.188.28  255.255.255.255          On-link    192.168.188.28    281
  192.168.188.255  255.255.255.255          On-link    192.168.188.28    281
        224.0.0.0        240.0.0.0          On-link         127.0.0.1    331
        224.0.0.0        240.0.0.0          On-link    192.168.188.28    281
        224.0.0.0        240.0.0.0          On-link      192.168.56.1    281
        224.0.0.0        240.0.0.0          On-link      192.168.78.1    291
        224.0.0.0        240.0.0.0          On-link       192.168.2.1    291
        224.0.0.0        240.0.0.0          On-link   169.254.152.163    281
  255.255.255.255  255.255.255.255          On-link         127.0.0.1    331
  255.255.255.255  255.255.255.255          On-link    192.168.188.28    281
  255.255.255.255  255.255.255.255          On-link      192.168.56.1    281
  255.255.255.255  255.255.255.255          On-link      192.168.78.1    291
  255.255.255.255  255.255.255.255          On-link       192.168.2.1    291
  255.255.255.255  255.255.255.255          On-link   169.254.152.163    281
===========================================================================

I'm confused ...

1) Why is the receiver started AFTER the datagram has been sent?! --> GAP?
2) Why does changing the metric to 1 help? Java detects the correct interface by the target IP ...

Of course, disabling all the other network interfaces (VPN, VMware, ...) also helps, but changing the metric or disabling network interfaces is not very user-friendly...

Would be great if someone could give me a hint.

bmalinowsky commented 7 years ago

> I asked the user to capture with Wireshark to see whether there is a response to the connect datagram.

I would be interested in the Wireshark trace for that, i.e., for the case of the timeout error.

tuxedo0801 commented 7 years ago

I'll try to provide this, but it will take some time. There are other important things to do first.

tuxedo0801 commented 7 years ago

I reproduced the issue. Setup:

[Screenshot: nw-if]

The first one is my local network interface, which is connected to my local LAN. The second and third belong to OpenVPN and VirtualBox. They are not actively used; they are just "there".

Test No. 1 - all network interfaces "enabled" (as in the previous screenshot). I start my application in KNX tunneling mode. An MDT IP Router with IP 192.168.200.71 is used as the KNX interface.

My application tries to open the connection, but after a short timeout it throws this exception (as in the first post of this issue):

[Screenshot: exception]

Wireshark was running in the background; capturing started BEFORE the application was launched and stopped after the exception. I filtered on the KNX interface IP address:

[Screenshot: multiple-nw-if]

--> I don't see any UDP connection, just multicast traffic (coming from other KNX devices/software running in the network).

Test No. 2 - disabled all but the local LAN interface (only "Ethernet" is enabled, the others are disabled)

There's no timeout. Connection just works and I can listen to telegrams on KNX...

Wireshark started/stopped the same way as in Test No.1:

[Screenshot: only-one-nw-if]

Now I can see UDP traffic.

I then went back to Test No. 1 and captured traffic on the other two interfaces, in case the software was sending on the wrong network device. But ... nothing. There are no packets at all. So I'm a bit lost... Where is the UDP packet on Test No.1 ?!
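One way to narrow this down without Wireshark is to list the interfaces Java sees and check which of them are on the same subnet as the KNX interface. The sketch below does that with the standard `java.net.NetworkInterface` API; the class `InterfaceFinder` and its methods `ip`, `sameSubnet` and `candidates` are hypothetical helpers, not part of Calimero, and the addresses in `main` are the ones from the first post.

```java
import java.net.InetAddress;
import java.net.InterfaceAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper, not part of Calimero.
class InterfaceFinder {

    /** Parses an IP literal (no DNS lookup is performed for literals). */
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(literal, e);
        }
    }

    /** True if both addresses agree in their first prefixLength bits. */
    static boolean sameSubnet(InetAddress a, InetAddress b, int prefixLength) {
        byte[] x = a.getAddress();
        byte[] y = b.getAddress();
        if (x.length != y.length || prefixLength < 0) return false; // IPv4 vs IPv6, or unknown prefix
        for (int bit = 0; bit < prefixLength && bit < x.length * 8; bit++) {
            int mask = 0x80 >> (bit % 8);
            if ((x[bit / 8] & mask) != (y[bit / 8] & mask)) return false;
        }
        return true;
    }

    /** Lists "name address" for every interface that is up and shares a subnet with target. */
    static List<String> candidates(InetAddress target) {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            while (all != null && all.hasMoreElements()) {
                NetworkInterface nif = all.nextElement();
                if (!nif.isUp()) continue;
                for (InterfaceAddress ia : nif.getInterfaceAddresses()) {
                    if (sameSubnet(ia.getAddress(), target, ia.getNetworkPrefixLength()))
                        result.add(nif.getName() + " " + ia.getAddress().getHostAddress());
                }
            }
        } catch (SocketException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Addresses taken from the first post: local LAN address vs. KNX interface.
        System.out.println(sameSubnet(ip("192.168.188.28"), ip("192.168.188.56"), 24));
        System.out.println(candidates(ip("192.168.188.56")));
    }
}
```

If the local address the application binds to does not show up among these candidates, the request is probably leaving (or failing to leave) through an interface that cannot reach the KNX device.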

calimero-project commented 7 years ago

Thank you for the traces.

As a preliminary observation: a tunneling connection cannot work if the request does not even get through the intended outgoing interface.

tuxedo0801 commented 7 years ago

Bullet points 1+2: I will provide this information later today or later this week.

Bullet point 3: Indeed, it's physically and also virtually not connected at all (OpenVPN was not connected). So you're right, it's probably not considered at all.

> As a preliminary observation: ...

You're absolutely right. That's why I wrote:

> So I'm a bit lost... Where is the UDP packet on Test No.1 ?!

The data seems to be sent, but on which "magical/mysterious" interface?!

calimero-project commented 6 years ago

As there were no further updates or related issues in quite a while, I will close this as "works as designed".

tuxedo0801 commented 6 years ago

@calimero-project Sorry, I forgot about this issue, as I was not directly affected by it (other users were). Today the issue came up again... So please reopen.

The situation is as follows:

de.root1.slicknx.KnxException: Error connecting to KNX: on connect to /192.168.200.7:3671
    at de.root1.slicknx.Knx.<init>(Knx.java:304)
    at de.konnekting.suite.Main.connectKnx(Main.java:258)
    at de.konnekting.suite.Main.access$100(Main.java:86)
    at de.konnekting.suite.Main$2.run(Main.java:180)
    at java.lang.Thread.run(Thread.java:748)
    at de.konnekting.suite.BackgroundTask$1.run(BackgroundTask.java:59)
Caused by: tuwien.auto.calimero.exception.KNXException: on connect to /192.168.200.7:3671
    at tuwien.auto.calimero.knxnetip.ClientConnection.connect(ClientConnection.java:173)
    at tuwien.auto.calimero.knxnetip.KNXnetIPTunnel.<init>(KNXnetIPTunnel.java:131)
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.<init>(KNXNetworkLinkIP.java:142)
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.<init>(KNXNetworkLinkIP.java:180)
    at de.root1.slicknx.Knx.<init>(Knx.java:298)
    ... 5 more
Caused by: java.net.SocketException: Network is unreachable: Datagram send failed
    at java.net.DualStackPlainDatagramSocketImpl.socketSend(Native Method)
    at java.net.DualStackPlainDatagramSocketImpl.send(DualStackPlainDatagramSocketImpl.java:136)
    at java.net.DatagramSocket.send(DatagramSocket.java:693)
    at tuwien.auto.calimero.knxnetip.ClientConnection.connect(ClientConnection.java:158)
    ... 9 more

The IP shown is the correct IP of the KNX IP interface. If I disable all network interfaces except the wireless one, the connection works.

I debugged the issue and found the following:

[Screenshot]

The UDP socket is bound to an explicit local endpoint instead of letting the OS choose the correct local endpoint based on the requested target IP. InetAddress.getLocalHost() returns the correct hostname "p1593", but its IP is not that of the wireless card (192.168.200.x); it belongs to "Hyper-V Virtual Ethernet Adapter #4", which is obviously wrong.

The reason for this is apparently the following:

ping -4 p1593

Pinging p1593.ibsolution.local [172.26.67.161] with 32 bytes of data:
Reply from 172.26.67.161: bytes=32 time<1ms TTL=128
Reply from 172.26.67.161: bytes=32 time=1ms TTL=128
Reply from 172.26.67.161: bytes=32 time<1ms TTL=128
Reply from 172.26.67.161: bytes=32 time<1ms TTL=128

So the hostname resolves to the wrong IP, and that is why Java gets the wrong IP for the local host.

BUT: isn't there an option NOT to use a local endpoint? What about the "useNAT" flag in the code? According to the screenshot, useNAT is not used at all in this part of the code. Couldn't it be used to control whether a local EP is used or not (= letting the OS decide on the local EP)?
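To illustrate what "letting the OS decide" could look like in plain Java: connecting an unbound UDP socket to the target makes the OS pick the outgoing local address from its routing table, without sending any packet. This is only a sketch of that general technique; the class `LocalEndpointProbe` and the method `localAddressFor` are hypothetical, and nothing here is taken from the Calimero sources or from the newer version mentioned below.

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;

// Hypothetical helper, not part of Calimero.
class LocalEndpointProbe {

    /**
     * Returns the local address the OS would use to reach target.
     * A UDP connect() only fixes the peer; no datagram is sent.
     */
    static InetAddress localAddressFor(InetAddress target, int port) {
        try (DatagramSocket probe = new DatagramSocket()) { // wildcard address, ephemeral port
            probe.connect(target, port);
            return probe.getLocalAddress();
        } catch (SocketException e) {
            throw new IllegalStateException("cannot open probe socket", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // 3671 is the KNXnet/IP port that appears in the stack traces of this issue.
        InetAddress target = InetAddress.getByName(args.length > 0 ? args[0] : "127.0.0.1");
        System.out.println("OS-chosen local address: " + localAddressFor(target, 3671));
        System.out.println("InetAddress.getLocalHost(): " + InetAddress.getLocalHost());
    }
}
```

Running it with the KNX interface IP as argument and comparing the two printed addresses should show whether getLocalHost() and the routing table disagree, as they seem to in the ping output above.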

br, Alex

P.S. I'm using core 2.3.0 and haven't checked whether a newer version has the same issue.

tuxedo0801 commented 6 years ago

Hmm, there's an updated version:

https://github.com/calimero-project/calimero-core/blob/master/src/tuwien/auto/calimero/knxnetip/ClientConnection.java#L158

Trying that ...

tuxedo0801 commented 6 years ago

Works. Issue can stay closed.