epics-base / jca

Java Channel Access client API
https://www.javadoc.io/doc/org.epics/jca/latest/index.html
Other
8 stars 14 forks source link

Zero length UDP packets #45

Closed mdavidsaver closed 4 years ago

mdavidsaver commented 5 years ago

I notice that JCA is sending zero length UDP packets with destination 5064. These seem to be sent just after a search is (or would be) sent.

image

Oddly, I also observe these being sent in a situation where (I think) a valid search should be sent, but isn't.

image

The difference between these two cases is that in the first I explicitly specify addr_list to include one of my local interface broadcast addresses. In the second case I leave addr_list empty.

com.cosylab.epics.caj.CAJContext.addr_list = 192.168.210.255
com.cosylab.epics.caj.CAJContext.auto_addr_list = true
com.cosylab.epics.caj.CAJContext.beacon_period = 15
com.cosylab.epics.caj.CAJContext.connection_timeout = 30
com.cosylab.epics.caj.CAJContext.max_array_bytes = 100000000
com.cosylab.epics.caj.CAJContext.repeater_port = 5065
com.cosylab.epics.caj.CAJContext.server_port = 5064
com.cosylab.epics.caj.CAJContext.use_pure_java = true

vs.

com.cosylab.epics.caj.CAJContext.auto_addr_list = true
com.cosylab.epics.caj.CAJContext.beacon_period = 15
com.cosylab.epics.caj.CAJContext.connection_timeout = 30
com.cosylab.epics.caj.CAJContext.max_array_bytes = 100000000
com.cosylab.epics.caj.CAJContext.repeater_port = 5065
com.cosylab.epics.caj.CAJContext.server_port = 5064
com.cosylab.epics.caj.CAJContext.use_pure_java = true

In the second case it seems that no searches are sent at all, so PVs are never connected (which is why I was looking).

My 192.168.210.1/24 interface is both UP and RUNNING, so I think that it should be included in the auto address list. It is not the default route (packets set to 255.255.255.255 go elsewhere). Given that the zero length packets are being sent, it is being found somehow.

I wonder if these aren't two separate problems with the searching process.

This is with a current build of phoebus master branch (and whatever jca build it pulls in).

kasemir commented 5 years ago

There was a problem with 'empty PV name' messages reported by the CA server for CAJ clients. It should not be possible to create channels with an empty name, https://github.com/epics-base/jca/blob/master/src/core/com/cosylab/epics/caj/CAJContext.java#L986, but it still happened.

Don't know if that was actually the same problem (empty UDP packet) or an otherwise valid search request with empty name.

mdavidsaver commented 5 years ago

At a glance, I notice that neither flushSendBuffer() or its callers test to see if sendBuffer is empty.

https://github.com/epics-base/jca/blob/96cb58ed4687908e73c6a80f89590f45efb9b318/src/core/com/cosylab/epics/caj/impl/ChannelSearchManager.java#L233

kasemir commented 5 years ago

@shroffk I think Michael has a good suggestion. We could add some check like

if (sendBuffer.limit() > 0)
     context.getBroadcastTransport().send(sendBuffer); 
else
    new Exception("Trying to send empty UDP packet").printStackTrace();

This assumes that the ByteBuffer has been 'flipped' so it's ready to send. Otherwise need to check position() instead of limit(). I think it's important to see a stack trace to have a chance of determining how we end up in this situation. But that methods throws no exception, and I also don't see a logger, so to minimize the changes we could just print the stack trace right there. Better would of course be throwing that exception, but that changes the method signature. Or use java.util.logging.

shroffk commented 5 years ago

Ok...we can add the print the exception at the very least to help understand Michaels error. @mdavidsaver if we put this on a branch will that work for you?

mdavidsaver commented 5 years ago

I've tried it already, and it doesn't trigger. So my guess is wrong. I tried with position()==0 as well.

I should also mention that I do see, and have seen so long that I've stopped seeing it, an error due to my firewall configuration now allowing CA UDP messages out on my wireless interface.

2019-08-21 18:43:45 WARNING [com.cosylab.epics.caj.impl.BroadcastTransport] Failed to sent a datagram to:/10.255.11.255:5064
java.net.SocketException: Operation not permitted
        at java.base/sun.nio.ch.DatagramChannelImpl.send0(Native Method)
        at java.base/sun.nio.ch.DatagramChannelImpl.sendFromNativeBuffer(DatagramChannelImpl.java:584)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:546)
        at java.base/sun.nio.ch.DatagramChannelImpl.send(DatagramChannelImpl.java:529)
        at com.cosylab.epics.caj.impl.BroadcastTransport.send(BroadcastTransport.java:238)
        at com.cosylab.epics.caj.impl.ChannelSearchManager.flushSendBuffer(ChannelSearchManager.java:238)
        at com.cosylab.epics.caj.impl.ChannelSearchManager.access$200(ChannelSearchManager.java:20)
        at com.cosylab.epics.caj.impl.ChannelSearchManager$ChannelSearchTimerTask.timeout(ChannelSearchManager.java:66)
        at com.cosylab.epics.caj.util.SearchTimer$RunLoop.run(SearchTimer.java:255)
        at java.base/java.lang.Thread.run(Thread.java:834)

Seeing this I tried:

                try {
                        context.getBroadcastTransport().send(sendBuffer);
                } finally {
                        initializeSendBuffer();
                }

Which doesn't seem to make a difference.

mdavidsaver commented 5 years ago

Ok. going further up the stack I think I see it. The failed send() doesn't advance the position().

diff --git a/src/core/com/cosylab/epics/caj/impl/BroadcastTransport.java b/src/core/com/cosylab/epics/caj/impl/BroadcastTransport.java
index 2a1ebb9..88841a7 100644
--- a/src/core/com/cosylab/epics/caj/impl/BroadcastTransport.java
+++ b/src/core/com/cosylab/epics/caj/impl/BroadcastTransport.java
@@ -233,12 +233,13 @@ public class BroadcastTransport implements Transport, ReactorHandler {
                                // prepare buffer
                                buffer.flip();

-                               //context.getLogger().finest("Sending " + buffer.limit() + " bytes to " + broadcastAddresses[i] + ".");
+                               context.getLogger().warning("Sending " + i + " " + buffer.limit() + " bytes to " + broadcastAddresses[i] + ".");

                                channel.send(buffer, broadcastAddresses[i]);
                        }
                        catch (Throwable ioex) 
                        {
+                               buffer.position(buffer.limit());
                                // TODO what to do here
                                logger.log(Level.WARNING, "Failed to sent a datagram to:" + broadcastAddresses[i], ioex);
                        }
mdavidsaver commented 5 years ago

The difference between these two cases is that in the first I explicitly specify addr_list to include one of my local interface broadcast addresses. In the second case I leave addr_list empty.

This being due to ordering in the broadcastAddresses list. The manual addr_list entries appear before the automatic ones.

shroffk commented 5 years ago

To me it makes sense that the explicitly specified addr_list should be used first...

What is the expected behaviour here...if someone has set the manual_addr_list, should both the manual and the automatic list be used? In the above case, if jca detected that the manual_addr_list was not set or empty and simply ignored it and moved on to the automatic addr list would that be the correct behaviour?

kasemir commented 5 years ago

For the new PVA implementation, I thought it would make sense to not only use the manually added items first, but exclusively, i.e. either use the local interfaces when the addr list is empty, or use the addr list exactly as is when it's defined.

But the EPICS community has 30 years of history with EPICS_CA_AUTO_ADDR_LIST=Yes as a default, so the local interfaces are always expected. Still think they should be used last, after manually entered items. Since we're talking about UDP, though, I'm not sure you can have any expectations as to who answers "first", since you can't know which server sees a search request in what order.

mdavidsaver commented 5 years ago

specified addr_list should be used first

I'm sorry if I wasn't clear. Ordering should not make any difference. That it does is the bug.