epics-base / pvAccessCPP

pvAccessCPP is an EPICS V4 C++ module
https://epics-base.github.io/pvAccessCPP/

Network message lack of validation #104

Open mdavidsaver opened 6 years ago

mdavidsaver commented 6 years ago

There are a number of ways in which an improperly constructed PVA message can lead to a crash or other undesirable behavior. This issue will track the discovery, and eventually the fixing, of these problems.

mdavidsaver commented 6 years ago

The same message processing logic is applied to both UDP and TCP messages. Thus a UDP broadcast containing a CMD_GET runs afoul of:

https://github.com/epics-base/pvAccessCPP/blob/4b3237a20db5f6a68cb4910a8b4f0a66599f4e82/src/server/responseHandlers.cpp#L1009-L1010
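
A minimal sketch of the kind of guard that would keep connection-oriented commands away from the UDP path. The names `Transport`, `allowedOnUdp` and `shouldDispatch` are hypothetical and are not part of pvAccessCPP. The command codes follow my reading of the PVA protocol specification, and the set of commands accepted over UDP is an assumption made for illustration.

```cpp
#include <cstdint>

// Which kind of socket the message arrived on (hypothetical type).
enum class Transport { UDP, TCP };

// Subset of PVA application command codes (assumed from the protocol spec).
enum Command : std::uint8_t {
    CMD_BEACON = 0,
    CMD_CONNECTION_VALIDATION = 1,
    CMD_ECHO = 2,
    CMD_SEARCH = 3,
    CMD_SEARCH_RESPONSE = 4,
    CMD_GET = 10,
    CMD_PUT = 11
};

// Assumption: only discovery-style messages make sense without a connection.
inline bool allowedOnUdp(std::uint8_t cmd) {
    switch (cmd) {
    case CMD_BEACON:
    case CMD_SEARCH:
    case CMD_SEARCH_RESPONSE:
        return true;
    default:
        return false;
    }
}

// Decide whether a handler should be invoked at all for this message.
inline bool shouldDispatch(Transport transport, std::uint8_t cmd) {
    return transport == Transport::TCP || allowedOnUdp(cmd);
}
```

With a check like this in front of the handler table, a UDP broadcast carrying CMD_GET would be dropped (and could be logged) before any per-connection state is touched.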

mdavidsaver commented 6 years ago

https://github.com/epics-base/pvAccessCPP/blob/4b3237a20db5f6a68cb4910a8b4f0a66599f4e82/src/server/responseHandlers.cpp#L2608

Mixing up IOIDs, e.g. using the IOID of a get in a CMD_PUT.
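
One way to guard against this, sketched under the assumption that the handler currently trusts the IOID to refer to a request of the matching type. `RequestTable` and `OpKind` are hypothetical names, not existing pvAccessCPP classes. The idea is to record the operation kind when the IOID is registered and to check it before any cast or use.

```cpp
#include <cstdint>
#include <map>

// Kind of operation an IOID was created for (hypothetical).
enum class OpKind { Get, Put, Monitor, RPC };

// Hypothetical per-connection table mapping IOID to the kind of request.
class RequestTable {
public:
    // Register a new IOID. Returns false if the IOID is already in use.
    bool add(std::uint32_t ioid, OpKind kind) {
        return ops_.emplace(ioid, kind).second;
    }

    // True only if the IOID exists and was created as `expected`.
    bool matches(std::uint32_t ioid, OpKind expected) const {
        auto it = ops_.find(ioid);
        return it != ops_.end() && it->second == expected;
    }

private:
    std::map<std::uint32_t, OpKind> ops_;
};
```

A CMD_PUT handler would then call `matches(ioid, OpKind::Put)` and reply with an error status, rather than proceeding, when the IOID belongs to a get.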

mdavidsaver commented 5 years ago

A gremlin encountered during P4P testing, accompanied by an RPC client timeout. Message type 4 is CMD_SEARCH_RESPONSE, so it looks like a server is somehow seeing a search response...

Wed Sep 26 17:00:54 PDT 2018
2018-09-26T17:00:55.962 Using dynamically assigned TCP port 38829.
.2018-09-26T17:00:56.005 Using dynamically assigned TCP port 35459.
.2018-09-26T17:00:56.044 Using dynamically assigned TCP port 34771.
.2018-09-26T17:00:56.085 Using dynamically assigned TCP port 54209.
.2018-09-26T17:00:56.374 Using dynamically assigned TCP port 59025.
.2018-09-26T17:00:56.656 Using dynamically assigned TCP port 44781.
.2018-09-26T17:00:56.693 Using dynamically assigned TCP port 40051.
2018-09-26T17:00:56.695 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:56.695 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:56.922 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:56.922 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:57.376 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:57.376 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:58.284 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:00:58.284 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:01:00.100 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
2018-09-26T17:01:00.100 Undecipherable message (bad response type 4) from 127.0.0.1:50552.
EE
======================================================================
ERROR: test_rpc_error (p4p.test.test_cothread.TestRPC)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mdavidsaver/work/epics/p4p/python2.7_d/linux-x86_64/p4p/test/test_cothread.py", line 145, in test_rpc_error
    ret = C.rpc('foo', args.wrap('foo', kws={'oops':True}))
  File "/home/mdavidsaver/work/epics/p4p/python2.7_d/linux-x86_64/p4p/client/cothread.py", line 216, in rpc
    raise ret
TimeoutError: Timeout
-------------------- >> begin captured logging << --------------------
p4p.test.utils: DEBUG: REFS {'Field': 32L, 'PVField': 3L, 'p4p._p4p.Type': 2L}
p4p.server: DEBUG: Starting Server isolated=True, {'useenv': False, 'conf': {'EPICS_PVA_BROADCAST_PORT': '0', 'EPICS_PVA_SERVER_PORT': '0', 'EPICS_PVAS_INTF_ADDR_LIST': '127.0.0.1', 'EPICS_PVA_ADDR_LIST': '127.0.0.1', 'EPICS_PVA_AUTO_ADDR_LIST': '0'}, 'providers': [<p4p.server.StaticProvider object at 0x7f6733fd1060>]}
p4p.client.cothread: DEBUG: rpc foo epics:nt/NTURI:1.0 
    string scheme 
    string authority 
    string path foo
    structure query
        boolean oops true
 request=None
p4p.client.raw: DEBUG: Handler (1, , None) -> <functools.partial object at 0x7f6733fbdda8>
p4p.server: DEBUG: Stopping Server
--------------------- >> end captured logging << ---------------------

mdavidsaver commented 5 years ago

ChannelSearchManager::callback() may be called from several threads (timer and socket RX workers), and uses member variables without locking. I think this leaves open the possibility of sending corrupt search request messages, e.g. if the timer expires at the same time as a first Channel is created or a beacon anomaly is triggered.
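
A minimal sketch of the locking pattern that would close this hole. `SearchScheduler` is a hypothetical stand-in, not the real ChannelSearchManager, and `std::mutex` is used here only to keep the example self-contained. The point is that every access to the shared send buffer and sequence counter happens with the same mutex held.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a search manager whose callback() can be
// invoked concurrently from a timer thread and from socket RX workers.
class SearchScheduler {
public:
    void callback() {
        // Serialize all access to the members below.
        std::lock_guard<std::mutex> guard(mutex_);
        buffer_.clear();                  // start a fresh search message
        buffer_.push_back(++sequence_);   // stand-in for writing the header
        ++sent_;                          // stand-in for the actual send
    }

    std::uint32_t sent() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return sent_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint32_t> buffer_;
    std::uint32_t sequence_ = 0;
    std::uint32_t sent_ = 0;
};
```

Holding the lock across the whole build-and-send step is the simplest option. If the send can block, building the message under the lock and sending a private copy afterwards would be an alternative.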

mdavidsaver commented 5 years ago

cf. #137 where a UDP message decode error was being lost.

mdavidsaver commented 5 years ago

A PVA client at SLAC has seen another instance of message "corruption".

2019-03-18T15:41:22.544 Protocol Violation: Not-a-first segmented message received in normal mode from the client at ../../src/remote/codec.cpp:210: XXX.XXX.XXX.XXX:5075, disconnecting...
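
For reference, a sketch of the segmentation check that produces this kind of error. It assumes, from my reading of the PVA protocol specification, that bits 4 and 5 of the header flags byte encode segmentation (00 not segmented, 01 first, 10 last, 11 middle). `SegmentTracker` is a hypothetical name, and the sketch ignores control messages that may be interleaved between segments.

```cpp
#include <cstdint>

enum class Segment { None, First, Last, Middle };

// Decode the (assumed) segmentation bits of the header flags byte.
inline Segment segmentOf(std::uint8_t flags) {
    switch ((flags >> 4) & 0x3) {
    case 0x1: return Segment::First;
    case 0x2: return Segment::Last;
    case 0x3: return Segment::Middle;
    default:  return Segment::None;
    }
}

// Hypothetical receiver-side state for one connection.
class SegmentTracker {
public:
    // Returns false when the message violates the expected sequence.
    bool accept(std::uint8_t flags) {
        const Segment s = segmentOf(flags);
        if (!inSegmented_) {
            // Normal mode: a middle or last segment has no first segment.
            if (s == Segment::Last || s == Segment::Middle) return false;
            inSegmented_ = (s == Segment::First);
            return true;
        }
        // Segmented mode: only middle or last may continue the message.
        if (s == Segment::None || s == Segment::First) return false;
        if (s == Segment::Last) inSegmented_ = false;
        return true;
    }

private:
    bool inSegmented_ = false;
};
```

If the byte stream loses alignment for any reason, whatever byte lands in the flags position can trip this check even though the sender never produced a segmented message, which would be consistent with "corruption" rather than a genuinely misbehaving peer.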

mdavidsaver commented 5 years ago

The possible race in socket handling (#149) might result in worker(s) for a closing TCP socket accidentally operating on the socket of a new connection.