MDSplus / mdsplus

The MDSplus data management system
https://mdsplus.org/

Exceeding the limit of 64 concurrent mdsip connections causes a crash #2638

Open mwinkel-dev opened 1 year ago

mwinkel-dev commented 1 year ago

Affiliation MIT PSFC

Version(s) Affected Definitely occurs in alpha, likely in stable too.

Platform Ubuntu 20, however surely applies to all platforms

Describe the bug An attempt to create 100 concurrent mdsip connections (aka sockets) showed that MDSplus supports only 64 concurrent connections. Although it is unlikely that any customers will need more than 64 concurrent connections, MDSplus shouldn't crash when the limit is exceeded.

To Reproduce Steps to reproduce the behavior:

This bug was found while testing the IDL API (however it likely exists in all APIs). Run the following program at the IDL prompt.

pro loop64
   for i = 1, 100 do begin
      mysocket=-1
      mdsconnect, 'some_server', socket=mysocket
      print, mysocket
   endfor
end

Note that it generates the following output.

IDL> loop64
% Compiled module: LOOP64.
% Compiled module: MDSCONNECT.
% Compiled module: MDS_KEYWORD_SET.
           0
           1
           2
           3
etc.
          60
          61
          62
          63
E, 1169515:1169515, 1697052955.039341178: /tmp/x_markw/mw_alpha_2625_v4/mdsplus/mdstcpip/mdsipshr/GetMdsMsg.c:62 get_bytes_to() Connection(id=-1, state=0x00, protocol='tcp', info_name='tcp', version=0, user='(null)') error 0/48: Connection reset by peer
Error during login: recv NULL
% MDSCONNECT: Error connecting to some_server
% Execution halted at: LOOP64              4 /home/mwinkel/x_markw/v_idl/loop64.pro
%                      $MAIN$          
IDL> 

Expected behavior Instead of crashing, MDSplus should display an error message that explains that the limit has been exceeded. That error message should appear in all APIs (Python, Java, and so forth). It would also be good to determine what is causing the limit (the conjecture is that it is in the C code in mdsipshr), and to decide whether the limit should be increased, reduced, or left as is.
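To check the same behavior from another API, a small harness along these lines might help. It is only a sketch: `open_until_failure` and `describe` are hypothetical helper names, and `connect` is any zero-argument callable that returns a connection object. The harness holds every connection open and reports the first failure instead of halting.

```python
from typing import Any, Callable, List, Optional, Tuple


def open_until_failure(
    connect: Callable[[], Any], attempts: int
) -> Tuple[List[Any], Optional[Exception]]:
    """Open up to `attempts` connections, keeping each one alive.

    Returns the connections that were opened and the first exception
    raised, or None if every attempt succeeded.
    """
    held: List[Any] = []  # keep references so the connections stay open
    for _ in range(attempts):
        try:
            held.append(connect())
        except Exception as exc:  # report the failure instead of aborting
            return held, exc
    return held, None


def describe(held: List[Any], error: Optional[Exception], attempts: int) -> str:
    """Summarize the outcome in one line suitable for a test log."""
    if error is None:
        return f"opened all {attempts} connections"
    return (
        f"connection {len(held) + 1} of {attempts} failed "
        f"after {len(held)} succeeded: {error}"
    )
```

With the MDSplus Python package installed, the callable could presumably be `lambda: MDSplus.Connection('some_server')`. Which exception type that raises when the server resets the connection has not been verified here, hence the broad `except`.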

Screenshots n/a

Additional context n/a

mwinkel-dev commented 1 year ago

Have not yet investigated and identified the root cause of this issue.

However, did notice that the behavior changed when the test was pointed at a different mdsip server. Must compare the configurations of the two servers to see why the above test fails on the first server but passes on the second server.

mwinkel-dev commented 1 year ago

There is already a test case for this bug in the IDL test harness.

mwinkel-dev commented 1 year ago

Checked the systemd config files on the mdsip server that fails the 64 concurrent connections test. No obvious limits spotted.
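Config files do not always match what the running process actually got, so it may also be worth reading the limits of the live mdsip process. The sketch below assumes a Linux server, where `/proc/<pid>/limits` lists the effective limits. `max_open_files` is a hypothetical helper, and nothing here implies that the open-file limit is the cause of the 64-connection ceiling.

```python
from typing import Optional, Tuple


def max_open_files(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max open files' row, or None if absent.

    Values are returned as strings because either may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Row layout: Max open files <soft> <hard> files
            return fields[3], fields[4]
    return None
```

On the server, the input would come from something like `open(f"/proc/{pid}/limits").read()`, where `pid` is the process ID of the running mdsip service (to be looked up on that host).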

Also, PR #2644 made no difference in the limit.