Closed vadim-at-te closed 3 months ago
As it turns out, it was a very silly (to put it lightly) fault internal to the company. We have a multitude of distinct uses of MDSplus beyond the classic use of storing experimental data: we also store simulated data, calibration data, pre-computed quantities (matrices) used in real time for fast reconstruction, testing data, experimental pulse preparation, etc. All of these activities are supposed to take place in the designated "bandwidth" pulse range. Enforcement of this rule is not perfect, and recently some data was written into the experimental pulse range, confusing the in-house system that automatically locks (makes read-only) raw-data trees 20 minutes after creation. As a result, we sometimes unintentionally locked raw-data trees in well under 20 minutes, leaving too little time to finish writing the raw data.
Really sorry to have taken your time on this. On the bright side, thanks to your feedback and our in-house troubleshooting, we have learned how to set up alternative mdsip-based servers (e.g. with the -s flag) via systemd instead of xinetd. We can now review which systems warrant a dedicated systemd-based mdsip server, and with which flags.
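For readers finding this later, a dedicated mdsip service of the kind mentioned above can be sketched as a systemd unit roughly like the following. The unit name, binary path, port, user, and hostfile location are illustrative assumptions, not taken from this thread; only the -p/-s/-h flags correspond to standard mdsip usage.

```ini
# /etc/systemd/system/mdsip-data.service  (hypothetical unit name -- a sketch)
# Assumes mdsip is installed at /usr/local/mdsplus/bin/mdsip and port 8000 is free.
[Unit]
Description=Dedicated MDSplus mdsip server (server mode)
After=network.target

[Service]
# -p: listen port; -s: server mode (as discussed above); -h: access-control hostfile
ExecStart=/usr/local/mdsplus/bin/mdsip -p 8000 -s -h /etc/mdsip.hosts
Restart=on-failure
User=mdsplus

[Install]
WantedBy=multi-user.target
```

After `systemctl daemon-reload` and `systemctl enable --now mdsip-data`, the service restarts automatically on failure, which xinetd-spawned instances do not do.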
I'm glad to hear that you have it sorted out! Or at least have a plan to sort it out. Definitely keep us posted on how the dedicated systemd services work out for you, and don't hesitate to reach out if you run into any more issues.
Affiliation Tokamak Energy, 173 Brook Dr, Milton, Abingdon, UK
Version(s) Affected
SERVER_MAIN: MDSplus version 7.131-6
SERVER_BIG_DATA: MDSplus version 7.49-3
MDSplus clients: mixed versions, some 7.49-3, some 7.131-6
Platform(s)
SERVER_MAIN: RHEL 8 ( 4.18.0-513.24.1.el8_9.x86_64 )
SERVER_BIG_DATA: CentOS 7 ( 3.10.0-1160.118.1.el7.x86_64 )
Installation Method(s) "yum" package manager, plus manual bug fixes for SERVER_MAIN; the SERVER_BIG_DATA installation is similar.
Describe the bug The acute problem is that diagnostic systems sporadically fail to write raw experimental data into MDSplus, risking the loss of valuable data. The error messages reported are either
a) MDSplus.connection.MdsIpException: %MDSPLUS-E-Unknown, Error connecting to tcp://192.168.1.7:8000
OR
b) %TREE-E-FAILURE, Operation NOT successful
In our users' experience, these two situations occur intermittently. Situation 1: ipython
Situation 2 - FAIL:
conn = Connection('SERVER_MAIN')
conn.treeOpen('TREE_THAT_LIVES_ON_SERVER_BIG_DATA', pulse_number)  # FAILS
# try again later - and it works OK

Situation 2 - SUCCESS:
conn = Connection('SERVER_BIG_DATA')
conn.treeOpen('TREE_THAT_LIVES_ON_SERVER_BIG_DATA', pulse_number)  # SUCCEEDS

Situation 2 - SUCCESS-SOMETIMES:
# adjust /etc/mdsplus.conf: MDSIP_CONNECT_TIMEOUT 1 -> 10
conn = Connection('SERVER_BIG_DATA')
conn.treeOpen('TREE_THAT_LIVES_ON_SERVER_BIG_DATA', pulse_number)  # FAILS SOMETIMES
# revert /etc/mdsplus.conf: MDSIP_CONNECT_TIMEOUT 10 -> 1
# now SUCCEEDS
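Since "try again later - and it works" is the observed pattern, one pragmatic client-side mitigation is to wrap the open in a retry loop. This is a sketch, not the MDSplus API: `open_tree_with_retry` and the flaky stand-in below are hypothetical, so the example runs anywhere; in practice `connect` would be a closure around `Connection(...).treeOpen(...)`.

```python
import time

def open_tree_with_retry(connect, tree, shot, attempts=5, delay=1.0):
    """Call connect(tree, shot) up to `attempts` times, backing off between tries."""
    last_error = None
    for i in range(attempts):
        try:
            return connect(tree, shot)
        except Exception as exc:  # e.g. MdsIpException or %TREE-E-FAILURE
            last_error = exc
            time.sleep(delay * (i + 1))  # linear backoff before the next attempt
    raise last_error

# Hypothetical stand-in: fails twice, then succeeds, mimicking the
# intermittent behaviour described above.
_calls = {"n": 0}
def flaky_open(tree, shot):
    _calls["n"] += 1
    if _calls["n"] < 3:
        raise RuntimeError("%TREE-E-FAILURE, Operation NOT successful")
    return (tree, shot)

print(open_tree_with_retry(flaky_open, "TREE", 12345, delay=0.01))
# -> ('TREE', 12345) after two failed attempts
```

This does not fix the server-side cause, but it distinguishes "transient" from "persistent" failures in logs, which helped us narrow the problem down.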
To Reproduce We don't know how to reproduce it. Things tend to be fine when we are not operating the tokamak and activity is low. When we do operate, many tokamak diagnostic systems dump large volumes of data into MDSplus more or less simultaneously. We have rewritten the MATLAB- and Python-based diagnostic systems to connect directly to SERVER_BIG_DATA, which offers a workaround; however, the legacy LabVIEW applications are too hard to change, and they connect to SERVER_MAIN (which ultimately writes the data to SERVER_BIG_DATA) and continue to fail sporadically. During operations, read activity on MDSplus also goes up, because the team analyzes experimental results to guide the subsequent plasma pulses.
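The workaround described above (prefer the direct server, fall back only if needed) can be sketched generically. The server names come from this report; `open_with_fallback` and the `fake_connect` stand-in are hypothetical illustrations, not MDSplus functions.

```python
def open_with_fallback(connect, tree, shot,
                       servers=("SERVER_BIG_DATA", "SERVER_MAIN")):
    """Try each server in order; return (server, result) for the first success."""
    errors = []
    for server in servers:
        try:
            return server, connect(server, tree, shot)
        except Exception as exc:
            errors.append((server, exc))
    raise RuntimeError(f"all servers failed: {errors}")

# Hypothetical stand-in where the thin-client route via SERVER_MAIN
# always fails, as in "Situation 2 - FAIL" above:
def fake_connect(server, tree, shot):
    if server == "SERVER_MAIN":
        raise ConnectionError("Error connecting to tcp://192.168.1.7:8000")
    return (tree, shot)

print(open_with_fallback(fake_connect, "TREE", 12345))
# -> ('SERVER_BIG_DATA', ('TREE', 12345))
```

For the LabVIEW legacy applications this pattern is not available, which is why they remained exposed to the sporadic failures.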
Expected behavior I expect connecting to the MDSplus server to either always succeed or always fail, and likewise for writing to MDSplus; instead, both are intermittent. Having verified tree paths and all that, of course, I actually expect both actions to always succeed.
Additional context I have asked about this on the MDSplus Discord forum: https://discord.com/channels/935565750679273482/935565751513935955/1248288784684945489