MDSplus / mdsplus

The MDSplus data management system
https://mdsplus.org/
Other
71 stars 43 forks source link

MDSVALUE in MATLAB breaks when reading Signals #2660

Open ModestMC opened 9 months ago

ModestMC commented 9 months ago

Affiliation DIII-D / General Atomics

Version(s) Affected alpha-7.139.59, best we can confirm is that it worked in 6.1.84 (modifying MATLAB installs is a good amount of work)

Platform RHEL 8

Describe the bug While working on compiling our main tokamak simulation for our new cluster, we encountered an error with the MATLAB API. In short, it appears the current logic fails to correctly do the handling necessary to read signals.

To Reproduce Steps to reproduce the behavior:

  1. Open MATLAB (error first identified on MATLAB 2023b)
  2. mdsconnect to the server and mdsopen the tree/shot of interest
  3. mdsvalue(the problematic node)

Expected behavior The signal data stored in the node should be returned, instead the Java Connection class fails to properly handle the signal before passing it into the JavatoMATLAB case statement.

Screenshots The error in action

Screenshot 2023-12-07 at 2 14 49 AM

Old version working image

Problem Signal's inner contents image

Lazy incomplete workaround that appears to correctly resolve image

The apparent section of the code that needs to be amended/fixed/modified image

Additional context It may also be worth looking at the actual MTIME signal (can provide more information since you guys should have the necessary access) to confirm whether the syntax used in that node is not problematic, but I'm fairly certain it is not.

I also hoped to look at your existing MATLAB test suite, but your Jenkins instance appears to be down.

GabrieleManduchi commented 9 months ago

The problem is due to a recent improvement of the Connection object that now is also able to transfer data types other than scalars and array. However the signal that now shows up as result of Connection.get() was not properly handled by the matlab routine JavaToMatlab.m I will fix this in a couple of days.

ModestMC commented 9 months ago

I attempted a revert of the improvement described in PR #2620 and triggered a rebuild and the error still occurs. What from the changes you introduced at this time would have broken the feature?

I tried loading an older version that predates any of these feature sets (alpha_release-7-108-1) and my autobuilder broke, but in reading over the code I'm not seeing too much.

image image
GabrieleManduchi commented 9 months ago

Looking into the code, I noted that in the Java Connection there is a (wrong) possibility that in some condition the expression gets directly evaluated (without data()). This would explain what you see and is not related to the APD stuff. I will issue later a PR so that you cat try it

GabrieleManduchi commented 9 months ago

You may try branch gm-fix-apd that should contain, among others, a fix to the problem (basically due to the fact that the java Connection.get() method did not force a data() evaluation). The checks for the PR are in progress now, but meanwhile you can check that branch in order to verify the problem.

joshStillerman commented 9 months ago

The Jenkins server is undergoing maintenance. In the meanwhile, we will run the tests manually. Can you GA verify that Gabriele's branch fixes this problem?

ModestMC commented 9 months ago

I pulled the new branch and attempted to use it, but got the same error. What did you guys use to test the fix on your end?

GabrieleManduchi commented 9 months ago

Hmmm.. really strange, we tested it on our side on a whole set of MATLAB tests (not yet on jenkins). Could you send me the failing MATLAB code and possibly a description (e.g. via TCL show data) of the accessed node?

GabrieleManduchi commented 9 months ago

BTW the fix is not in the MATLAB MDSplus library, but in the Java mdsobjects library. Did you update it?

ModestMC commented 9 months ago

I believe these are the changes to the Java mdsobjects library you're describing.

Screenshot 2023-12-12 at 10 08 10 AM

Here is the data inside of the node itself that breaks when mdsvalue is called.

Screenshot 2023-12-12 at 10 15 16 AM

It also fails when the node is not self-referential.

Screenshot 2023-12-12 at 10 25 07 AM Screenshot 2023-12-12 at 10 26 05 AM
GabrieleManduchi commented 9 months ago

Are you sure that your MATLAB is using the right jar? Changing jars version on MATLAB is sometimes tricky.... We tested our version (branch gm-fix-apd) exactly in the same condition of your example and everything works as expected (while the current alpha HAS the problem)

GabrieleManduchi commented 9 months ago

Did you get a chance for checking this?

ModestMC commented 9 months ago

@GabrieleManduchi Not exactly. I've been having to debug our clusterwide MATLAB installation's way of loading MDSplus. For the past several years, we have made MDSplus available to MATLAB by having one symlink that's used for the java modifications MDSplus needs (two lines in classpath.txt and one line in the librarypath.txt). It turns out there's no good way to point the symlink at a user development version of MDSplus (also it means our MATLAB version is functionally pinned).

Anyways I've now spent probably half a day's worth of time collectively trying to figure out if I can somehow get lmod to be able to load in a particular version of MDSplus (we have a way of doing this for the development versions as well) and have MATLAB pull the necessary paths from the environment variables. I'm not sure there's a way to do this (see here), so for now I'm willing to accept defeat.

If you're sure about your fix, I think our only option is going to be to take it on faith that it works and release a new build for our cluster. I got official blessing to monkey with the clusterwide installation to test out the fix and it correctly retrieved signals. I couldn't do any serious/robust testing, but I think this will eliminate the main blocker.

mwinkel-dev commented 9 months ago

The gm-fix-apd branch passed manual testing on Ubuntu 20 for this fix. See posts in PR #2661.

mwinkel-dev commented 9 months ago

Gabriele's posts of 11-Dec-2023 (above) indicate that only some of the PR #2661 files are needed to fix this MATLAB bug. Thus, I created an experimental (private) branch with just the following files from the PR.

deploy/packaging/debian/kernel.noarch
deploy/packaging/redhat/kernel.noarch
include/mdsobjects.h
java/mdsobjects/src/main/java/MDSplus/Connection.java
javamds/mdsobjects.c
mdsobjects/cpp/mdsipobjects.cpp
tdi/treeshr/TreePutDeserialized.fun

When using the experimental branch, MATLAB was able to retrieve and plot a signal. (And for the sake of completeness, I also confirmed that the current alpha branch does fail with the errors shown in the initial bug report.)

Note though that the test-tab.ans file still needs to be updated with the new TreePutDeserialized.fun function. However, the experimental branch passed all other automated tests, and also passed the IDL tests.

If the decision is made to split PR #2661 into new PRs -- one for MATLAB, and the other for APD (aka "Array of Pointers to Descriptors") -- it would be best to have Gabriele review the proposed split.

joshStillerman commented 8 months ago

I thought they were complaining that MATLAB did. not deal with APDs.

GabrieleManduchi commented 8 months ago

Please keep the whole branch since it contains a set of(related) improvements for handling apds over connection, a feature requested by tcv and already extensively tested.

mwinkel-dev commented 8 months ago

Hi @joshStillerman and @GabrieleManduchi -- Thanks for both of your posts. I was merely doing research so that PSFC can discuss the options when the software team meets in early January.

Josh -- General Atomics was not reporting an issue with "array of pointers to descriptors" (APD). This bug #2660 is a more fundamental issue, namely that the MATLAB API cannot retrieve data from a signal. I reproduced that bug by attempting to retrieve a signal from a C-Mod shot. The retrieval failed with the same errors shown in the initial bug report. Note that Gabriele posted above that the fix for the MATLAB signal issue has been included in PR #2661 for the APD issue.

Gabriele -- Thanks for the advice. I do not intend to split PR #2661 into two PRs. Note though that GA has requested its own branch. I was merely researching options in case GA requests that their branch have a MATLAB-only fix (i.e., if GA wants to exclude the APD fix).

Note -- The long-term goal is to run MATLAB tests as part of the build system (however that requires adding some infrastructure). A good starting point will be to write MATLAB test cases that are comparable to those in the IDL Test Harness.

mwinkel-dev commented 8 months ago

Results of additional experiments . . .

alpha_release-7-139-49 (PR #2622, 22-Sep-2023, 21ead653c) = MATLAB is able to retrieve and plot a signal

alpha_release-7-139-50 (PR #2620, 23-Sep-2023, d996b0ce5) = MATLAB throws errors when attempt to retrieve a signal

ModestMC commented 8 months ago

@mwinkel-dev can you tell me what all these APD fixes will impact in the codebase (eg. what all we'd want to test internally before using them)? I assume you've already looked over the same files I was about to go read for myself, figure this is simpler.

mwinkel-dev commented 8 months ago

Hi @ModestMC -- I have not yet reviewed the "Array of Pointers to Descriptors" (APD) changes in detail. They originated with PR #2620 (and also require pending PR #2661). My assumption (perhaps incorrect) is that the APD feature applies to all APIs.

If GA uses APDs in its queries, then it would be best to take the entire PR #2661.

However, if GA never uses APDs, then it might be possible to use "alpha 7-139-49" as the base for the GA branch. Note though that "7-139-49" predates the IDL fixes made in October, thus there would be several cherry picks that GA would probably want.

I am presently working on porting the IDL tests to MATLAB. Initially, the MATLAB test suite will be run manually. But after the maintenance work on the build server is completed, the MATLAB tests will run automatically as part of the PR review process.