ComputationalBiomechanicsLab / systems

Basic guides / scripts for CBL systems

Matlab + OpenSim crashing if model has been scaled #28

Closed itbellix closed 4 months ago

itbellix commented 1 year ago

Hi @adamkewley, I would like to report here (for future reference) the problem that I had when trying to work on CBL1. I am using Matlab (R2022a, if I am not mistaken) to access the OpenSim API (OpenSim version 4.3). However, when I load the model I need to work on, Matlab crashes with an error that you identified as related to scaling:

"the dump file you have attached shows an example of the bug. SimTK tries to do something like "Blas Copy" (line [8]) when it is computing something related to scaling ([11]), but the SimTK library (located at /usr/local/lib/libSimTKmath.so.3.8) calls into MATLAB' MKL library (/usr/local/MATLAB/R2022a/bin/glnxa64/mkl.so) rather than the one it's packaged with (/lib/x86_64-linux-gnu/libf77blas.so.3)."

I tried to load a different model, which corresponds to the original "unscaled" version of the one I was having problems with, and everything worked fine. So, I fear that CBL1 might not be able to work with models that have been scaled, but I don't understand how this is possible, since the model itself (even if resulting from a scaling step) contains all the information required to describe it... My question is therefore:

In the .osim file defining the model, all the information should be there once it has been scaled (in terms of scaling factors, and in general dimensions and properties of the bodies, muscles and joints). Is there a way to bypass this error by disabling the call to anything related to scaling, or does OpenSim really need to call those functions when loading a model that has been obtained as a result of a scaling procedure?

Do you think there might be an easier way around this than recompiling OpenSim or statically compiling it (you had mentioned these two options by email)?

Maybe also @aseth1 could help here

adamkewley commented 1 year ago

Your problem isn't just scaling: it's potentially anything in OpenSim or Simbody that uses libBLAS/libLAPACK.

Even if MATLAB appears to be working with your scaled model, OpenSim may still be relying on undefined behavior to do other things. In that sense, you got "lucky", because the bug manifested as a runtime segfault, rather than producing undefined data.

adamkewley commented 1 year ago

And statically compiling libBLAS/libLAPACK only fixes part of the problem. The same issue (OpenSim being built against one library, but using another under MATLAB) also exists for the C++ standard library: OpenSim uses the system's C++ standard library (i.e. the one that comes with Linux), whereas MATLAB ships its own and will force loaded libraries to use that one.

See: https://github.com/opensim-org/opensim-core/issues/1397
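To see which copy of a library a running process actually ended up with, one option on Linux is to read that process's `/proc/<pid>/maps`. The following is only a minimal sketch: the `loaded_libraries` helper and the `INTERESTING` name list are hypothetical and not part of OpenSim or MATLAB.

```python
from pathlib import Path

# Assumed list of name fragments relevant to this issue; adjust as needed.
INTERESTING = ("blas", "lapack", "mkl", "libstdc++", "SimTK")

def loaded_libraries(maps_text, needles=INTERESTING):
    """Return sorted, de-duplicated shared-object paths from a
    /proc/<pid>/maps dump whose file name contains one of `needles`."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if ".so" in name and any(n.lower() in name.lower() for n in needles):
            paths.add(path)
    return sorted(paths)

def loaded_libraries_of(pid="self"):
    """Convenience wrapper: inspect a live process by pid (Linux only)."""
    return loaded_libraries(Path(f"/proc/{pid}/maps").read_text())
```

Calling `loaded_libraries_of(<MATLAB pid>)` after loading the OpenSim bindings would list the BLAS and libstdc++ files mapped into that process, which can then be compared against the paths OpenSim was built with.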

I now have a build of OpenSim that statically compiles openblas into Simbody so that the dependency chain for OpenSim is now:

opensim-cmd => ./opensim-cmd (interpreter => /lib64/ld-linux-x86-64.so.2)
    libosimMoco.so => /home/adam/Desktop/opensim-build/libosimMoco.sobuild/libosimMoco.so
        libSimTKmath.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKmath.so.3.8
            libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
                ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
    libosimTools.so => /home/adam/Desktop/opensim-build/libosimTools.so
    libosimAnalyses.so => /home/adam/Desktop/opensim-build/libosimAnalyses.so
    libosimActuators.so => /home/adam/Desktop/opensim-build/libosimActuators.so
    libosimSimulation.so => /home/adam/Desktop/opensim-build/libosimSimulation.so
        libosimLepton.so => /home/adam/Desktop/opensim-build/libosimLepton.so
    libosimCommon.so => /home/adam/Desktop/opensim-build/libosimCommon.so
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
    libSimTKsimbody.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKsimbody.so.3.8
    libSimTKcommon.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKcommon.so.3.8
    libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6

But libstdc++.so.6 may now be the problem, because linking the OpenSim library within MATLAB may cause it to be substituted with MATLAB's library which, again, would cause runtime faults (because OpenSim was compiled against a different library).
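A quick way to confirm that a build no longer depends on an external BLAS/LAPACK is to parse the output of `ldd` for the library in question. This is a rough sketch under the assumption that standard glibc `ldd` output is available; the function names are made up for illustration.

```python
import subprocess

def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path
    (None when ldd reports 'not found')."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. linux-vdso or the dynamic loader line
        name, _, rest = line.partition("=>")
        target = rest.strip()
        if target.startswith("not found"):
            resolved[name.strip()] = None
        else:
            # Drop the trailing load address, e.g. " (0x00007f...)"
            resolved[name.strip()] = target.split(" (")[0].strip()
    return resolved

def blas_like(resolved):
    """Names of dynamically linked BLAS/LAPACK/MKL dependencies, if any."""
    keys = ("blas", "lapack", "mkl")
    return sorted(n for n in resolved if any(k in n.lower() for k in keys))

def ldd(path):
    """Run ldd on a binary or shared library and parse the result."""
    out = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return parse_ldd(out.stdout)
```

For a statically linked build like the one listed above, `blas_like(ldd(".../libSimTKmath.so.3.8"))` would be expected to return an empty list, while `libstdc++.so.6` would still show up in the parsed mapping.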

adamkewley commented 1 year ago

@itbellix the opensim-core installed on the CBL machine is now a statically-compiled version, can you try running your MATLAB code on it to see what the next issue is? ;)

itbellix commented 1 year ago

Hi @adamkewley thank you very much for the work! I think I will not be able to try today as I don't have my other laptop with me (the TUD one has issues connecting to the CBL machine...) but I will let you know by Monday if it is working now. Thanks again :)

cc @aseth1

itbellix commented 1 year ago

Hi Adam, I am still running into the same problem when initializing the .osim models. Is it possible that I need to somehow restart my session on the CBL machines for the changes to take effect? I attach here the dump file that is produced when Matlab crashes.

adamkewley commented 1 year ago

I don't think you need to restart your session, only MATLAB.

It's unusual that the dump file doesn't mention anything about OpenSim at all anymore, which might indicate that something's happened, but it's unclear what.

Is the MATLAB code you are using just loading a model from disk? The dump file you attached is only showing 15/16 cores, and all cores are doing something BLAS-related (or waiting), but I can't see any information other than that.

itbellix commented 1 year ago

Hi @adamkewley, I am loading a model from disk, passing it into a function and then calling model.initSystem() there. At that stage, the error occurs. Not sure if you want to experiment yourself (I can tell you how to run the code, it is not difficult at all), I can also meet for that one of these days.

adamkewley commented 1 year ago

@itbellix I can give it a try - email me the model

itbellix commented 1 year ago

@adamkewley thank you! If you log into my session on CBL1, you can just open Matlab and run the main_perturbed_analysis.m script, which is located in Desktop/github/PTbot/Code/Compute Muscle Activity/Matlab/RMR solver. The script should prompt you to select a model (one is already in the local repository; I was using Desktop/gitHub/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim) and then a .trc file (any of the ones in the default PTbot/ExperimentalData/Markers should work). That should be enough to replicate the issue, which for me occurred at line 92 of the function RMR_analysis.m.

Note that the branch of the repository should be plos_publication, probably it is already the correct one but worth double-checking.

Thank you!

itbellix commented 1 year ago

Hi @adamkewley, is there any update on the status of this issue? Thank you very much ☺️

adamkewley commented 1 year ago

No update - it appears to be a problem with the OpenSim API and I haven't had enough time to look into it.

adamkewley commented 1 year ago

There will be no further updates to this until upstream (opensim-core) fixes the ABI problems. If people want me to fix it then an appropriate amount of time (2d-10d) will need to be scheduled to do so.


Having a quick look this morning, here's a more minimal reproduction of the bug:

import org.opensim.modeling.*;

model_path = "/home/adam/Desktop/italo-problem/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim"
model = Model(model_path);
model.initSystem();  % segfaults in MKL

The same code can be rewritten in terms of the OpenSim Python API. The Python API is useful here because it was compiled at the same time as the MATLAB API, so it should use exactly the same native code to initialize the model's system:

import opensim.simulation
model_path = "/home/adam/Desktop/italo-problem/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim"
model = opensim.simulation.Model(model_path)
model.initSystem()  # no segfault

This "identical" code doesn't segfault, which indicates that the problem is specifically in how OpenSim's MATLAB bindings work in Linux (as suspected).
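When repeating this kind of comparison, it can help to run each reproduction in a child process, so that a segfault is reported as a signal rather than taking down the calling session. A minimal sketch, assuming a POSIX system; `run_isolated` is a hypothetical helper and the model path in `REPRO` is a placeholder.

```python
import signal
import subprocess
import sys

def run_isolated(code, timeout=600):
    """Run `code` in a fresh Python interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with code %d" % proc.returncode

# Placeholder reproduction; substitute the path of the .osim file under test.
REPRO = """
import opensim.simulation
model = opensim.simulation.Model("/path/to/model.osim")
model.initSystem()
"""

# Usage: print(run_isolated(REPRO))
```

A native crash inside `initSystem()` would then show up as `killed by SIGSEGV`, whereas a clean run reports `exited with code 0`.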

The same bug has been reported in the past by other OpenSim users trying to use MATLAB in Linux:

I have already investigated this bug several times. I imagine that the only way to fix it is to compile a specialized OpenSim. I have already tried that for libBLAS/libLAPACK (above), but it still didn't work (and it caused problems with our python installation for Bofan, so I reverted it).

The fact that fixing the libBLAS/libLAPACK dependencies isn't fixing things might be because it just pushes the error towards the libstdc++ mismatch (also known: https://github.com/opensim-org/opensim-core/issues/1397). Comprehensively fixing that problem may involve entirely changing how OpenSim itself is compiled (it is very messy to statically compile libstdc++ into multiple shared libraries), or relying on LD_PRELOAD tricks that may ultimately just produce a different flavor of the same (ABI mismatch) segfault.
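For anyone who wants to experiment with the LD_PRELOAD route anyway, the environment for the MATLAB process could be prepared along these lines. This is only a sketch of the mechanism, not a known fix: `with_preload` is a hypothetical helper, the libstdc++ path is the one from the dependency listing earlier in this thread, and a `matlab` launcher on PATH is assumed.

```python
import os

# Path taken from the dependency listing in this thread; it may differ elsewhere.
SYSTEM_LIBSTDCXX = "/lib/x86_64-linux-gnu/libstdc++.so.6"

def with_preload(env, libraries):
    """Return a copy of `env` with `libraries` placed at the front of
    LD_PRELOAD (the dynamic loader accepts a colon-separated list)."""
    new_env = dict(env)
    existing = [p for p in new_env.get("LD_PRELOAD", "").split(":") if p]
    new_env["LD_PRELOAD"] = ":".join(list(libraries) + existing)
    return new_env

# Usage (assumes `matlab` is on PATH):
#   import subprocess
#   subprocess.run(["matlab"], env=with_preload(os.environ, [SYSTEM_LIBSTDCXX]))
```

Forcing the system libstdc++ into MATLAB this way may simply move the ABI mismatch from OpenSim's side to MATLAB's own libraries, so a crash elsewhere would not be surprising.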

itbellix commented 1 year ago

Hi @adamkewley, thank you for your answer! So, it is encouraging that the same code in Python doesn't break. This might be an option I can choose, but I would need to rewrite some code I guess. So for now I think there is no rush, I will let you know if things change.

Just to understand, when you say:

"There will be no further updates to this until upstream (opensim-core) fixes the ABI problems."

we are probably talking about more than a year, right?

adamkewley commented 1 year ago

@itbellix how long it takes is entirely dependent on how many resources are thrown at it.

But, if we are assuming no changes in that regard, then "more than a year" is fair (it was initially spotted in 2016).

adamkewley commented 4 months ago

Closing because MATLAB has not been fixed for OpenSim, so the official ruling is "don't use MATLAB with OpenSim on Linux/Mac"