Closed itbellix closed 4 months ago
Your problem isn't just scaling: it's potentially anything in OpenSim or Simbody that uses libBLAS/libLAPACK.
Even if MATLAB appears to be working with your scaled model, OpenSim may still be relying on undefined behavior to do other things. In that sense, you got "lucky", because the bug manifested as a runtime segfault rather than silently producing wrong data.
And statically compiling libBLAS/libLAPACK only fixes part of the problem. The same issue (OpenSim being built against one library, but using another inside MATLAB) also exists for the C++ standard library: OpenSim is built against the system's C++ standard library (i.e. the one that comes with Linux), whereas MATLAB ships its own and forces loaded libraries to use that one.
See: https://github.com/opensim-org/opensim-core/issues/1397
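One way to see which copy of a library a process actually ended up with is to read its memory map on Linux. The sketch below is only an illustration, not something from the OpenSim or MATLAB tooling: the function names and the list of watched library prefixes are assumptions, and it relies only on the standard `/proc/<pid>/maps` format.

```python
import os

# Library name prefixes worth watching. This list is an assumption;
# adjust it for the libraries relevant to your setup.
WATCHED = ("libstdc++", "libblas", "liblapack", "libopenblas", "libmkl")


def mapped_libraries(maps_text, watched=WATCHED):
    """Return sorted file paths from /proc/<pid>/maps text whose
    basename starts with one of the watched prefixes."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if os.path.basename(path).startswith(tuple(watched)):
            found.add(path)
    return sorted(found)


def mapped_libraries_of(pid="self"):
    """Read the memory map of a process (Linux only) and filter it."""
    with open("/proc/%s/maps" % pid) as fh:
        return mapped_libraries(fh.read())
```

Calling `mapped_libraries_of(<pid of the MATLAB process>)` after the OpenSim bindings have been imported should list every copy of `libstdc++` or BLAS/LAPACK mapped into that process. If a path inside the MATLAB installation shows up instead of the system one, that would be consistent with the substitution described above.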
I now have a build of OpenSim that statically compiles openblas into Simbody, so the dependency chain for OpenSim is now:
opensim-cmd => ./opensim-cmd (interpreter => /lib64/ld-linux-x86-64.so.2)
libosimMoco.so => /home/adam/Desktop/opensim-build/libosimMoco.sobuild/libosimMoco.so
libSimTKmath.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKmath.so.3.8
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2
ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2
libosimTools.so => /home/adam/Desktop/opensim-build/libosimTools.so
libosimAnalyses.so => /home/adam/Desktop/opensim-build/libosimAnalyses.so
libosimActuators.so => /home/adam/Desktop/opensim-build/libosimActuators.so
libosimSimulation.so => /home/adam/Desktop/opensim-build/libosimSimulation.so
libosimLepton.so => /home/adam/Desktop/opensim-build/libosimLepton.so
libosimCommon.so => /home/adam/Desktop/opensim-build/libosimCommon.so
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0
libSimTKsimbody.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKsimbody.so.3.8
libSimTKcommon.so.3.8 => /home/adam/Desktop/opensim-deps-install/simbody/lib/libSimTKcommon.so.3.8
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
But libstdc++.so.6 may now be the problem, because linking the OpenSim library within MATLAB may cause it to be substituted with MATLAB's library which, again, would cause runtime faults (because OpenSim was compiled against a different library).
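A dependency listing like the one above can also be checked mechanically. The sketch below assumes lines of the form `name => path` (as printed above) and uses hypothetical helper names. It only reports which BLAS/LAPACK-like and C++ runtime libraries are still resolved dynamically.

```python
def parse_dependency_listing(text):
    """Map each library name to its resolved path, given lines of
    the form 'name => path'."""
    deps = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        # Drop any trailing parenthesised note, e.g. a load address.
        deps[name.strip()] = target.split("(")[0].strip()
    return deps


def classify(deps):
    """Return (blas_like, cxx_runtime) lists of dependency names."""
    blas_like = sorted(
        n for n in deps
        if n.startswith(("libblas", "liblapack", "libopenblas"))
    )
    cxx_runtime = sorted(
        n for n in deps if n.startswith(("libstdc++", "libgcc_s"))
    )
    return blas_like, cxx_runtime
```

For the listing above, the first list would come back empty (BLAS/LAPACK are no longer dynamic dependencies) while the second would still contain `libstdc++.so.6`, which is the remaining concern.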
@itbellix the opensim-core installed on the CBL machine is now a statically-compiled version. Can you try running your MATLAB code on it to see what the next issue is? ;)
Hi @adamkewley thank you very much for the work! I think I will not be able to try today as I don't have my other laptop with me (the TUD one has issues connecting to the CBL machine...) but I will let you know by Monday if it is working now. Thanks again :)
cc @aseth1
Hi Adam, I am still running into the same problem when initializing the .osim models. Is it possible that I need to somehow restart my session on the CBL machines for the changes to take effect? I attach here the dump file that is produced when Matlab crashes.
I don't think you need to restart your session, only MATLAB.
It's unusual that the dump file doesn't mention anything about OpenSim at all anymore, which might indicate that something's happened, but it's unclear what.
The MATLAB code you are using is just loading a model from disk? The dump file you attached is only showing 15/16 cores, and all cores are doing something BLAS-related (or waiting), but I can't see any information other than that.
Hi @adamkewley, I am loading a model from disk, passing it into a function and then calling model.initSystem() there. At that stage, the error occurs. Not sure if you want to experiment yourself (I can tell you how to run the code, it is not difficult at all), I can also meet for that one of these days.
@itbellix I can give it a try - email me the model
@adamkewley thank you!
If you log into my session on CBL1, you can just open Matlab and run the main_perturbed_analysis.m script, which is located in Desktop/github/PTbot/Code/Compute Muscle Activity/Matlab/RMR solver.
The script should prompt you to select a model (already in the local repository; I was using Desktop/gitHub/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim) and then a .trc file (any of the ones in the default PTbot/ExperimentalData/Markers should work).
That should be enough to replicate the issue, which for me occurred at line 92 of the function RMR_analysis.m.
Note that the branch of the repository should be plos_publication; it is probably already the correct one, but it is worth double-checking.
Thank you!
Hi @adamkewley, is there any update on the status of this issue? Thank you very much ☺️
No update - it appears to be a problem with the OpenSim API and I haven't had enough time to look into it.
There will be no further updates to this until upstream (opensim-core) fixes the ABI problems. If people want me to fix it then an appropriate amount of time (2d-10d) will need to be scheduled to do so.
Having a quick look this morning, here's a more minimal reproduction of the bug:
import org.opensim.modeling.*;
model_path = "/home/adam/Desktop/italo-problem/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim";
model = Model(model_path);
model.initSystem(); % segfaults in MKL
The same code can be rewritten in terms of the OpenSim Python API. The Python API is useful here because it was compiled at the same time as the MATLAB API, so it should use exactly the same native code to initialize the model's system:
import opensim.simulation
model_path = "/home/adam/Desktop/italo-problem/PTbot/OpenSim Models/for RMR solver/perturbed_100/2kgWeight/TSM_subject_2kgWeight1.osim"
model = opensim.simulation.Model(model_path)
model.initSystem() # no segfault
This "identical" code doesn't segfault, which indicates that the problem is specifically in how OpenSim's MATLAB bindings work in Linux (as suspected).
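When comparing reproductions like these, it can help to run each one in a child process so that a native crash is recorded instead of taking down the session. The following is a minimal sketch, assuming a POSIX system; `run_isolated` is a hypothetical helper, not part of OpenSim.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run a Python snippet in a child interpreter and report how
    the child ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the
        # signal that terminated the child.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with code %d" % proc.returncode
```

Passing the Python reproduction above as the `code` string would report either a normal exit or the terminating signal (for example `SIGSEGV`). The same pattern could wrap a non-interactive MATLAB invocation by swapping the command list, assuming the `matlab` launcher is on `PATH`.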
The same bug has been reported in the past by other OpenSim users trying to use MATLAB in Linux:
I have already investigated this bug several times. I imagine that the only way to fix it is to compile a specialized OpenSim. I have already tried that for libBLAS/libLAPACK (above), but it still didn't work (and it caused problems with our python installation for Bofan, so I reverted it).
The fact that fixing the libBLAS/libLAPACK dependencies isn't fixing things might be because it just pushes the error towards the libstdc++ mismatch (also known: https://github.com/opensim-org/opensim-core/issues/1397). Comprehensively fixing that problem may involve entirely changing how OpenSim itself is compiled (it is very messy to statically compile libstdc++ into multiple shared libraries), or relying on LD_PRELOAD tricks that may ultimately just produce a different flavor of the same (ABI mismatch) segfault.
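For reference, an LD_PRELOAD experiment of the kind mentioned above could be set up roughly as follows. This is only a sketch with a hypothetical helper name, and it is not a recommended fix: as noted, it may simply move the crash elsewhere. The library path is the system libstdc++ from the dependency listing earlier in this thread.

```python
import os


def with_preload(libraries, base_env=None):
    """Return a copy of the environment with `libraries` placed at
    the front of LD_PRELOAD, keeping any existing entries."""
    env = dict(os.environ if base_env is None else base_env)
    existing = [p for p in env.get("LD_PRELOAD", "").split(":") if p]
    env["LD_PRELOAD"] = ":".join(list(libraries) + existing)
    return env


# Example (assumes a `matlab` launcher on PATH; not executed here):
#   import subprocess
#   env = with_preload(["/lib/x86_64-linux-gnu/libstdc++.so.6"])
#   subprocess.run(["matlab"], env=env)
```

Building the environment separately from launching the process keeps the experiment reversible: nothing in the user's shell configuration is changed, and the preload only applies to the process started with that environment.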
Hi @adamkewley, thank you for your answer! So, it is encouraging that the same code in Python doesn't break. This might be an option I can choose, but I would need to rewrite some code I guess. So for now I think there is no rush, I will let you know if things change.
Just to understand, when you say:
There will be no further updates to this until upstream (opensim-core) fixes the ABI problems.
we are probably talking about more than a year, right?
@itbellix how long it takes is entirely dependent on how many resources are thrown at it.
But, if we are assuming no changes in that regard, then "more than a year" is fair (it was initially spotted in 2016).
Closing because MATLAB has not been fixed for OpenSim, so the official ruling is "don't use MATLAB with OpenSim on Linux/Mac"
Hi @adamkewley, I would like to report here (for future reference) the problem that I had when trying to work with CBL1. I am using Matlab (R2022a, if I am not mistaken) to access the OpenSim API (OpenSim version 4.3). However, when I load the model I need to work on, Matlab crashes with an error that you identified as related to scaling.
I tried to load a different model, which corresponds to the original "unscaled" version of the one I was having problems with, and everything worked fine. So, I fear that CBL1 might not be able to work with models that have been scaled, but I don't understand how this is possible, since the model itself (even if resulting from a scaling step) contains all the information required to describe it...
My question is therefore: do you think there might be an easier way around this than recompiling OpenSim or statically compiling it (you had mentioned these two options by email)?
Maybe also @aseth1 could help here