OpenModelica / OpenModelica

OpenModelica is an open-source Modelica-based modeling and simulation environment intended for industrial and academic usage.
https://openmodelica.org

Building models using external code fail because of missing shared library #10344

Open casella opened 1 year ago

casella commented 1 year ago

The regression report between 2023-02-26 05:19:20 and 2023-03-01 17:50:55 shows, among other things, that models using some kind of external code started to fail in the three versions of the Buildings library that we monitor. This is independent of the update to Buildings master, because it also happens with released maintenance versions that have not changed in that period (see e.g. the 9.1.x branch).

The reason is always the same:

Regular simulation: ./Buildings_9_Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1  -abortSlowSimulation -alarm=480  -emit_protected -lv LOG_STATS
stdout            | info    | 0.000 FMUZoneAdapterZones1.building: Using pre-compiled FMU /home/hudson/saved_omc/libraries/.openmodelica/libraries/Buildings 9.1.1-maint.9.1.x/Resources/src/ThermalZones/EnergyPlus_9_6_0/FMUs/Zones1.fmu
[FATAL][FMICAPI] Could not load the FMU binary: libgfortran.so.4: cannot open shared object file: No such file or directory
assert            | debug   | Could not create the DLL loading mechanism (C-API) for /var/lib/jenkins/ws/OpenModelicaLibraryTestingWork/OpenModelicaLibraryTesting/Buildings_9_Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1/spawn-FMUZoneAdapterZones1/EnergyPlus.fmu.

Unfortunately, there are several commits in that regression report, and none seems to be the obvious candidate.

@perost, @mahge, would you mind having a look? We should strive to improve the success ratio of Buildings, not make it worse. It would be good to get this fixed for 1.21.0.

Thanks!

perost commented 1 year ago

I assume this is because @sjoelund updated the OS on the machines and not because of any changes to OM.

sjoelund commented 1 year ago

The version of libgfortran available on Ubuntu is now 5. This is an FMU provided inside the Resources directory of Buildings that is neither statically linked nor ships the gfortran shared object.

casella commented 1 year ago

> The version of libgfortran available on Ubuntu is now 5. This is an FMU provided inside the Resources directory of Buildings that is not statically linked or providing the gfortran shared object

@mwetter do you think you can fix this issue upstream in the Buildings source code?

Thanks!

mwetter commented 1 year ago

@sjoelund @casella : Let's try this fix: https://github.com/lbl-srg/modelica-buildings/pull/3293. However, when I try to simulate the model, OMEdit reports:

[FATAL][FMICAPI] Could not load the FMU binary: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/Zones1.so) 

This looks like it tries to load libc.so.6 from the system path and not from binaries/linux64/.

casella commented 1 year ago

@mwetter your PR lbl-srg/modelica-buildings#3293 fixed the problem on our CI for the master branch test, see also the positive regression report.

  1. I can't replicate the issue you reported with libc.so.6. Can you please provide more details about the setup you use when you get that issue?
  2. Can you port your PR to the 9.1.x (and possibly 8.1.x) branches, so we improve their success ratio as well?

Thanks!

casella commented 1 year ago

@mwetter regarding your issue:

> This looks like it tries to load libc.so.6 from the system path and not from binaries/linux64/.

I'm not totally sure who "it" refers to. If it is some external compiled code in Buildings, I'm not sure what we can do about it, if anything.

Could you please elaborate about who is "it" and how would it know that it has to look for libraries in binaries/linux64/?

Thanks!

sjoelund commented 1 year ago

> This looks like it tries to load libc.so.6 from the system path and not from binaries/linux64/.

It is your FMICAPI that decides what to load. But libc.so.6 will already be loaded inside of the simulation executable after all, so the default flags would re-use that object when loading the FMU.

For me, it works (but I have Ubuntu 22.04, which has an even more recent libc than the included one)

mwetter commented 1 year ago

I run this test on Ubuntu 20.04.5. It refers to OMEdit (or its backend) as I see the error message in OMEdit. I don't know whether the message originates in OpenModelica or in the FMU and OpenModelica just forwards the message to the OMEdit console. In my system, I do have the library installed:

mwetter@srg-mw:~$ file /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libc.so.6: symbolic link to libc-2.31.so
mwetter@srg-mw:~$ file /lib/x86_64-linux-gnu/libc-2.31.so 
/lib/x86_64-linux-gnu/libc-2.31.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=1878e6b475720c7c51969e69ab2d276fae6d1dee, for GNU/Linux 3.2.0, stripped

The log in OMEdit is as follows:

/tmp/OpenModelica_mwetter/OMEdit/Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1/FMUZoneAdapterZones1 -port=46581 -logFormat=xmltcp -override=startTime=0,stopTime=3600,stepSize=7.2,tolerance=1e-06,solver=dassl,outputFormat=mat,variableFilter=.* -r=/tmp/OpenModelica_mwetter/OMEdit/Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1/FMUZoneAdapterZones1_res.mat -w -lv=LOG_STATS -inputPath=/tmp/OpenModelica_mwetter/OMEdit/Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1 -outputPath=/tmp/OpenModelica_mwetter/OMEdit/Buildings.ThermalZones.EnergyPlus_9_6_0.BaseClasses.Validation.FMUZoneAdapterZones1
0.000 FMUZoneAdapterZones1.building: Using pre-compiled FMU /home/mwetter/proj/ldrd/bie/modeling/github/lbl-srg/modelica-buildings/Buildings/Resources/src/ThermalZones/EnergyPlus_9_6_0/FMUs/Zones1.fmu 
Could not create the DLL loading mechanism (C-API) for /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/EnergyPlus.fmu.
Non-Linear Solver try to handle a problem with a called assert.
Value reference is not set for FMUZoneAdapterZones1.fmuZonCor. For Dymola 2020x, make sure you set 'Hidden.AvoidDoubleComputation=true'. See Buildings.ThermalZones.EnergyPlus.UsersGuide.
...
While solving non-linear system an assertion failed during initialization.
The non-linear solver tries to solve the problem that could take some time.
It could help to provide better start-values for the iteration variables.
For more information simulate with -lv LOG_NLS_V
Value reference is not set for FMUZoneAdapterZones1.fmuZonCor. For Dymola 2020x, make sure you set 'Hidden.AvoidDoubleComputation=true'. See Buildings.ThermalZones.EnergyPlus.UsersGuide.
...
Value reference is not set for FMUZoneAdapterZones1.fmuZonCor. For Dymola 2020x, make sure you set 'Hidden.AvoidDoubleComputation=true'. See Buildings.ThermalZones.EnergyPlus.UsersGuide.
nonlinear system 14 fails: at t=0
proper start-values for some of the following iteration variables might help
[1] Real building.synchronization_done(start=0, nominal=1)
Solving non-linear system 14 failed at time=0. For more information please use -lv LOG_NLS.
simulation terminated by an assertion at initialization
[FATAL][FMICAPI] Could not load the FMU binary: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/Zones1.so)

casella commented 1 year ago

@mwetter, the error message starts with

[FATAL][FMICAPI] Could not load the FMU binary:

I grep'd FMICAP on the entire C source code base of OMC but couldn't find it, so I guess that's a message created by the external code that is under your responsibility. Of course it's output to stdout or stderr, and then rerouted to OMEdit's output window, but we need more details about how this code tries to locate the dlls in order to fix the problem.

bilderbuchi commented 1 year ago

Could this not come from https://github.com/OpenModelica/OMCompiler-3rdParty/tree/master/FMIL/src/CAPI ? (E.g. https://github.com/OpenModelica/OMCompiler-3rdParty/blob/0032be75bdd0605999816064de33830c9fe9d3eb/FMIL/src/CAPI/src/FMI2/fmi2_capi_impl.h#L31)

casella commented 1 year ago

Indeed :) I should have grep'd .h files as well. Or grep everything, just in case.

@AnHeuermann, @arun3688, any suggestion from your side on this issue?

sjoelund commented 1 year ago

> Could this not come from https://github.com/OpenModelica/OMCompiler-3rdParty/tree/master/FMIL/src/CAPI ? (E.g. https://github.com/OpenModelica/OMCompiler-3rdParty/blob/0032be75bdd0605999816064de33830c9fe9d3eb/FMIL/src/CAPI/src/FMI2/fmi2_capi_impl.h#L31)

It shouldn't. We don't link to that for simulation unless the deprecated FMI import functionality is used. It's coming from: https://github.com/lbl-srg/modelica-buildings/blob/master/Buildings/Resources/Library/linux64/libfmilib_shared.so

(mostly the same library except we do have some custom changes for loading shared objects)

casella commented 1 year ago

FMI import is not deprecated, it's just limited in functionality. And it is currently broken for as-yet-unknown reasons, see #5345. But we need that. However, that's another story.

AnHeuermann commented 1 year ago

As far as I can tell, the FMUs come from Optimica.

<fmiModelDescription fmiVersion="2.0" modelName="Zones1" guid="599c8028b3a5bfc7e6262b3d5b87276a" generationTool="Optimica Compiler Toolkit" generationDateAndTime="2021-12-13T08:44:01" variableNamingConvention="structured" numberOfEventIndicators="0">

For OpenModelica FMUs it is possible to include all runtime dependencies into the FMU binaries directory to prevent issues like these. So I would say this is a problem of the exporting tool.

Checking the link dependencies of the FMU binary, I can see that it links to my system's libs.

$ ldd Zones1.so 
        linux-vdso.so.1 (0x00007ffeec3f1000)
        libgfortran.so.4 => not found
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fad72b3c000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fad72719000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fad72b1c000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fad72b17000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fad724f1000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fad72b55000)

One solution could be to manually copy the system libraries into the FMU when generating it and to update the runtime path to be $ORIGIN. Or cross-compile for the specific target (whatever OS we run in our CI).
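To make the ldd check above repeatable, here is a minimal Python sketch that picks the unresolved dependencies out of ldd output. The function names `missing_dependencies` and `run_ldd` are made up for illustration, and the sample text is a shortened copy of the listing above; treat it as a sketch, not as part of any OpenModelica or Buildings tooling.

```python
import subprocess


def missing_dependencies(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like "libfoo.so.1 => /path/libfoo.so.1 (0x...)"
        # or "libfoo.so.1 => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def run_ldd(path):
    """Run ldd on a shared object and return its stdout (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True, check=True)
    return result.stdout


# Shortened copy of the ldd listing for Zones1.so shown above.
sample = """\
        linux-vdso.so.1 (0x00007ffeec3f1000)
        libgfortran.so.4 => not found
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fad724f1000)
"""
print(missing_dependencies(sample))
```

On a Linux machine, `missing_dependencies(run_ldd("Zones1.so"))` would run the same check against the unpacked FMU binary.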

casella commented 1 year ago

FMI is great because it frees you from the intricacies of Modelica and offers a turnkey solution for system simulation. Except, you sometimes stumble into these extremely low-level issues 😅

For example, you need to know how to run ldd on Linux, and then something equivalent on Windows.

mwetter commented 1 year ago

@AnHeuermann : What do you mean by "update the runtime path to be $ORIGIN"? Is this something on the OpenModelica side? Note that copying libc.so.6 into all possible locations where it may be searched in the fmu does not help. I locally tried this, resulting in the fmu content shown below, but this gives the same error message as before.

$ unzip -vl Zones1.fmu 
Archive:  Zones1.fmu
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
   16206  Defl:N     2534  84% 2023-03-23 22:05 902a978b  modelDescription.xml
       0  Defl:N        2   0% 2023-03-23 22:05 00000000  binaries/
       0  Defl:N        2   0% 2023-03-23 22:05 00000000  binaries/linux64/
 2099848  Defl:N   792124  62% 2023-03-23 22:05 bcc57fd1  binaries/linux64/Zones1.so
       0  Defl:N        2   0% 2023-03-23 22:05 00000000  resources/
       0  Defl:N        2   0% 2023-03-23 22:05 00000000  documentation/
 2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  binaries/linux64/libc.so.6
  940560  Stored   940560   0% 2022-07-06 23:23 f471f9eb  binaries/linux64/libm.so.6
 2989432  Stored  2989432   0% 2022-05-13 11:11 d79b9b91  binaries/linux64/libgfortran.so.5
  125488  Stored   125488   0% 2022-05-13 11:11 c41cdb24  binaries/linux64/libgcc_s.so.1
  289592  Stored   289592   0% 2022-05-13 11:11 cd355e2b  binaries/linux64/libquadmath.so.0
 2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  lib/x86_64-linux-gnu/libc.so.6
 2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  binaries/linux64/lib/x86_64-linux-gnu/libc.so.6
--------          -------  ---                            -------
13110038         11788650  10%                            13 files

AnHeuermann commented 1 year ago

FMU Zones1.fmu is used at some point in the simulation process. In a nutshell, an FMU is just a pre-compiled shared library. In this case it is binaries/linux64/Zones1.so. This dynamic library can depend on other libraries like libgfortran.so.4, which can be seen with tools like ldd or readelf:

$ ldd Zones1.so 
        libgfortran.so.4 => not found
        [...]

Now the question is: How does the program that loads this dynamic library (dlopen) know where to find libgfortran.so.4? This depends (of course) on the operating system, but in general there is a search order with default locations: https://linux.die.net/man/3/dlopen
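To see the loader's behaviour outside of OpenModelica, one option is to dlopen the FMU binary directly. A minimal sketch using Python's ctypes, which calls dlopen on Linux; the helper name `try_load` is hypothetical, and the path assumes you are inside the unpacked FMU directory:

```python
import ctypes


def try_load(path):
    """Try to load a shared object; return (ok, loader_message)."""
    try:
        ctypes.CDLL(path)
        return True, ""
    except OSError as exc:
        # The dynamic loader's own message (missing file, missing
        # dependency, missing symbol version) ends up in the exception.
        return False, str(exc)


# Path is an assumption: relative to the directory where the FMU was unpacked.
ok, message = try_load("binaries/linux64/Zones1.so")
print(ok, message)
```

Note that the Python interpreter has itself already loaded the system libc, so in that respect this should resemble loading the FMU from a simulation executable.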

On my system one of the default locations is /lib/x86_64-linux-gnu/. If you want to add a non-default location there are two ways to accomplish this:

  1. Define a list of locations in the environment variable LD_LIBRARY_PATH. The user who wants to open the library has to do this, so this is not an option in this case, especially if the needed library simply isn't installed anywhere on the system.
  2. Compile a runtime path RUNPATH into the library Zones1.so so dlopen can search in additional locations for the needed dependency.

We can check what's in RUNPATH by calling readelf:

readelf Zones1.so -a
 [...]
 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
 [...]

Okay, so $ORIGIN already is inside the runtime path. This is good. This tells dlopen to search for dependencies right next to the library it is trying to open.
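If you want to script this check, here is a minimal Python sketch that extracts the RUNPATH (or the older RPATH) entries from readelf's textual output. The regular expression is based on the line format shown above, and the function name `runtime_search_paths` is made up for illustration.

```python
import re

# Matches dynamic-section lines such as
#  0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]
RUNPATH_RE = re.compile(r"\((?:RUNPATH|RPATH)\)\s+Library (?:runpath|rpath): \[(.*?)\]")


def runtime_search_paths(readelf_output):
    """Return the individual RUNPATH/RPATH entries found in readelf output."""
    paths = []
    for value in RUNPATH_RE.findall(readelf_output):
        # Several directories may be joined with ':' in a single entry.
        paths.extend(p for p in value.split(":") if p)
    return paths


sample = " 0x000000000000001d (RUNPATH)            Library runpath: [$ORIGIN]"
print(runtime_search_paths(sample))
```

An empty result would mean the library carries no runtime path at all, so only the default locations and LD_LIBRARY_PATH would be searched for its dependencies.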

So nearly everything is in place! The only thing missing now is to include all runtime dependencies in the FMU/binaries/linux64/ directory when creating the FMU. But this is no trivial task. We opted to use CMake for OpenModelica FMU export to do it for us, see --fmuruntimedepends=all. For example, if we take a closer look at my libc.so.6, we can see that it has additional dependencies:

$ ldd /lib/x86_64-linux-gnu/libc.so.6
        /lib64/ld-linux-x86-64.so.2 (0x00007f3324f97000)
        linux-vdso.so.1 (0x00007ffeff5fc000)

And in other cases the library will only be a symbolic link to some other library. And of course all of this is super OS dependent. Usually it's not much of a problem on Windows, because the Windows way of loading libraries is so annoying, that everyone copies all DLLs directly next to the executable. And on OSX it's a completely different monstrosity.

FMI doesn't provide a mechanism to differentiate between different linux64 systems, so it is not an option to compile two FMUs, e.g. inside Ubuntu 20.04 and Ubuntu 22.04 Docker images, and use the right one on the target system.


So to conclude I see three options:

  1. The exporting tool of Zones1.so needs to handle system dependencies and copy them into the FMU at generation time.
  2. You manually add all system dependencies into the FMU after it is generated.
  3. Tell Buildings users that it needs very specific libraries and that they need to have them. But installing older libraries on a new OS can be... challenging, if possible at all.

bilderbuchi commented 1 year ago

Regarding the so.5 vs. so.6 -- I thought that (at least glibc) has wide backwards compatibility, so if a newer library is present on the system (e.g. libc.so.6) that still works even if libc.so.5 was expected? I seem to remember (but can't dig up a source atm) that the suggestion was to compile on the oldest system one wants this to work on, and system libs on that+newer systems will "just work" (hah! :D) ? Is that restricted to libc, or is a similar thing valid for libgfortran, so that an FMU packaged on, say, Ubuntu 20.04, will also work on 22.04? Or is that a red herring, and it's already "working" as intended anyway, if the FMU includes the right constellation of libraries, or the problem lies elsewhere?

sjoelund commented 1 year ago

libc.so.6 is not backwards compatible with libc.so.5, which is another libc implementation and not glibc. glibc has been backwards compatible since 2.0 (1997). It looked for GLIBC_2.34 inside of libc.so.6 before, which means the FMU would only work on systems with the same or newer glibc (Ubuntu 22.04 comes with 2.35 and 20.04 comes with 2.31). It is not forward compatible.
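The rule can be written down in a few lines. A minimal Python sketch, with hypothetical helper names `required_glibc` and `can_load`: it pulls the highest GLIBC_x.y tag out of some text (here the loader message from earlier in this thread) and compares it with the glibc versions mentioned above for Ubuntu 20.04 and 22.04.

```python
import re


def required_glibc(text):
    """Highest GLIBC_x.y version tag mentioned in the text, or None."""
    versions = [
        tuple(int(part) for part in match.split("."))
        for match in re.findall(r"GLIBC_(\d+(?:\.\d+)+)", text)
    ]
    return max(versions) if versions else None


def can_load(required, available):
    """glibc is backwards compatible: any version >= the required one works."""
    return required is None or available >= required


message = "/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found"
needed = required_glibc(message)

print(needed)                     # version the FMU binary asks for
print(can_load(needed, (2, 31)))  # Ubuntu 20.04 ships glibc 2.31
print(can_load(needed, (2, 35)))  # Ubuntu 22.04 ships glibc 2.35
```

On a running system, `platform.libc_ver()` from the Python standard library can usually supply the available version to compare against.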

The problem with libgfortran is different, because it is not backwards compatible and Ubuntu only ships with one version of it. So the inclusion of libgfortran in the FMU worked for me on Ubuntu 22.04. But it would not work in Ubuntu 20.04, which comes with its own older libgfortran version (because the libgfortran in the FMU was compiled against a newer glibc). A statically linked libgfortran compiled on a system with an old glibc produces an FMU that works on the most systems, but is often really annoying to do in practice.

casella commented 1 year ago

As I wrote:

> FMI is great because it frees you from the intricacies of Modelica and offers a turnkey solution for system simulation. Except, you sometimes stumble into these extremely low-level issues 😅

I wonder how much the Modelica community at large is aware of these potential issues. Sometimes FMI is seen as a silver bullet, which of course it isn't.

casella commented 1 year ago

Maybe we should write a paper "FMI compatibility: lessons learned" for the next Modelica conference where we discuss these issues?

mahge commented 1 year ago

> @AnHeuermann : What do you mean by "update the runtime path to be $ORIGIN"? Is this something on the OpenModelica side? Note that copying libc.so.6 into all possible locations where it may be searched in the fmu does not help. I locally tried this, resulting in the fmu content shown below, but this gives the same error message as before.
>
> $ unzip -vl Zones1.fmu 
> Archive:  Zones1.fmu
>  Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
> --------  ------  ------- ---- ---------- ----- --------  ----
>    16206  Defl:N     2534  84% 2023-03-23 22:05 902a978b  modelDescription.xml
>        0  Defl:N        2   0% 2023-03-23 22:05 00000000  binaries/
>        0  Defl:N        2   0% 2023-03-23 22:05 00000000  binaries/linux64/
>  2099848  Defl:N   792124  62% 2023-03-23 22:05 bcc57fd1  binaries/linux64/Zones1.so
>        0  Defl:N        2   0% 2023-03-23 22:05 00000000  resources/
>        0  Defl:N        2   0% 2023-03-23 22:05 00000000  documentation/
>  2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  binaries/linux64/libc.so.6
>   940560  Stored   940560   0% 2022-07-06 23:23 f471f9eb  binaries/linux64/libm.so.6
>  2989432  Stored  2989432   0% 2022-05-13 11:11 d79b9b91  binaries/linux64/libgfortran.so.5
>   125488  Stored   125488   0% 2022-05-13 11:11 c41cdb24  binaries/linux64/libgcc_s.so.1
>   289592  Stored   289592   0% 2022-05-13 11:11 cd355e2b  binaries/linux64/libquadmath.so.0
>  2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  lib/x86_64-linux-gnu/libc.so.6
>  2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  binaries/linux64/lib/x86_64-linux-gnu/libc.so.6
> --------          -------  ---                            -------
> 13110038         11788650  10%                            13 files

@mwetter Perhaps you can try updating your LD_LIBRARY_PATH environment variable to include the current directory.

> export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
> ./simulationExecutable

I do not think these two are needed or would help at all.

>  2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  lib/x86_64-linux-gnu/libc.so.6
>  2216304  Stored  2216304   0% 2022-07-06 23:23 be453c56  binaries/linux64/lib/x86_64-linux-gnu/libc.so.6

mwetter commented 1 year ago

@mahge Exporting the LD_LIBRARY_PATH does not change the error message. I don't understand why OpenModelica reports

[FATAL][FMICAPI] Could not load the FMU binary: /lib/x86_64-linux-gnu/libc.so.6: version 
`GLIBC_2.34' not found (required by
 /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/Zones1.so) 

but ldd finds /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/libc.so.6 that is shipped with the FMU:

mwetter@srg-mw:/tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1$ ldd binaries/linux64/Zones1.so
    linux-vdso.so.1 (0x00007ffd0b3b4000)
    libgfortran.so.5 => /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/libgfortran.so.5 (0x00007f3930fdb000)
    libm.so.6 => /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/libm.so.6 (0x00007f3930ef4000)
    libgcc_s.so.1 => /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/libgcc_s.so.1 (0x00007f3930ed4000)
    libc.so.6 => /tmp/OpenModelica_mwetter/OMEdit/spawn-FMUZoneAdapterZones1/binaries/linux64/libc.so.6 (0x00007f3930cac000)
    libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f3930c40000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f393138b000)

This looks like FMICAPI does not pick up the libc version shipped with the FMU, which it maybe should as a fallback if there is no compatible version installed on the target system?

casella commented 1 year ago

The backport was successful for three models, see regression report.

However, models in Buildings 9.1.x's two sub-packages Buildings.ThermalZones.EnergyPlus_9_6_0.Examples and Buildings.ThermalZones.EnergyPlus_9_6_0.Validation are now failing with this message

assert            | debug   | Failed to find spawn executable in Buildings Library installation, on SPAWNPATH and on PATH. 
See installation instructions at Buildings.ThermalZones.EnergyPlus_9_6_0.UsersGuide.Installation

I'm not really sure what is wrong now, @mwetter do you have any idea?

mwetter commented 1 year ago

These packages are not affected by the merge in https://github.com/lbl-srg/modelica-buildings/pull/3308/files

I don't know why they suddenly fail. The error message indicates that the spawn binaries are not accessible.

Since

Buildings.Experimental.DHC.Loads.BaseClasses.Examples.CouplingSpawnZ1
Buildings.Experimental.DHC.Loads.BaseClasses.Examples.CouplingSpawnZ6

also fail, it appears that at least all models that require the spawn binaries are failing. However, I just tried and can successfully download the binary, which has not changed:

$ wget https://github.com/NREL/EnergyPlus/releases/download/v9.6.0/EnergyPlus-9.6.0-f420c06a69-Linux-Ubuntu20.04-x86_64.tar.gz
$ md5sum EnergyPlus-9.6.0-f420c06a69-Linux-Ubuntu20.04-x86_64.tar.gz
2b5d6c6871d258d3b843e80c79e2cee2  EnergyPlus-9.6.0-f420c06a69-Linux-Ubuntu20.04-x86_64.tar.gz

casella commented 1 year ago

@mahge @sjoelund could you please check?