firemodels / fds

Fire Dynamics Simulator
https://pages.nist.gov/fds-smv/

Dynamic Link Error with OpenMPI on MacPro FDS 5.3 #611

Closed gforney closed 9 years ago

gforney commented 9 years ago
Please complete the following lines...

Application Version: 5.3.0
SVN Revision Number:
Compile Date:
Operating System: 10.5.6

Describe details of the issue below: See the terminal output below.

Last login: Tue Jan 27 22:13:49 on console
SuperNova:~ jbrooker$ cd /Volumes/SuperNovaHD2/Data/Orion/RevDMac
SuperNova:RevDMac jbrooker$ mpiexec -np 4 /Applications/NIST/FDS/fds_5.3.0_mpi_osx_64 /Volumes/SuperNovaHD2/Data/Orion/RevDMac/RevD_test2.fds
dyld: lazy symbol binding failed: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: lazy symbol binding failed: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: lazy symbol binding failed: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: lazy symbol binding failed: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

dyld: Symbol not found: __intel_fast_memcpy
  Referenced from: /usr/local/lib/libmpi.0.dylib
  Expected in: flat namespace

[SuperNova.local:12324] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[SuperNova.local:12324] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1158
[SuperNova.local:12324] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
mpiexec noticed that job rank 1 with PID 12328 on node SuperNova.local exited on signal 5 (Trace/BPT trap).
1 additional process aborted (not shown)
[SuperNova.local:12324] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
[SuperNova.local:12324] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1190
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job. Returned value Timeout instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
SuperNova:RevDMac jbrooker$ 

Original issue reported on code.google.com by john.e.brooker@nasa.gov on 2009-02-05 16:56:44

gforney commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by bryan%solvercore.com@gtempaccount.com on 2009-02-05 16:57:56

gforney commented 9 years ago
Do you know what version of the OpenMPI package you installed?

Intel support wanted to make sure we were using the same MPI libraries, so can you check what version is installed on your system now?

So far they have not given me any real information to go on, but I replied to their initial response, and hopefully they now have enough info to help resolve this.

Below is their response followed by my reply...

Dear Bryan,
Thank you for submitting the issue. Regarding this from your issue tracker:
>>>dyld: Symbol not found: __intel_fast_memcpy
Referenced from: /usr/local/lib/libmpi.0.dylib

Perhaps the version of libmpi on the client is different from the version on the build machine. Can you check that and let me know? Is this Intel MPI on the client? What MPI is being used on the build machine?

On the surface this doesn't look like a compiler error per se, since you have no
issue on the build machine. It appears that you are using m_cprof_p_11.0.056 on
Intel64 architecture. Has that compiler's runtime been distributed to the client,
since you are linking dynamically?

Perhaps a workaround is to build the application with -static-intel, but that's a
long shot.

Other than the above, I don't have further suggestions. I did a search of the problem
report database, and nothing like this has been reported.

Please give me some feedback.

Thank you,
Patrick
Intel Developer Support

**** My Reply ****

Thanks for the timely reply...

Both machines are using OpenMPI, but I am not sure if they are the exact same
version. I will look into that.

I am using -static-intel for the build. Here are the Fortran and C compiler flags I am using; maybe you will see something I am missing.

intel_osx_mpi_64 : FFLAGS = -O3 -m64 -heap-arrays -axSSSE3 -static-intel
-L/opt/intel/Compiler/11.0/056/lib
intel_osx_mpi_64 : CFLAGS = -O3 -m64 -Dpp_noappend -Dpp_OSX
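
(For reference, a minimal sketch of the final link command these flags end up on; mpif90 is assumed here as the OpenMPI wrapper around ifort, and the object list is a placeholder rather than the actual FDS file list. -static-intel only takes effect at this link step:)

mpif90 -O3 -m64 -heap-arrays -axSSSE3 -static-intel \
       -L/opt/intel/Compiler/11.0/056/lib \
       -o fds_5.3.0_mpi_osx_64 *.o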

I think that I went through the process of helping this user install the runtime
libraries, but I was having a difficult time finding the install package today on the
Intel downloads site. Can you provide me with a path to get them?

I thought that by using the -static-intel flag, the Intel libraries would be linked into the binary. Also, why is the MPI library calling __intel_fast_memcpy? Where does this function live, and what does "flat namespace" mean? Shouldn't this function be in a particular library file? Or does "flat namespace" mean that the function should have been built into the binary because of the -static-intel flag?
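
(Two standard OS X commands can show where the symbol is actually expected to come from; the paths below are the ones from the log earlier in this thread and may differ on other machines:)

otool -L /usr/local/lib/libmpi.0.dylib                  # dylibs libmpi itself links against
nm -u /usr/local/lib/libmpi.0.dylib | grep intel        # undefined symbols libmpi expects at load time
otool -L /Applications/NIST/FDS/fds_5.3.0_mpi_osx_64    # what the FDS binary links against

("Flat namespace" means the loader will resolve the symbol from any loaded image, rather than binding it to one specific library as in the default two-level namespace.)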

Thank you for your help on this issue.
-Bryan

Original issue reported on code.google.com by bryan%solvercore.com@gtempaccount.com on 2009-02-05 22:16:24

gforney commented 9 years ago
This is what I get from running ompi_info:

Last login: Thu Feb  5 11:20:03 on ttys000
/usr/local/openmpi/bin/ompi_info ; exit;
SuperNova:~ jbrooker$ /usr/local/openmpi/bin/ompi_info ; exit;
                Open MPI: 1.2.5
   Open MPI SVN revision: r16989
                Open RTE: 1.2.5
   Open RTE SVN revision: r16989
                    OPAL: 1.2.5
       OPAL SVN revision: r16989
                  Prefix: /usr/local/openmpi
 Configured architecture: i386-apple-darwin9.1.0
           Configured by: bwklein
           Configured on: Fri Jan 11 13:34:03 EST 2008
          Configure host: devi1.nist.gov
                Built by: bwklein
                Built on: Fri Jan 11 13:42:05 EST 2008
              Built host: devi1.nist.gov
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (single underscore)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/local/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/local/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: no
           MCA backtrace: darwin (MCA v1.0, API v1.0, Component v1.2.5)
              MCA memory: darwin (MCA v1.0, API v1.0, Component v1.2.5)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
               MCA timer: darwin (MCA v1.0, API v1.0, Component v1.2.5)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: xgrid (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA pls: xgrid (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.5)
logout

[Process completed]

Original issue reported on code.google.com by john.e.brooker@nasa.gov on 2009-02-07 05:43:56

gforney commented 9 years ago
What is the status of this thread?

Original issue reported on code.google.com by mcgratta on 2009-03-13 15:14:56

gforney commented 9 years ago
Has this problem been resolved?

Original issue reported on code.google.com by mcgratta on 2009-04-13 20:36:12

gforney commented 9 years ago
Please provide an updated binary (fds_5.3.1_mpi_osx64) and I will try again.

Original issue reported on code.google.com by john.e.brooker@nasa.gov on 2009-04-15 14:33:35

gforney commented 9 years ago
What is the status of this thread?

Original issue reported on code.google.com by mcgratta on 2009-06-11 16:06:59

gforney commented 9 years ago
John -- were you ever able to get your case to run?

Original issue reported on code.google.com by mcgratta on 2009-07-23 19:44:43

gforney commented 9 years ago
No, I have not. To my knowledge, an fds_mpi_osx64 binary does not exist. Please correct me if I am mistaken.

Original issue reported on code.google.com by john.e.brooker@nasa.gov on 2009-07-23 20:34:13

gforney commented 9 years ago
You may be right -- Bryan, is there such a thing?

Original issue reported on code.google.com by mcgratta on 2009-07-23 20:42:21

gforney commented 9 years ago
Bryan, it doesn't look like the 64-bit MPI libraries are installed on devi1. If they are, there needs to be a way in the makefile to link to the right set (32- or 64-bit) of libraries. The link errors I am getting when trying to build fds_mpi_osx_64 on devi1 are consistent with linking to the 32-bit instead of the 64-bit MPI libraries.

The errors look similar to what I get when I build on Linux but link to the wrong libraries.
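
(A quick way to check which architectures a given set of MPI libraries was built for, using generic OS X tools; the path below is just an example:)

file /usr/local/openmpi/lib/libmpi.dylib        # reports i386 and/or x86_64
lipo -info /usr/local/openmpi/lib/libmpi.dylib  # same information for fat binaries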

Original issue reported on code.google.com by gforney on 2009-07-23 21:52:29

gforney commented 9 years ago
I just built a version on devi1 without any link errors. I moved the dylib files aside so that the compiler would be forced to build a fully static version of the binary (a tip from the web). I am uploading a test version to downloads now for John to try on his machine. The problem was that it would run on my machine but not on his.

Original issue reported on code.google.com by bryan%solvercore.com@gtempaccount.com on 2009-07-23 22:01:23

gforney commented 9 years ago
Please try the binary found at:
http://fds-smv.googlecode.com/files/fds_5.3.1_mpi_osx_64.zip

Original issue reported on code.google.com by bryan%solvercore.com@gtempaccount.com on 2009-07-23 22:10:23

gforney commented 9 years ago
I've never used them but the Intel download site for the compilers has run-time
libraries available.  Maybe this we need to be giving these out.

Original issue reported on code.google.com by gforney on 2009-07-23 23:36:44

gforney commented 9 years ago
We tried that back in Nov. of last year.
ftp://ftp.nist.gov/pub/bfrl/bwklein/

Hopefully the file I put up there will solve the problem.

Original issue reported on code.google.com by bryan%solvercore.com@gtempaccount.com on 2009-07-24 13:53:23

gforney commented 9 years ago
John, when you have a chance, could you try the newly released version of FDS (5.4) on your Mac and let us know if it works?

Original issue reported on code.google.com by mcgratta on 2009-09-03 14:39:41

gforney commented 9 years ago
Hi Bryan, Glenn and Kevin,

I'm experiencing the same problem on our Mac Pro system (OSX 10.5.8) here at CESARE with your current 5.4.3 64-bit binary. The 5.3.1 binary linked above works with the provided 1.3.3_64 libraries (after a bit of fiddling, which I've described below), but the 5.4.1 binary results in the following error:

dyld: lazy symbol binding failed: Symbol not found: __intel_fast_memcpy
  Referenced from: /Applications/FDS/FDS5/bin/fds_mpi_osx_64
  Expected in: /Applications/FDS/openmpi-1.3.3_64/lib/libopen-pal.0.dylib

While attempting to fix this, I also came across a couple of things which may be of interest to you.

Firstly, I found that setting the DYLD_* environment variable on its own caused what I can only describe as a long-winded segmentation fault, with lots of messages beginning with "Sorry!" because OpenMPI can't access its own help facility. To fix this, you need to add an OPAL_PREFIX variable pointing to the openmpi-1.3.3_64/ directory to your profile; after consulting some online help, I also added an LD_LIBRARY_PATH variable pointing to openmpi-1.3.3_64/lib for good measure.
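
(A minimal sketch of the profile additions described above, assuming the bundle lives under /Applications/FDS; adjust the prefix to your install location. Note that the OS X loader reads DYLD_LIBRARY_PATH rather than LD_LIBRARY_PATH:)

export OPAL_PREFIX=/Applications/FDS/openmpi-1.3.3_64
export PATH="$OPAL_PREFIX/bin:$PATH"
export DYLD_LIBRARY_PATH="$OPAL_PREFIX/lib:$DYLD_LIBRARY_PATH"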

This might just be related to our system, but I also discovered that in order to use the commands specific to 1.3.3_64 (i.e. ompi_info or mpirun), you have to run them explicitly with ./ from the openmpi-1.3.3_64/bin directory, regardless of what environment variables are set up in your profile. For example, ./ompi_info from openmpi-1.3.3_64/bin results in:

EWB4212-68:bin cesare$ ./ompi_info
                 Package: Open MPI gforney@devi1.nist.gov Distribution
                Open MPI: 1.3.3
   Open MPI SVN revision: r21666
   Open MPI release date: Jul 14, 2009
                Open RTE: 1.3.3
   Open RTE SVN revision: r21666
   Open RTE release date: Jul 14, 2009
                    OPAL: 1.3.3
       OPAL SVN revision: r21666
       OPAL release date: Jul 14, 2009
            Ident string: 1.3.3
                  Prefix: /Applications/FDS/openmpi-1.3.3_64/
 Configured architecture: i386-apple-darwin9.8.0
          Configure host: devi1.nist.gov
           Configured by: gforney
           Configured on: Thu Sep 10 20:32:04 EDT 2009
          Configure host: devi1.nist.gov
                Built by: gforney
                Built on: Thu Sep 10 20:50:30 EDT 2009
              Built host: devi1.nist.gov
(etc)

Whereas "ompi_info" on its own gives:

EWB4212-68:bin cesare$ ompi_info
                Open MPI: 1.2.5
   Open MPI SVN revision: r16989
                Open RTE: 1.2.5
   Open RTE SVN revision: r16989
                    OPAL: 1.2.5
       OPAL SVN revision: r16989
                  Prefix: /Applications/FDS/openmpi-1.3.3_64/
 Configured architecture: i386-apple-darwin9.1.0
           Configured by: cesare
           Configured on: Tue Feb  5 14:37:21 EST 2008
          Configure host: EWB4212-68.local
                Built by: root
                Built on: Tue  5 Feb 2008 14:45:04 EST
              Built host: EWB4212-68.local
(etc)

I'm not sure if this may be an issue for others whose Macs have 1.2.5 pre-installed, but since I've replaced the 1.2.5 $PATH etc. entries with new ones for 1.3.3_64, I am now unsure why or where these are being overridden.
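
(A generic way to see which copies of the OpenMPI commands the shell is resolving, nothing FDS-specific:)

which -a ompi_info mpirun   # every match on $PATH, in search order
echo $PATH                  # check that openmpi-1.3.3_64/bin precedes the system copy (typically /usr/bin)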

So, in order to get the FDS 5.3 executable to work, I had to run:
./mpirun -np X fds5.3_mpi_osx_64 input_file.fds
from the openmpi-1.3.3_64/bin directory, with both the FDS5/bin directory and the input file location included in $PATH. Otherwise, without the ./, I get the following:

mpirun -np 1 fds5.3_mpi_intel_osx_64 activate_vents.fds
[EWB4212-68.local:82932] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
[EWB4212-68:82932] *** Process received signal ***
[EWB4212-68:82932] Signal: Segmentation fault (11)
[EWB4212-68:82932] Signal code: Address not mapped (1)
[EWB4212-68:82932] Failing at address: 0xfffffff0
[ 1] [0xbfffee18, 0xfffffff0] (-P-)
[ 2] (pthread_getspecific + 0x132) [0xbffff578, 0x900ab456] 
[ 3] (_dyld_get_image_header_containing_address + 0xc8) [0xbffff5a8, 0x900e9702] 
[ 4] (opal_show_help + 0x3b3) [0xbffff618, 0x000c6303] 
[ 5] (orte_init_stage1 + 0xca) [0xbffff6f8, 0x00056eda] 
[ 6] (orte_system_init + 0x1e) [0xbffff718, 0x0005a74e] 
[ 7] (orte_init + 0x8d) [0xbffff748, 0x00056bfd] 
[ 8] (orterun + 0x181) [0xbffff7d8, 0x0000258d] 
[ 9] (main + 0x18) [0xbffff7f8, 0x0000240a] 
[10] (start + 0x36) [0xbffff814, 0x000023c6] 
[11] [0x00000000, 0x00000005] (FP-)
[EWB4212-68:82932] *** End of error message ***
Segmentation fault

Thanks for your help.

Regards,

Samara
CESARE, Victoria University, Melbourne, Australia

Original issue reported on code.google.com by samara.neilson@vu.edu.au on 2010-01-25 04:19:11

gforney commented 9 years ago
Samara, thanks for the info. You might want to consider buying the Intel Fortran 
compiler for Mac OSX. We spend a considerable amount of time and resources 
maintaining FDS and Smokeview on OSX, and yet it is a small fraction of our user 
base.

Original issue reported on code.google.com by mcgratta on 2010-01-25 13:02:51

gforney commented 9 years ago
I've posted a new Mac bundle since the above comments were made in this issue. Do these problems still exist?

Original issue reported on code.google.com by gforney on 2010-05-03 00:41:53

gforney commented 9 years ago
Hi Glenn,

The latest version (5.5/64) gives me the following error:

dyld: lazy symbol binding failed: Symbol not found: ___intel_sse2_strlen
  Referenced from: /Applications/FDS/FDS5/bin/fds5.5_mpi_osx_64
  Expected in: /Applications/FDS/openmpi-1.3.3_64/lib/libopen-pal.0.dylib

dyld: Symbol not found: ___intel_sse2_strlen
  Referenced from: /Applications/FDS/FDS5/bin/fds5.5_mpi_osx_64
  Expected in: /Applications/FDS/openmpi-1.3.3_64/lib/libopen-pal.0.dylib

Version 5.3 is the only one that has worked for us so far. 5.4 came up with a similar error to the one above, but for __intel_fast_memcpy instead of ___intel_sse2_strlen, so perhaps something that changed between the 5.3 bundle and the 5.5 bundle is causing the problem?
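
(One way to compare what each binary still expects from the Intel runtime at load time, using the paths from the error messages above:)

nm -u /Applications/FDS/FDS5/bin/fds5.5_mpi_osx_64 | grep __intel
nm -u /Applications/FDS/openmpi-1.3.3_64/lib/libopen-pal.0.dylib | grep __intel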

Samara 

Original issue reported on code.google.com by samara.neilson@vu.edu.au on 2010-05-03 06:17:36

gforney commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by gforney on 2011-01-21 19:20:46