GeoscienceAustralia / eqrm

Automatically exported from code.google.com/p/eqrm

running in parallel on rhe-compute1 not working #97

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
I don't know what is needed to replicate this yet.

The error is:
[rhe-compute1.ga.gov.au:32137] 2 more processes have sent help message help-mpi-btl-base.txt / btl:no-nics

Original issue reported on code.google.com by duncan.g...@gmail.com on 16 Oct 2012 at 4:28

GoogleCodeExporter commented 9 years ago
Actually, this doesn't seem to crash the simulation. When running a simple
simulation, the message is printed to the console at the end of the run.

Original comment by duncan.g...@gmail.com on 16 Oct 2012 at 4:33

GoogleCodeExporter commented 9 years ago
Example:
@rhe-compute1:/nas/mnh/georisk_models/earthquake/sandpits/duncan/eqrm_core/branches/reduce_mem$ mpirun -np 2 python2.7 implementation_tests/scenarios/TS_haz20.py
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
--------------------------------------------------------------------------
[[51319,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: OpenFabrics (openib)

Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
Pypar (version 2.1.4) initialised MPI OK with 2 processors
Logfile is './implementation_tests/current/TS_haz20/log-1.txt' with logging level of WARNING, console logging level is WARNING
Logfile is './implementation_tests/current/TS_haz20/log-0.txt' with logging level of DEBUG, console logging level is INFO
JS*N{"parallel size": 2}
Logfile is './implementation_tests/current/TS_haz20/log-1.txt' with logging level of WARNING, console logging level is WARNING
JS*N{"host name": "XX"}
JS*N{"system platform": "linux2"}
event_set_handler = generate
P0: Generating event set
P0: Saving event set to ./implementation_tests/current/TS_haz20/newc_event_set
P0: Event set created. Number of events=1
JS*N{"len_events": 1}
P0: Sites set created. Number of sites=4
JS*N{"len_max_GMPEs": 1}
JS*N{"len_recurrence_models": 1}
JS*N{"pseudo_events": 1}
P0: do site 1 of 2
P0: do site 2 of 2
JS*N{"time_pre_site_loop_fraction": 0.9318181818181818}
event_loop_time (excluding file saving) 0:00:00.440000 hr:min:sec
JS*N{"event_loop_time_seconds": 0.44000000000000006}
On node 0, rhe-compute1.ga.gov.au clock (processor) time taken overall 0:00:00.520000 hr:min:sec.
JS*N{"clock_time_taken_overall_seconds": 0.5200000000000001}
On node 0, rhe-compute1.ga.gov.au wall time taken overall 0:00:00.608083 hr:min:sec.
JS*N{"wall_time_taken_overall_seconds": 0.6080830097198486}
On node 0, rhe-compute1.ga.gov.au wall time taken overall 0:00:00.608083 hr:min:sec.
[rhe-compute1.ga.gov.au:32741] 1 more process has sent help message help-mpi-btl-base.txt / btl:no-nics
[rhe-compute1.ga.gov.au:32741] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

Original comment by duncan.g...@gmail.com on 16 Oct 2012 at 4:51
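
The log above suggests two things that could be tried: setting the MCA parameter "orte_base_help_aggregate" to 0 to see every help message, and keeping Open MPI from probing for OpenFabrics hardware that the host apparently lacks. The following is a minimal sketch of how the launch command might be assembled with those options. The helper name `build_mpirun_command` and its arguments are hypothetical and not part of EQRM; `--mca btl ^openib` is the standard Open MPI syntax for excluding the openib component. Whether this also silences the librdmacm lines on rhe-compute1 is untested.

```python
def build_mpirun_command(script, nprocs=2, python="python2.7",
                         exclude_btl=None, aggregate_help=True):
    """Return an mpirun argument list.

    Hypothetical helper for illustration only; it is not part of EQRM.
    """
    cmd = ["mpirun"]
    if exclude_btl:
        # A leading "^" asks Open MPI to exclude the named BTL component,
        # so it is never opened on hosts without that hardware.
        cmd += ["--mca", "btl", "^" + exclude_btl]
    if not aggregate_help:
        # Show every help/error message instead of the aggregated
        # "N more processes have sent help message" summary.
        cmd += ["--mca", "orte_base_help_aggregate", "0"]
    cmd += ["-np", str(nprocs), python, script]
    return cmd


if __name__ == "__main__":
    cmd = build_mpirun_command(
        "implementation_tests/scenarios/TS_haz20.py",
        exclude_btl="openib",
        aggregate_help=False,
    )
    # Print the command so it can be inspected or pasted into a shell.
    # To launch it directly, pass the list to subprocess.call(cmd)
    # (assumes mpirun is on PATH).
    print(" ".join(cmd))
```

When pasting the printed command into a shell, quote the `^openib` argument if the shell treats `^` specially.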

GoogleCodeExporter commented 9 years ago
I'm running a big simulation now and getting the error message before 
generating the event set.

Original comment by duncan.g...@gmail.com on 16 Oct 2012 at 11:38