Open pscrozi opened 1 year ago
Seeing this fun MPI error when trying to run openfastcpp:
[psakiev@attaway-login8 wind_speed_8.0]$ openfastcpp inp.yaml
attaway-login8.231551map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad0004030000) size 262144 failed: Resource temporarily unavailable
attaway-login8.231551openfastcpp: An unrecoverable error occurred while communicating with the driver
[attaway-login8:231551] *** Process received signal ***
[attaway-login8:231551] Signal: Aborted (6)
[attaway-login8:231551] Signal code: (-6)
[attaway-login8:231551] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7ffff3746630]
[attaway-login8:231551] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7ffff339f387]
[attaway-login8:231551] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7ffff33a0a78]
[attaway-login8:231551] [ 3] /usr/lib64/libpsm2.so.2(+0x486e8)[0x7fffe41616e8]
[attaway-login8:231551] [ 4] /usr/lib64/libpsm2.so.2(+0xedf5)[0x7fffe4127df5]
[attaway-login8:231551] [ 5] /usr/lib64/libpsm2.so.2(+0xffc1)[0x7fffe4128fc1]
[attaway-login8:231551] [ 6] /usr/lib64/libpsm2.so.2(+0x12b99)[0x7fffe412bb99]
[attaway-login8:231551] [ 7] /usr/lib64/libpsm2.so.2(psm2_ep_open+0x367)[0x7fffe412cc77]
[attaway-login8:231551] [ 8] /opt/openmpi/4.0/intel/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_module_init+0x199)[0x7fffe4384119]
[attaway-login8:231551] [ 9] /opt/openmpi/4.0/intel/lib/openmpi/mca_mtl_psm2.so(+0x38cc)[0x7fffe43848cc]
[attaway-login8:231551] [10] /opt/openmpi/4.0/intel/lib/libmpi.so.40(ompi_mtl_base_select+0x9e)[0x7ffff423468e]
[attaway-login8:231551] [11] /opt/openmpi/4.0/intel/lib/openmpi/mca_pml_cm.so(+0x61be)[0x7fffe49a71be]
[attaway-login8:231551] [12] /opt/openmpi/4.0/intel/lib/libmpi.so.40(mca_pml_base_select+0x19f)[0x7ffff423beaf]
[attaway-login8:231551] [13] /opt/openmpi/4.0/intel/lib/libmpi.so.40(ompi_mpi_init+0x8b4)[0x7ffff4249dc4]
[attaway-login8:231551] [14] /opt/openmpi/4.0/intel/lib/libmpi.so.40(MPI_Init+0x16a)[0x7ffff41f401a]
[attaway-login8:231551] [15] openfastcpp(main+0xd2)[0x40f8f2]
[attaway-login8:231551] [16] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff338b555]
[attaway-login8:231551] [17] openfastcpp[0x40f769]
[attaway-login8:231551] *** End of error message ***
Aborted
[psakiev@attaway-login8 wind_speed_8.0]$ module list
UPDATE: This is resolved when I run on a compute node instead of the login node.
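For anyone triaging a similar abort, here is a minimal Python sketch that parses an Open MPI signal backtrace like the one above and reports the first frame that is not in libc or libpthread. In this trace that frame is in libpsm2, which matches the `map_hfi_mem` message at the top of the log. The function names (`parse_frames`, `first_non_system_frame`) and the list of "system" libraries are my own assumptions for illustration, not part of Open MPI or OpenFAST.

```python
import re

# Matches Open MPI backtrace frames such as:
#   [host:pid] [ 3] /usr/lib64/libpsm2.so.2(+0x486e8)[0x7fffe41616e8]
#   [host:pid] [17] openfastcpp[0x40f769]
FRAME_RE = re.compile(
    r"^\[[^\]]+\]\s+\[\s*(\d+)\]\s+"  # [host:pid] [ N]
    r"([^\s(\[]+)"                     # module path or executable name
    r"(?:\(([^)]*)\))?"                # optional (symbol+offset)
    r"\[0x[0-9a-fA-F]+\]\s*$"          # [address]
)

# Assumption: frames in these libraries are signal-handling plumbing,
# not the origin of the failure.
SYSTEM_LIBS = ("libc.so", "libpthread.so")

# A few lines copied from the trace in this issue.
SAMPLE_TRACE = """\
[attaway-login8:231551] Signal: Aborted (6)
[attaway-login8:231551] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7ffff3746630]
[attaway-login8:231551] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7ffff33a0a78]
[attaway-login8:231551] [ 3] /usr/lib64/libpsm2.so.2(+0x486e8)[0x7fffe41616e8]
[attaway-login8:231551] [ 7] /usr/lib64/libpsm2.so.2(psm2_ep_open+0x367)[0x7fffe412cc77]
[attaway-login8:231551] [17] openfastcpp[0x40f769]
"""


def parse_frames(text):
    """Return (index, module, symbol) tuples for every backtrace frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, module, symbol = match.groups()
            frames.append((int(index), module, symbol or ""))
    return frames


def first_non_system_frame(frames):
    """Return the first frame whose module is not in SYSTEM_LIBS, or None."""
    for frame in frames:
        basename = frame[1].rsplit("/", 1)[-1]
        if not basename.startswith(SYSTEM_LIBS):
            return frame
    return None


if __name__ == "__main__":
    print(first_non_system_frame(parse_frames(SAMPLE_TRACE)))
```

Non-frame lines such as `Signal: Aborted (6)` are skipped, so the whole log can be passed in unfiltered.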
We're all in a holding pattern right now, waiting for resolution of the MPI error above.
When I run standard openfast I see this. Any comments, @gantech?
[psakiev@attaway-login8 wind_speed_8.0]$ openfast IEA-15-240-RWT-Monopile.fst
**************************************************************************************************
OpenFAST
Copyright (C) 2023 National Renewable Energy Laboratory
Copyright (C) 2023 Envision Energy USA LTD
This program is licensed under Apache License Version 2.0 and comes with ABSOLUTELY NO WARRANTY.
See the "LICENSE" file distributed with this software for details.
**************************************************************************************************
OpenFAST--128-NOTFOUND
Compile Info:
- Compiler: Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R)
64, Version 2021.3.0 Build 20210609_000000
- Architecture: 64 bit
- Precision: double
- OpenMP: No
- Date: Oct 03 2023
- Time: 20:24:54
Execution Info:
- Date: 10/04/2023
- Time: 08:06:16-0600
OpenFAST input file heading:
IEA 15 MW offshore reference model monopile configuration
Running ElastoDyn.
Nodal outputs section of ElastoDyn input file not found or improperly formatted.
Running AeroDyn.
AD15 Nodal Outputs: Nodal output section of AeroDyn input file not found or improperly formatted.
Skipping nodal outputs.
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 2)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 2)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 2)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 2)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 3)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 3)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 3)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 3)
Running ServoDyn.
Running ServoDyn Interface for Bladed Controllers (using Intel Fortran for Linux, ).
Using legacy Bladed DLL interface.
FAST_InitializeAll:InitModuleMappings:ED_P_2_ExtLd_P_H: Both meshes must be committed before they
can be mapped.
OpenFAST encountered an error during module initialization.
Simulation error level: FATAL ERROR
Aborting OpenFAST.
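To make output like this easier to scan, here is a rough Python sketch that collapses the repeated per-node warnings into one entry per message and splits the fatal message into its routine chain. It assumes the warning lines end in `(node N, blade M)` and that the fatal message is a colon-separated list of routine names followed by `: ` and the description, as it appears in this log (after joining the wrapped line). The helper names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed warning shape, taken from the log above:
#   Warning: <message> (node N, blade M)
WARN_RE = re.compile(
    r"^Warning: (?P<msg>.+?)\s*\(node (?P<node>\d+), blade (?P<blade>\d+)\)\s*$"
)

SAMPLE_LOG = """\
Running AeroDyn.
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 1)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 2)
Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 2)
Running ServoDyn.
"""

# The fatal message from the log, with the wrapped line joined.
SAMPLE_FATAL = (
    "FAST_InitializeAll:InitModuleMappings:ED_P_2_ExtLd_P_H: "
    "Both meshes must be committed before they can be mapped."
)


def summarize_warnings(log_text):
    """Map each warning message to {blade: sorted list of node numbers}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for line in log_text.splitlines():
        match = WARN_RE.match(line.strip())
        if match:
            grouped[match["msg"]][int(match["blade"])].append(int(match["node"]))
    return {
        msg: {blade: sorted(nodes) for blade, nodes in blades.items()}
        for msg, blades in grouped.items()
    }


def split_error_chain(message):
    """Split 'A:B:C: description' into (['A', 'B', 'C'], 'description')."""
    prefix, _, detail = message.partition(": ")
    return prefix.split(":"), detail.strip()


if __name__ == "__main__":
    for msg, blades in summarize_warnings(SAMPLE_LOG).items():
        print(msg, blades)
    print(split_error_chain(SAMPLE_FATAL))
```

Applied to the fatal message here, the last routine in the chain is `ED_P_2_ExtLd_P_H`, i.e. the failure is in the mesh mapping named by that routine during `InitModuleMappings`.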
Phil got it going on the Sandia machines yesterday (10/4). Has a job in the queue. Hasn't been assigned a start date. Need to follow up with HPC about priority. The hole cut looks good and should work. It is running very fast on 8 nodes (3-4 s per timestep). Should be able to get every single run done in a single submission. Would like to get one completed to make sure everything is behaving well. As soon as we're comfortable, we should be able to batch submit.
Nalu-Wind is running too fast. Is it even solving anything? Max-iterations = 0. Phil will update this for the submitted job.
One person will submit the jobs.
Still waiting to hear from HPC about priority. Phil will e-mail later today if needed.
We got priority at Sandia. Some of the jobs are now running, mostly by Phil and Neil.
Nate will follow up with Phil today (10/6) and see what we need to do.
A little stalled as of 10/9. One of Phil's jobs got through, but crashed right when Eagle came back up. Couldn't get IEA 15 MW to not diverge after 80 timesteps or so. We need to make sure the model is running well.
We'll see which one gets done faster: Sandia's or NREL's when Eagle is back.
Runs we will submit at Sandia: (1) IEA 15 MW power curve (10 separate runs). Nate, Neil, Phil, Kevin. (2) NREL 5 MW case. Nate, Neil, Phil, Kevin.
Neil's notes: The full FSI runs with the split mesh on Skybridge and Chama are hitting some kind of MPI initialization error. It happens before AMR-Wind and Nalu get a chance to write log files, so it seems to be a driver issue. AMR-Wind alone runs fine, so it's either Nalu or the driver. Could be a Spack configuration issue on our machines. Waiting to see if Phil has any ideas. Phil was able to recreate the build issues I encountered on Attaway. I'm also working on the FSI input documentation. Should be able to put up a PR later this week. Phil: working on this this morning (10/4).