Exawind / hfm-fsi

Input files and data for HFM FSI Work

Submit jobs on Sandia machines with a request for priority. #25

Open pscrozi opened 9 months ago

pscrozi commented 9 months ago

We'll see which finishes faster: the Sandia runs or the NREL runs once Eagle is back up.

Runs we will submit at Sandia:
(1) IEA 15 MW power curve (10 separate runs): Nate, Neil, Phil, Kevin.
(2) NREL 5 MW case: Nate, Neil, Phil, Kevin.

Neil's notes: The full FSI runs with the split mesh on Skybridge and Chama are hitting an MPI initialization error. It happens before AMR-Wind and Nalu-Wind get a chance to write their log files, so it seems to be a driver issue. AMR-Wind alone runs fine, so the problem lies in either Nalu-Wind or the driver. It could also be a Spack configuration issue on our machines. Waiting to see if Phil has any ideas. Phil was able to reproduce the build issues I encountered on Attaway. I'm also working on the FSI input documentation and should be able to put up a PR later this week. Phil: working on this this morning (10/4).

psakievich commented 9 months ago

I'm seeing this fun MPI error when trying to run OpenFAST:

[psakiev@attaway-login8 wind_speed_8.0]$ openfastcpp inp.yaml
attaway-login8.231551map_hfi_mem: mmap of rcvhdr_bufbase (0xdabbad0004030000) size 262144 failed: Resource temporarily unavailable
attaway-login8.231551openfastcpp: An unrecoverable error occurred while communicating with the driver
[attaway-login8:231551] *** Process received signal ***
[attaway-login8:231551] Signal: Aborted (6)
[attaway-login8:231551] Signal code:  (-6)
[attaway-login8:231551] [ 0] /usr/lib64/libpthread.so.0(+0xf630)[0x7ffff3746630]
[attaway-login8:231551] [ 1] /usr/lib64/libc.so.6(gsignal+0x37)[0x7ffff339f387]
[attaway-login8:231551] [ 2] /usr/lib64/libc.so.6(abort+0x148)[0x7ffff33a0a78]
[attaway-login8:231551] [ 3] /usr/lib64/libpsm2.so.2(+0x486e8)[0x7fffe41616e8]
[attaway-login8:231551] [ 4] /usr/lib64/libpsm2.so.2(+0xedf5)[0x7fffe4127df5]
[attaway-login8:231551] [ 5] /usr/lib64/libpsm2.so.2(+0xffc1)[0x7fffe4128fc1]
[attaway-login8:231551] [ 6] /usr/lib64/libpsm2.so.2(+0x12b99)[0x7fffe412bb99]
[attaway-login8:231551] [ 7] /usr/lib64/libpsm2.so.2(psm2_ep_open+0x367)[0x7fffe412cc77]
[attaway-login8:231551] [ 8] /opt/openmpi/4.0/intel/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_module_init+0x199)[0x7fffe4384119]
[attaway-login8:231551] [ 9] /opt/openmpi/4.0/intel/lib/openmpi/mca_mtl_psm2.so(+0x38cc)[0x7fffe43848cc]
[attaway-login8:231551] [10] /opt/openmpi/4.0/intel/lib/libmpi.so.40(ompi_mtl_base_select+0x9e)[0x7ffff423468e]
[attaway-login8:231551] [11] /opt/openmpi/4.0/intel/lib/openmpi/mca_pml_cm.so(+0x61be)[0x7fffe49a71be]
[attaway-login8:231551] [12] /opt/openmpi/4.0/intel/lib/libmpi.so.40(mca_pml_base_select+0x19f)[0x7ffff423beaf]
[attaway-login8:231551] [13] /opt/openmpi/4.0/intel/lib/libmpi.so.40(ompi_mpi_init+0x8b4)[0x7ffff4249dc4]
[attaway-login8:231551] [14] /opt/openmpi/4.0/intel/lib/libmpi.so.40(MPI_Init+0x16a)[0x7ffff41f401a]
[attaway-login8:231551] [15] openfastcpp(main+0xd2)[0x40f8f2]
[attaway-login8:231551] [16] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x7ffff338b555]
[attaway-login8:231551] [17] openfastcpp[0x40f769]
[attaway-login8:231551] *** End of error message ***
[psakiev@attaway-login8 wind_speed_8.0]$ module list

UPDATE: This is resolved when I run on a compute node instead of the login node.

pscrozi commented 9 months ago

We're all in a holding pattern right now, waiting for the MPI error above to be resolved.

psakievich commented 9 months ago

When I run the standard openfast executable, I see this. Any comments, @gantech?

[psakiev@attaway-login8 wind_speed_8.0]$ openfast IEA-15-240-RWT-Monopile.fst


 Copyright (C) 2023 National Renewable Energy Laboratory
 Copyright (C) 2023 Envision Energy USA LTD

 This program is licensed under Apache License Version 2.0 and comes with ABSOLUTELY NO WARRANTY.
 See the "LICENSE" file distributed with this software for details.

 Compile Info:
  - Compiler: Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R)
  64, Version 2021.3.0 Build 20210609_000000
  - Architecture: 64 bit
  - Precision: double
  - OpenMP: No
  - Date: Oct 03 2023
  - Time: 20:24:54
 Execution Info:
  - Date: 10/04/2023
  - Time: 08:06:16-0600

 OpenFAST input file heading:
     IEA 15 MW offshore reference model monopile configuration

 Running ElastoDyn.
 Nodal outputs section of ElastoDyn input file not found or improperly formatted.
 Running AeroDyn.
 AD15 Nodal Outputs: Nodal output section of AeroDyn input file not found or improperly formatted.
 Skipping nodal outputs.
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 1)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 1)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 1)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 1)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 2)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 2)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 2)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 2)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 1, blade 3)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 2, blade 3)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 4, blade 3)
 Warning: Turning off Unsteady Aerodynamics because C_nalpha is 0. (node 5, blade 3)
 Running ServoDyn.
 Running ServoDyn Interface for Bladed Controllers (using Intel Fortran for Linux, ).
 Using legacy Bladed DLL interface.

 FAST_InitializeAll:InitModuleMappings:ED_P_2_ExtLd_P_H: Both meshes must be committed before they
 can be mapped.

  OpenFAST encountered an error during module initialization.
  Simulation error level: FATAL ERROR

  Aborting OpenFAST.
pscrozi commented 9 months ago

Phil got it running on the Sandia machines yesterday (10/4). He has a job in the queue, but it hasn't been assigned a start time yet; we need to follow up with HPC about priority. The hole cut looks good and should work. It runs very fast on 8 nodes (3-4 s per timestep), so we should be able to finish each run in a single submission. We'd like to get one run completed first to make sure everything is behaving well. As soon as we're comfortable, we can batch-submit the rest.
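As a rough check on the single-submission claim, the wall-clock time per run is simply the number of timesteps times the cost per step. A minimal Python sketch, assuming the observed 3-4 s per timestep on 8 nodes; the step count (`N_STEPS`) and the queue walltime limit (`LIMIT_HOURS`) are hypothetical placeholders, not values from the actual runs:

```python
def estimate_walltime_hours(n_steps: int, sec_per_step: float) -> float:
    """Wall-clock hours for n_steps at a fixed cost per timestep."""
    return n_steps * sec_per_step / 3600.0


def fits_in_one_submission(n_steps: int, sec_per_step: float, limit_hours: float) -> bool:
    """True if the whole run fits inside a single job's walltime limit."""
    return estimate_walltime_hours(n_steps, sec_per_step) <= limit_hours


if __name__ == "__main__":
    # Hypothetical values: replace with the real step count and queue limit.
    N_STEPS = 20_000
    LIMIT_HOURS = 48.0
    # Observed range from the thread: 3-4 s per timestep on 8 nodes.
    for sec in (3.0, 4.0):
        hours = estimate_walltime_hours(N_STEPS, sec)
        fits = fits_in_one_submission(N_STEPS, sec, LIMIT_HOURS)
        print(f"{sec:.0f} s/step -> {hours:.1f} h, fits in one job: {fits}")
```

Using the slower end of the range (4 s) gives a conservative bound for requesting walltime.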

Nalu-Wind is running suspiciously fast. Is it actually solving anything? The input has max-iterations = 0. Phil will update this for the submitted job.

pscrozi commented 9 months ago

One person will submit the jobs.

Still waiting to hear from HPC about priority. Phil will e-mail later today if needed.

pscrozi commented 9 months ago

We got priority at Sandia. Some of the jobs are now running, mostly submitted by Phil and Neil.

Nate will follow up with Phil today (10/6) and see what we need to do.

pscrozi commented 9 months ago

Progress is a little stalled as of 10/9. One of Phil's jobs got through but crashed, right when Eagle came back up. We couldn't keep the IEA 15 MW case from diverging after roughly 80 timesteps. We need to make sure the model is running stably.