Exawind / amr-wind

AMReX-based structured wind solver
https://exawind.github.io/amr-wind

openfast_stop_time not updated on restart #1196

Open lawrenceccheung opened 1 month ago

lawrenceccheung commented 1 month ago

Bug description

This might also be an issue for the openfast repo, but the problem appears when running amr-wind with openfast. When an openfast ALM case is restarted and you try to extend the length of the openfast simulation by increasing openfast_stop_time, the stop time value in the openfast run does not actually change.

Note that exceeding the openfast_stop_time value does not actually halt the amr-wind run, but it does prevent openfast from outputting any additional data in the output file or writing out any more checkpoint files.

What I think is happening is that openfast_stop_time is written into the chkp files, so the value stored there takes precedence on restart. Any new value of the stop time would need to be read in after the chkp file is loaded.

Steps to reproduce

A reproducible scenario would look like this. The first run would use these values for the Actuator settings:

Actuator.labels                          = T00 T01 T02 T03 T04 T05 T06 T07 T08
Actuator.T00.type                        = TurbineFastLine     
Actuator.T00.openfast_input_file         = T00_OpenFAST3p4_IEA15MW/IEA-15-240-RWT-Monopile/IEA-15-240-RWT-Monopile.fst
Actuator.T00.base_position               = 1963.53247 4000.0 0.0
Actuator.T00.rotor_diameter              = 240.0               
Actuator.T00.hub_height                  = 150.0               
Actuator.T00.num_points_blade            = 50                  
Actuator.T00.num_points_tower            = 12                  
Actuator.T00.epsilon                     = 2.0 2.0 2.0         
Actuator.T00.epsilon_tower               = 2.0 2.0 2.0         
Actuator.T00.openfast_start_time         = 0.0                 
Actuator.T00.openfast_stop_time          = 1000.0  

This should be sufficient for any runtime up to 1,000 seconds (in our case, the initial run is only 900 seconds). When we want to restart the run, we change openfast_start_time and openfast_stop_time and add the restart parameters:

Actuator.T00.type                        = TurbineFastLine     
Actuator.T00.openfast_input_file         = T00_OpenFAST3p4_IEA15MW/IEA-15-240-RWT-Monopile/IEA-15-240-RWT-Monopile.fst
Actuator.T00.base_position               = 1963.53247 4000.0 0.0
Actuator.T00.rotor_diameter              = 240.0               
Actuator.T00.hub_height                  = 150.0               
Actuator.T00.num_points_blade            = 50                  
Actuator.T00.num_points_tower            = 12                  
Actuator.T00.epsilon                     = 2.0 2.0 2.0         
Actuator.T00.epsilon_tower               = 2.0 2.0 2.0         
Actuator.T00.openfast_start_time         = 900.0               
Actuator.T00.openfast_stop_time          = 2000.0              
Actuator.T00.openfast_restart_file       = ./T00_OpenFAST3p4_IEA15MW/IEA-15-240-RWT-Monopile/IEA-15-240-RWT-Monopile.180000
Actuator.T00.openfast_sim_mode           = restart             
Actuator.T00.sim_mode                    = restart   
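
To make the difference between the two runs explicit, here is a minimal sketch that parses two `key = value` input snippets and lists what was added or changed on restart. The helper names `parse_inputs` and `diff_inputs` are hypothetical and not part of amr-wind; the snippets are abbreviated from the inputs above.

```python
def parse_inputs(text):
    """Parse 'key = value' lines into a dict, skipping blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Collapse repeated whitespace so "2.0  2.0" and "2.0 2.0" compare equal.
        params[key.strip()] = " ".join(value.split())
    return params


def diff_inputs(first, restart):
    """Return {key: (old, new)} for keys added or changed in the restart inputs."""
    return {
        key: (first.get(key), new)
        for key, new in restart.items()
        if first.get(key) != new
    }


# Abbreviated copies of the two input sets from this issue.
FIRST_RUN = """
Actuator.T00.type                = TurbineFastLine
Actuator.T00.openfast_start_time = 0.0
Actuator.T00.openfast_stop_time  = 1000.0
"""
RESTART_RUN = """
Actuator.T00.type                = TurbineFastLine
Actuator.T00.openfast_start_time = 900.0
Actuator.T00.openfast_stop_time  = 2000.0
Actuator.T00.openfast_sim_mode   = restart
"""

changes = diff_inputs(parse_inputs(FIRST_RUN), parse_inputs(RESTART_RUN))
for key, (old, new) in changes.items():
    print(f"{key}: {old} -> {new}")
```

Only the start time, stop time, and restart-related keys differ, which is what makes the unchanged stop time in the openfast run surprising.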

Steps to reproduce the behavior:

  1. Compiler used
    • [ ] GCC
    • [ ] LLVM
    • [ ] oneapi (Intel)
    • [ ] nvcc (NVIDIA)
    • [ ] rocm (AMD)
    • [X] with MPI
    • [X] other: Clang
  2. Operating system
    • [X] Linux
    • [ ] OSX
    • [ ] Windows
    • [ ] other (do tell ;)):
  3. Hardware:
    • [ ] CPU
    • [X] GPU
  4. Machine details:
    Frontier
  5. Input file attachments
  6. Error (paste or attach): This is not an error per se, but the openfast output file shows when the limit has been reached, because the time column stops advancing at 1000.0000:
    $ awk '{print $1}' T00_OpenFAST3p4_IEA15MW/IEA-15-240-RWT-Monopile/IEA-15-240-RWT-Monopile.out |tail
    999.9600
    999.9650
    999.9700
    999.9750
    999.9800
    999.9850
    999.9900
    999.9950
    1000.0000
    1000.0000
  7. If this is a segfault, a stack trace from a debug build (paste or attach):
    <!-- stack trace -->
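
The same check as the `awk` one-liner above can be scripted. This is a rough sketch, assuming the output file is whitespace-delimited text with the time in the first column and that header rows do not start with a number; `last_output_time` and `output_reached_limit` are hypothetical helper names, and the second column of the sample rows is a made-up placeholder.

```python
def last_output_time(lines):
    """Return the last first-column value that parses as a float, or None."""
    last = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            last = float(fields[0])
        except ValueError:
            continue  # header, channel-name, or unit rows
    return last


def output_reached_limit(lines, stop_time, tol=1e-6):
    """True if the output has already reached the given stop time."""
    last = last_output_time(lines)
    return last is not None and last >= stop_time - tol


# Time values taken from the listing above; the second column is a placeholder.
sample = [
    "Time      Channel",
    "(s)       (-)",
    "999.9900  0.0",
    "999.9950  0.0",
    "1000.0000 0.0",
    "1000.0000 0.0",
]

print(last_output_time(sample))
print(output_reached_limit(sample, stop_time=1000.0))
print(output_reached_limit(sample, stop_time=2000.0))
```

For a real file, pass an open file handle instead of the list, for example `with open(path) as fh: last_output_time(fh)`.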

Expected behavior

I'm thinking the solution to this problem will likely involve some combination of calling FAST_OpFM_Init to create a new openfast turbine with the updated stop_time (so the storage arrays get set to the right length), loading in the new checkpoint data, copying it over to the larger arrays, and then proceeding with the run.
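
The array-copying step described above can be illustrated with a toy model. This is only a sketch of the idea and does not call OpenFAST: the checkpoint layout (a `step` index plus per-channel lists), the channel name, and the helpers `make_buffers` and `extend_from_checkpoint` are all hypothetical stand-ins for whatever FAST_OpFM_Init and the chkp files actually hold.

```python
def make_buffers(stop_time, dt, channels):
    """Allocate one zero-filled buffer per channel, sized for the stop time."""
    n_steps = int(round(stop_time / dt)) + 1  # include the t = 0 entry
    return {name: [0.0] * n_steps for name in channels}


def extend_from_checkpoint(checkpoint, new_stop_time, dt):
    """Copy checkpointed history into buffers sized for a later stop time."""
    old = checkpoint["buffers"]
    new = make_buffers(new_stop_time, dt, old.keys())
    n_done = checkpoint["step"] + 1  # entries already filled in the checkpoint
    for name, values in old.items():
        if len(new[name]) < n_done:
            raise ValueError("new stop time is earlier than the checkpoint time")
        new[name][:n_done] = values[:n_done]
    return {"step": checkpoint["step"], "buffers": new}


# Toy numbers: original stop time 10 s, checkpoint at 9 s, new stop time 20 s.
dt = 0.5
buffers = make_buffers(10.0, dt, ["rotor_speed"])
for i in range(19):  # steps 0..18 were completed before the checkpoint
    buffers["rotor_speed"][i] = float(i)
checkpoint = {"step": 18, "buffers": buffers}

extended = extend_from_checkpoint(checkpoint, new_stop_time=20.0, dt=dt)
print(len(buffers["rotor_speed"]), len(extended["buffers"]["rotor_speed"]))
```

The history up to the checkpoint step is preserved and the remaining entries are left empty for the extended run. Whether the real openfast data structures can be resized this way is exactly the open question in this issue.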

AMR-Wind information

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v2.0.0-4-gc70c279e
  AMR-Wind Git SHA :: c70c279eb6901edc4466d6f96f10e522ca6b62f9
  AMReX version    :: 24.03-36-g748f8dfea597

  Exec. time       :: Thu Aug  8 21:46:47 2024
  Build time       :: May 20 2024 00:00:24
  C++ compiler     :: Clang 15.0.0

  MPI              :: ON    (Num. ranks = 1600)
  GPU              :: ON    (Backend: HIP)
  OpenMP           :: OFF

  Enabled third-party libraries: 
    NetCDF    4.7.4
    HYPRE     2.31.0
    OpenFAST  

Additional context

Note that this currently means that openfast_stop_time needs to be set to a very large value which encompasses any possible run duration that might be expected in the simulation. The trade-off, of course, is that the chkp files will get very large in order to store all of the necessary arrays.
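
To get a feel for how the stop time scales the stored history, here is a small arithmetic sketch. It assumes the 0.005 s spacing seen in the output listing is the openfast time step (this is consistent with the restart file suffix 180000 at 900 s) and that storage grows with the number of steps, as the description above suggests. The per-step storage cost is not known here, so only step counts are shown.

```python
def num_steps(stop_time, dt):
    """Number of time steps needed to reach stop_time with a fixed step dt."""
    return round(stop_time / dt)


DT = 0.005  # assumed openfast time step in seconds, inferred from the output spacing

for stop_time in (900.0, 1000.0, 2000.0):
    print(f"stop_time = {stop_time:7.1f} s -> {num_steps(stop_time, DT)} steps")
```

Doubling openfast_stop_time doubles the step count, so picking a very large value up front trades restart flexibility for proportionally more stored steps.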

lawrenceccheung commented 1 month ago

Tagging @ndevelder because we discussed how certain things are baked into the chkp files and not changeable afterwards.

marchdf commented 4 weeks ago

Interesting issue... The proposed fix sounds kinda gnarly. Is there a way to talk to openfast devs to see if there's an easy way to automate this copying of arrays and things? This feels like a limitation of openfast and not amr-wind. Though the fix you describe sounds like more of a workaround to get openfast to do what you want?