NREL / ROSCO

A Reference Open Source Controller for Wind Turbines
https://rosco.readthedocs.io/en/latest/
Apache License 2.0

Possible non-closing of a file #251

Open andrew-platt opened 1 year ago

andrew-platt commented 1 year ago

When using the ROSCO controller with OpenFAST in AMR-Wind, we ran into an issue where ROSCO ran out of unit numbers when checkpoint files were written at every timestep. This was noticed by @shenpai35.

Details:

observations

Error:

ROSCO:PitchControl:PitchSaturation:interp1d:interp1d:interp1d:interp1d:interp1d:VariableSpeedControl:WindSpeedEstimator:AeroDynTorque:interp1d:interp1d:interp1d:GetNewUnit() was unable to find an open file unit specifier between 10 and 99.

conclusion

Considering the above observations, I find it most likely that ROSCO is not closing some file.

  1. Maybe ROSCO isn't closing a checkpoint file (this was our first guess). I really don't understand how that could be, given the code here: https://github.com/NREL/ROSCO/blob/v2.8.0/ROSCO/src/ROSCO_IO.f90#L31C1-L37. The only thing I can conclude is that it somehow returns before hitting the close on line 275. This might be possible if the OPEN command returns a non-zero ErrStat but actually opens the file. However, that would result in nothing getting written to the file, but it appears most files have something in them. This might be a red herring.
  2. Given the error above, there is another file getting opened every timestep, but not getting closed. Maybe something in Interp1D? I lean towards this possibility, but have not really looked into it (I'll take a quick look and see if anything is obvious and report if I find anything).

Also one minor thing: the ErrMsg from line 35 of ROSCO_IO.f90 never gets returned.
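To illustrate the failure mode described above, here is a minimal Python sketch of a unit allocator that scans a fixed range for a free unit. It is a toy model, not ROSCO's Fortran code: the `UnitPool` class and `steps_until_exhaustion` helper are hypothetical names, and the only detail taken from the error message is the 10 to 99 range. It assumes exactly one file is opened per timestep.

```python
class UnitPool:
    """Toy model of a Fortran-style unit allocator (hypothetical, not ROSCO code)."""

    def __init__(self, first=10, last=99):
        self.first = first
        self.last = last
        self.in_use = set()

    def get_new_unit(self):
        # Scan the range for the first unit that is not currently open.
        for unit in range(self.first, self.last + 1):
            if unit not in self.in_use:
                self.in_use.add(unit)
                return unit
        raise RuntimeError(
            "unable to find an open file unit specifier between "
            f"{self.first} and {self.last}"
        )

    def close(self, unit):
        # Releasing a unit makes it available to the next get_new_unit() call.
        self.in_use.discard(unit)


def steps_until_exhaustion(pool, leak, max_steps=1000):
    """Open one file per step; skip the close when `leak` is True.

    Returns the 1-based step at which allocation fails, or None if it never does.
    """
    for step in range(1, max_steps + 1):
        try:
            unit = pool.get_new_unit()
        except RuntimeError:
            return step
        if not leak:
            pool.close(unit)
    return None


print(steps_until_exhaustion(UnitPool(), leak=True))   # 91
print(steps_until_exhaustion(UnitPool(), leak=False))  # None
```

With 90 units available, a single unclosed file per timestep exhausts the pool on step 91, which is consistent with the problem appearing only when checkpoints are written at every timestep.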

dzalkind commented 1 year ago

So, I borrowed this and added it here, but from what I understand, you have tried that, and it doesn't fix the issue?

dzalkind commented 1 year ago

Would you happen to have a non-AMRWind setup that I can test? The error messaging is certainly not what it should be, and it makes me wonder if there are other errors stopping ROSCO and preventing files from being closed. Also, it looks like the debug outputs are never closed. I would set LoggingLevel to 0 as a workaround.

andrew-platt commented 1 year ago

Increasing the maximum unit number only had the effect of postponing the unit number issue.
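A quick back-of-the-envelope sketch of why raising the limit only delays the failure, assuming a constant number of units leaked per timestep (the helper name and the example limits are illustrative, not values from ROSCO):

```python
def first_failing_step(first_unit, last_unit, leaked_per_step=1):
    """Step at which a leaking allocator runs out (assumes a constant leak rate)."""
    available = last_unit - first_unit + 1
    # Number of steps whose leaks fit entirely within the available units.
    full_steps = available // leaked_per_step
    # The next step is the first one that cannot obtain all the units it needs.
    return full_steps + 1


for last in (99, 999, 9999):
    print(last, first_failing_step(10, last))
# 99 91
# 999 991
# 9999 9991
```

The failing step grows linearly with the limit, so any finite maximum is eventually reached as long as the leak persists.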

Unfortunately I don't have a non-AMRWind setup to test with. Perhaps it would be possible to run the IEA15 with OpenFAST and ROSCO and write out a checkpoint file every timestep (I'm not convinced this would actually show the problem though).

As you mention, maybe the debug outputs are the issue here (I don't know what LoggingLevel was used). Would a new debug file be opened after a checkpoint, or only once per simulation?

Coincidentally, this issue only appeared due to a bug whereby AMRWind requested checkpoints from OpenFAST at every timestep (we found a workaround for it). So it is somewhat unlikely that this issue would be a problem for most other use cases.

dzalkind commented 1 year ago

Hi @andrew-platt,

I finally got around to looking at this, and as you suspected, I haven't been able to reproduce the error using only OpenFAST and ROSCO. The unit number associated with the checkpoint file does not increase during the simulation.

Given the long error message, I wonder if ROSCO cannot kill the simulation properly which leads to files not being closed.

If there's a way to quickly run the AMRWind set up causing this issue, I could look into it more closely.

Best, Dan

abhineet-gupta commented 1 year ago

Hi @andrew-platt, I am looking deeper into this issue. Would it be possible to share the files that replicate it? I am surprised that you were able to get OpenFAST 3.5.0 working with AMR-Wind (https://github.com/Exawind/amr-wind/issues/850). I have tried to build it, and I see the same error as in that issue.

Best, Abhineet

andrew-platt commented 1 year ago

Hi @abhineet-gupta,

Unfortunately, I don't have the input files for this AMR-Wind simulation (this was done by an intern during the summer). One other issue we discovered when using AMR-Wind was that it was requesting checkpoint files at every timestep. By changing the input file for AMR-Wind to only request checkpoint files every 99999999 steps, we were able to prevent the checkpoints from getting written and avoid triggering this issue.

Andy

momemo1996 commented 1 year ago

Hi! I am running SOWFA coupled with OpenFAST (BeamDyn) with ROSCO as the controller, and I get the same issue. @andrew-platt, where is the input set in your simulation, and do you know if it is somewhere in the .fst file for the IEA 15MW? \Memo

andrew-platt commented 1 year ago

Hi @momemo1996,

The input for setting the checkpoint I was referring to is in the AMR-Wind input file (for AMR-Wind coupled to OpenFAST). For SOWFA coupled to OpenFAST, I'm not entirely certain where the checkpoint request is. Perhaps @mchurchf can give some guidance on how to set checkpoints with SOWFA?

It's also possible that what you are seeing is a different issue. Which version of SOWFA and OpenFAST are you using?

Regards,

momemo1996 commented 1 year ago

Hi @andrew-platt, I am getting the exact same error as described in the thread. I found that the setting is in the constant/turbineArrayProperties file. I will report here whether your solution solves the issue, but it is still strange. Regards, Memo

dzalkind commented 11 months ago

This issue keeps popping up when using AMR-Wind and writing many checkpoint files.

The current workaround is to reduce the number of checkpoint files that are being written.

I have a hunch that the checkpoint file is not being written or read completely here, and there is no error catching in this part of the code to stop ROSCO and throw a meaningful error.
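As a rough illustration of the kind of error catching meant here, the Python sketch below writes a fixed-size checkpoint record, always closes the file (via `with`), and raises a descriptive error when the file read back is shorter than expected. The record layout, file name, and function names are all hypothetical; this is not how ROSCO's Fortran checkpoint routine is written.

```python
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int32 step + float64 value (12 bytes).
RECORD = struct.Struct("<id")


def write_checkpoint(path, step, value):
    # The with-block closes the file even if pack() or write() raises.
    with open(path, "wb") as f:
        f.write(RECORD.pack(step, value))


def read_checkpoint(path):
    with open(path, "rb") as f:
        data = f.read()
    # Report a truncated or oversized file instead of failing silently.
    if len(data) != RECORD.size:
        raise IOError(
            f"checkpoint {path!r} is incomplete: "
            f"expected {RECORD.size} bytes, got {len(data)}"
        )
    return RECORD.unpack(data)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.chkp")  # hypothetical file name
    write_checkpoint(path, 42, 1.5)
    print(read_checkpoint(path))  # (42, 1.5)

    # Simulate a partially written checkpoint.
    with open(path, "r+b") as f:
        f.truncate(5)
    try:
        read_checkpoint(path)
    except IOError as err:
        print("caught:", err)
```

The point of the pattern is that the close happens on every path, and an incomplete file produces one clear message at the point of failure rather than a long chain of unrelated routine names.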

We still don't have an easy way to reproduce this error, so if anyone can easily share that, it would be greatly appreciated.

momemo1996 commented 11 months ago

@andrew-platt I managed to solve this by increasing the checkpoint interval in the settings to a very high value, so that no checkpoint files are written! The setting is in turbineArrayProperties in the constant folder. Thanks