Open ezhilsabareesh8 opened 6 days ago
@ezhilsabareesh8 Thanks for identifying and documenting these issues. As you probably know, FMS has deprecated `mpp_open`, and with it ASCII write I/O support, so this must be handled on the MOM6 side. But we still need to retain compatibility with the FMS1 API, so some extra work may be needed to get that working.
A `unit == -1` test works with `open(newunit=unit, ...)`, which can never return -1. I would like to check that it is compatible with `mpp_open(unit, ...)`. But I think this is OK.
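As a minimal standalone sketch of why the sentinel test matters (this is not the MOM6 code; the program and variable names are illustrative only): the Fortran standard has `newunit=` assign a negative unit number other than -1, so a `< 0` test cannot distinguish an open file from the "not opened" sentinel, while `== -1` can.

```fortran
program newunit_sentinel
  implicit none
  integer :: unit

  ! -1 marks "file not yet opened"; newunit= never assigns this value.
  unit = -1

  if (unit == -1) then
    ! A scratch file, so the example leaves nothing behind on disk.
    open(newunit=unit, status='scratch', action='readwrite')
  end if

  ! The assigned unit is negative, so `unit < 0` is still true after the
  ! open, whereas `unit == -1` is now false.
  print *, 'unit < 0   :', unit < 0
  print *, 'unit == -1 :', unit == -1

  close(unit)
end program newunit_sentinel
```

With a standard-conforming compiler the first test should print `T` and the second `F`. Whether the same sentinel is safe with `mpp_open` under FMS1 is exactly the open question above and is not covered by this sketch.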
I'm not clear why `filename` needs to be changed from allocatable to fixed length. L427 should implicitly allocate the string. Can you post an example and the error?
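For context, a small sketch of the implicit allocation being referred to. L427 itself is not reproduced in this thread, so this only assumes it is an intrinsic assignment to a deferred-length string; `base` and its value are hypothetical.

```fortran
program alloc_filename
  implicit none
  character(len=:), allocatable :: filename
  character(len=64) :: base   ! hypothetical fixed-length input

  base = 'U_velocity_truncations'

  ! Intrinsic assignment allocates filename to the length of the
  ! right-hand side (Fortran 2003 and later); no explicit allocate needed.
  filename = trim(base)

  print *, 'allocated: ', allocated(filename)
  print *, 'length:    ', len(filename)

  ! Referencing filename before any assignment would be invalid, since
  ! the variable is still unallocated at that point.
end program alloc_filename
```

If the string were referenced before such an assignment had run, the behavior would be undefined, which is one way an allocatable could appear to "fail" where a fixed-length variable does not.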
After fixing the `unit < 0` error, I can produce a useful truncation file, but it will sometimes contain duplicate entries. Almost certainly due to multiple ranks writing to the same file, but I am not sure why different ranks are reporting the same truncations. Is that what you see? Or are the problems more severe?
If I use `fileset=MULTIPLE`, then I get a runtime error before any files are created:
forrtl: severe (66): output statement overflows record, unit -5, file Internal Formatted Write
I'm not yet sure what this means.
So far, I think the `[uv]_file < 0` tests definitely need to be replaced, using `[uv]_file == -1`, assuming that FMS1 still works. I'm not yet able to replicate an error with `filename` as an allocatable, but I can believe there is a problem.
I'm not yet sure how to address the parallel writes. APPEND_FILE should ensure that we don't lose any content, but it's obviously not producing consistent output. But maybe we can deal with this after the other two have been fixed.
Thanks @marshallward for the response and confirming the issue.
- I'm not clear why `filename` needs to be changed from allocatable to fixed length. L427 should implicitly allocate the string. Can you post an example and the error?
This could be happening because when the `fileset` is set to `MULTIPLE`, the filenames become quite large. I encountered the same error as you:
forrtl: severe (66): output statement overflows record, unit -5, file Internal Formatted Write
However, when I switched to a fixed-length filename (with trimming in the inquire and open calls), this error stopped occurring. This might suggest that the allocatable version was having trouble handling longer filenames, especially when each rank writes a separate file.
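For reference, "Internal Formatted Write" in that message means a formatted `write` into a character variable rather than a file, and the overflow is reported when the formatted output needs more characters than the variable holds. The sketch below reproduces that class of error in isolation. It is not taken from MOM6 or FMS: the buffer sizes, the rank number, and the `.NNNN` suffix format are assumptions made for illustration.

```fortran
program internal_write_overflow
  implicit none
  character(len=16) :: short_name   ! deliberately too small
  character(len=64) :: long_name
  integer :: pe, ios

  pe = 1072   ! hypothetical rank number

  ! 22 + 1 + 4 = 27 characters are needed, but the record holds only 16.
  ! Without iostat= this write would abort the program.
  write(short_name, '(A,".",I4.4)', iostat=ios) 'V_velocity_truncations', pe
  print *, 'short buffer failed: ', ios /= 0

  ! The same write into a large enough buffer succeeds.
  write(long_name, '(A,".",I4.4)', iostat=ios) 'V_velocity_truncations', pe
  print *, 'long buffer ok:      ', ios == 0
  print *, trim(long_name)
end program internal_write_overflow
```

The specific nonzero `iostat` value is compiler dependent, so the sketch only tests for nonzero. It shows what the message means in general and does not establish where the overflow originates in the MOM6 call path.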
- After fixing the `unit < 0` error, I can produce a useful truncation file, but it will sometimes contain duplicate entries. Almost certainly due to multiple ranks writing to the same file, but I am not sure why different ranks are reporting the same truncations. Is that what you see? Or are the problems more severe?
I haven't tested after fixing the file unit and setting `fileset` to `SINGLE`. In my case, when the file unit wasn’t corrected, the truncations were simply not written, and an empty truncation file was generated. So, I haven’t yet seen the issue with duplicate entries, but I agree this could be due to multiple ranks trying to write to the same file.
I suspect the problem with `APPEND` lies in the fact that Fortran's default `open` routine doesn’t accept "APPEND" as a valid option for the `action` specifier. For example, in this line:
open(newunit=unit, file=trim(filename), action=trim(action_arg), &
position=trim(position_arg))
Passing `APPEND` results in an invalid action, since Fortran's `action` specifier only supports `READ`, `WRITE`, or `READWRITE`. The old FMS `mpp_io` supported `APPEND` through the `mpp_open` call, but the Fortran open function doesn’t.
To append to the file, we may need to use the `ACCESS` specifier, which supports values like `APPEND`, `DIRECT`, or `SEQUENTIAL`. You can refer to the Fortran open function specifiers and details here. This should ensure that multiple writes append correctly without losing content.
IMO the best solution for the moment is to fix the `[uv]_file < 0` checks. This would resolve the immediate problems and the truncation files would at least be usable. This still needs to be verified in the FMS1 API.
I believe that the `filename` length error is an uninitialized value being passed to `open()`, which is being misinterpreted as a massive string. We could resolve that in some way, but it would also never happen if the `[uv]_file` issue were fixed, since its value is associated with a nonempty file name. I would prefer to avoid a fixed length if possible.
Also note that `APPEND` is passed to the `position` argument, not to `action`. This also replicates the existing `mpp_open()` behavior. Using `APPEND` for `access` is an Intel extension and would not be standards compliant. (I believe this is why it is shown in green in your link.) IMO this is probably working as intended.
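A minimal serial sketch of the standard-conforming form, with `position='append'` alongside `action='write'`. The file name and loop are hypothetical, and nothing here exercises the multi-rank case discussed in this thread.

```fortran
program append_position
  implicit none
  integer :: unit, i, ios, nlines
  character(len=*), parameter :: fname = 'append_demo.txt'  ! hypothetical
  character(len=32) :: line

  ! Start from an empty file.
  open(newunit=unit, file=fname, status='replace', action='write')
  close(unit)

  ! Reopen repeatedly; position='append' places each write after the
  ! existing content instead of overwriting it.
  do i = 1, 3
    open(newunit=unit, file=fname, status='old', action='write', &
         position='append')
    write(unit, '(A,I0)') 'entry ', i
    close(unit)
  end do

  ! Count the lines to confirm that nothing was overwritten.
  nlines = 0
  open(newunit=unit, file=fname, status='old', action='readwrite')
  do
    read(unit, '(A)', iostat=ios) line
    if (ios /= 0) exit
    nlines = nlines + 1
  end do
  close(unit, status='delete')

  print *, 'lines written: ', nlines
end program append_position
```

Run serially, this should report 3 lines. It says nothing about ordering or duplication when several ranks append to the same file at once, which remains the separate concurrency question.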
The more challenging question is whether to produce a coherent single file, or to juggle multiple per-rank files. Truncations are currently written as they happen, which avoids any buffering. But it also causes the concurrency issues described above. But I also think it's not an urgent problem and can be sorted out later.
PR #739 addresses the `[uv]_file < 0` error.
In the current version of MOM6 with FMS2, truncation outputs are not correctly produced when the `fileset` flag is set to `SINGLE_FILE`. This is due to the following:

Fileset Flag Behavior: The `mpp_open` routine is no longer used in FMS2 (reference). When writing to an ASCII file, the default Fortran `open` routine is used. In `MOM_PointAccel.F90`, truncation outputs are being overwritten when the fileset flag is `SINGLE_FILE`, leading to an empty truncation output. The relevant code is here and here:

Setting `fileset=MULTIPLE` resolves the issue, but it opens multiple files with processor-specific filenames (e.g., V_velocity_truncations.1072).

Also, the file handle check here and here needs to be updated to `if (CS%v_file == -1)`, since `open(newunit=...)` always returns negative file handles.

Filename Handling in `MOM_io_infra.F90`: The declaration of the filename variable in `open_ASCII_file` crashes due to memory handling here. Changing filename to a fixed length of 50 or more characters prevents the crash. The modifications are:

and updating the `inquire` and `open` statements to use `trim(filename)` here and here.

These changes resolve issues with file handling across multiple processors.