CWorthy-ocean / roms-tools

Tools for creating input files for UCLA-ROMS simulations
https://roms-tools.readthedocs.io
Apache License 2.0
10 stars 5 forks source link

output of save with nx,ny not compatible with roms #111

Closed dafyddstephenson closed 2 months ago

dafyddstephenson commented 2 months ago

After running roms_grd.save('roms_grd.nc',nx=3,ny=3) I get this output from ROMS:

 ### ERROR: checkdims :: wrong size of dimension 'eta_rho' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.3.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_rho' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.3.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_rho' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.5.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_rho' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.5.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_u' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.5.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_v' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.5.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_rho' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.7.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_rho' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.7.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_u' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.7.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_v' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.7.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_u' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.2.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_v' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.2.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_rho' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.1.nc': must be    9 instead.

 ### ERROR: checkdims :: wrong size of dimension 'xi_rho' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.1.nc': must be    8 instead.

 Spherical grid detected.

 ### ERROR: checkdims :: wrong size of dimension 'xi_u' =    9
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.6.nc': must be    8 instead.

 ### ERROR: checkdims :: wrong size of dimension 'eta_v' =    8
            in netCDF file 'INPUT/PYTHON/PARTED/roms_grd.6.nc': must be    9 instead.
dafyddstephenson commented 2 months ago

update: removing the call to checkdims leads to

   ERROR:: 
        5 
            ncread read var= h

   NetCDF: Start+count exceeds dimension bound                                     

ERROR STOP 

which suggests that the variable shapes are indeed incorrect. I'm not sure why, the equivalent files I produced with MATLAB and partit have the same dimension sizes.

NoraLoose commented 2 months ago

Is it possible that the labels .0.nc, .1.nc, ... are mixed up?

dafyddstephenson commented 2 months ago

Just ran a quick script to print the shape of h on each tile for the MATLAB/partit and python/roms-tools files. Seems like the arrays are transposed?

NODE 0:
MATLAB SHAPE: (9, 9) | PYTHON SHAPE: (9, 9)
NODE 1:
MATLAB SHAPE: (9, 8) | PYTHON SHAPE: (8, 9)
NODE 2:
MATLAB SHAPE: (9, 9) | PYTHON SHAPE: (9, 9)
NODE 3:
MATLAB SHAPE: (8, 9) | PYTHON SHAPE: (9, 8)
NODE 4:
MATLAB SHAPE: (8, 8) | PYTHON SHAPE: (8, 8)
NODE 5:
MATLAB SHAPE: (8, 9) | PYTHON SHAPE: (9, 8)
NODE 6:
MATLAB SHAPE: (9, 9) | PYTHON SHAPE: (9, 9)
NODE 7:
MATLAB SHAPE: (9, 8) | PYTHON SHAPE: (8, 9)
NODE 8:
MATLAB SHAPE: (9, 9) | PYTHON SHAPE: (9, 9)
dafyddstephenson commented 2 months ago

... however the dimensions themselves aren't transposed:

TILE 0:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 1:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 9}
TILE 2:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 3:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 8}
TILE 4:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 8}
TILE 5:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 8}
TILE 6:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 7:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 9}
TILE 8:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TomNicholas commented 2 months ago

Seems like the arrays are transposed?

I didn't realise ROMS would care about transposition. We can change partition to transpose before saving.

... however the dimensions themselves aren't transposed

In my mental model dimensions are an unordered set.

EDIT: Wait your second printout there implies that the dimensions aren't just transposed, the names are assigned incorrectly?

dafyddstephenson commented 2 months ago

In my mental model dimensions are an unordered set.

Typically ROMS reads netCDF with the dimensions in a particular order. I was initially suprised to see that this wasn't the problem here, although it makes sense given that checkdims is one of the few routines in ROMS that explicitly checks dimensions rather than just reading data as they appear.

the names are assigned incorrectly?

Yes, that's the impression I get. I assumed the dimensions would have the correct size but appear in the wrong order (see above), but in fact it seems they're being defined incorrectly altogether.

NoraLoose commented 2 months ago

I doubt that xarray swaps dimension names when doing a slicing operation. As stated above, my suspicion that the ordering of the nodes is done inconsistently with what ROMS expects. Taking this example:

TILE 0:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 1:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 9}
TILE 2:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 3:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 8}
TILE 4:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 8}
TILE 5:
MATLAB SIZES: {'eta_rho': 8, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 8}
TILE 6:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}
TILE 7:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 8} | PYTHON SIZES: {'eta_rho': 8, 'xi_rho': 9}
TILE 8:
MATLAB SIZES: {'eta_rho': 9, 'xi_rho': 9} | PYTHON SIZES: {'eta_rho': 9, 'xi_rho': 9}

Maybe the python partitioning tool swapped tiles 1 and 5, etc.

NoraLoose commented 2 months ago

https://github.com/CWorthy-ocean/roms-tools/blob/c7364000eb97a5efb07f2422639f4d295443b9c6/roms_tools/utils.py#L192-L194

Looks like we use row-wise (C-style) ordering. Maybe ROMS expects column-wise (Fortran-style) ordering?

NoraLoose commented 2 months ago

I bet @sdbachman knows the answer to the question in the previous post because his OTT code uses the same tiling convention (I think).

sdbachman commented 2 months ago

My OTT does doesn't do tiling at all, but I think it is sensible to expect that ROMS expects column-major ordering. Have you tried just making a column-major example and seeing if it works?

NoraLoose commented 2 months ago

I was hoping that we can get a definite answer to what ROMS expects first. The danger with just trying without knowing is that it may "work" (i.e., no errors) but ROMS will compute in a domain whose tiles sit in the wrong places. 🧩

I think we will have to look at the partit generated tiles to get that definite answer. Who is working on this? @TomNicholas?

dafyddstephenson commented 2 months ago

file_number = i + (j * nx)

I managed to get it working by flipping i and j here. The output looks fine, the mask is assembled correctly and the tracer patterns look reasonable

NoraLoose commented 2 months ago

@dafyddstephenson your solution works for nx = ny but to switch to column-major ordering for the general case, we will have to do

for i in range(nx):
    for j in range(ny):
        file_number = j + (i * ny)
NoraLoose commented 2 months ago

I will go ahead and issue a PR with that solution. But we need to test it for several nx and ny and verify that the mask is still assembled correctly.

dafyddstephenson commented 2 months ago

Ah yeah, good point. I'll admit I didn't think much about the broader implications once the first thing I tried worked lol, lesson learned. I'll try 8 CPUs with nx=4, ny=2

dafyddstephenson commented 2 months ago

OK I can get it to run with your second suggestion, but only if NP_XI in ROMS corresponds to ny in roms-tools and NP_ETA in ROMS corresponds to nx in roms-tools. I found this in the partition docstring, suggesting this is intentional (@TomNicholas ?):

    nx : int, optional                                                                                                                                                                                                                                                                                                                                                      
        The number of partitions along the `eta` (latitude) direction. Must be a positive integer. Default is 1.                                                                                                                                                                                                                                                            

    ny : int, optional                                                                                                                                                                                                                                                                                                                                                      
        The number of partitions along the `xi` (longitude) direction. Must be a positive integer. Default is 1.   

sorry if I've just had too long a week, but this seems the wrong way around to me? I associate x with long and y with lat. Perhaps to avoid any ambiguity we should use/accept the ROMS parameter names of np_xi and np_eta ?

NoraLoose commented 2 months ago

Your suggestion makes sense to me @dafyddstephenson! Are you submitting a PR, or do you want me to do it?

dafyddstephenson commented 2 months ago

I haven't actually forked it yet so it'd be quickest for now if you did I think

TomNicholas commented 2 months ago

sorry if I've just had too long a week, but this seems the wrong way around to me? I associate x with long and y with lat.

I probably just got confused and messed this up! My tests were really checking that we got the correct [10, 9, 9, ..., 9, 10] kind of sizes.