CABLE-LSM / CABLE

Home to the CABLE land surface model and its documentation
https://cable.readthedocs.io/en/latest/
Other
8 stars 4 forks source link

Reformat BIOS ancillary information #281

Open ccarouge opened 5 months ago

ccarouge commented 5 months ago

In the BIOS setup, the ancillary information is read in 2 steps:

The binary file contains also additional variables that are not currently in the gridinfo file.

We need to decide how to deal with this additional information:

  1. reformat to netcdf, store in a separate file and keep the current reading procedure by overwriting the information in CABLE. This would result in having a generic gridinfo file and then additional files for localised information.
  2. or reformat to netcdf and overwrite the gridinfo file with that information. This would result in creating gridinfo files specific to each configuration as needed with different formats of gridinfo files for different configurations.

Solution 1

It keeps the gridinfo file format constant but then adds a precedence rule between files. It makes it less clear to users what is the final source of the input information. Adds cases that need to be handled by CABLE instead of putting them in the pre-processing.

Solution 2

It collates the information into one file which clarifies the source of the information to the users. Although, currently CABLE has rules of precedence in its input files. gridinfo has the least priority, then the met files and finally the restart file. It would require creating different gridinfo files for different configurations and the format of the gridinfo file would not be constant between configurations.

Problem: If we collate all the BIOS information in the gridinfo file, do we risk using different information coming from the restart file for example? Is this case improbable and only probable in case of user error? In the current implementation, does BIOS overwrite the information after reading in only the gridinfo file or after reading in all the other input files?

Implementation

We tend to favour solution 2 but we need to clarify if the identified problem makes it impossible to implement it this way.

Some start towards an implementation is here: /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/gridinfo_inprogress/convert_gridinfo_Aust.sh.

Input data for this script is here: /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/australia_0.05/params/nc.

ccarouge commented 3 months ago

@abhaasgoyal Do we want to use this issue to document the work for the gridinfo file? It probably needs some updating with our current plan.

ccarouge commented 3 months ago

This work will be performed using the act9test-serial-no-luc test case. We will work from the CABLE-POP_TRENDY branch directly. We need a first control run using the original binary inputs. As we convert inputs to netcdf format, running the same test case should give us identical results.

abhaasgoyal commented 3 months ago

Current implementation plan:

  1. Run /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/australia_0.05/params/nc/scripts/gdal_translate_flt2nc.sh to convert individual ancillary binary files to NetCDF. So I'll have to update the I/O routines to read NetCDF (some files are given here for example. Run tests as mentioned in the previous comment
  2. Combine all ancillary NetCDF files into one NetCDF and read from there
  3. I wasn't sure about this, but I think since we are inclining towards solution 2, we need to check the viability considering precedence in reading inputs.

@SeanBryan51 could you check whether there was anything to add there

SeanBryan51 commented 3 months ago

This looks good to me. In step 1 we can do a regression test to test the input files were converted correctly to NetCDF and in step 2 we can do a regression test on the combined input files (the gridinfo file). We will have to look more into step 3 and make sure values initialised from the gridinfo are not being overwritten by other sources (e.g. a restart file).

ccarouge commented 3 months ago

@abhaasgoyal I'm happy to look through the code with you about the precedence of inputs. We might be able to get the answer quickly. Doing it with you allows you to understand where it comes from.

SeanBryan51 commented 3 months ago

Requirements:

  1. BIOS gridinfo needs to be defined over Australian domain: longitude (degrees east) = 112.0, 154.0, latitude (degrees north) = -44.0, -10.0
  2. BIOS gridinfo needs to be defined at 0.05 degrees resolution.

Context

SeanBryan51 commented 3 months ago

I'm currently looking into the regridding of a ~1 degree~ 0.5 degree gridinfo (gridinfo_CSIRO_CRU05x05_4tiles.nc) to 0.05 degrees and have some questions:

  1. Does the exact location of each grid cell matter? For example, say we are working with the Australia domain, should grid points lie on multiples of 0.05 degrees (i.e. lat = [-10, -10.05, -10.1, ...], lon = [112, 112.05, 112.1, ...]) or be offset by 0.025 degrees (i.e. lat = [-10.025, -10.075, -10.125, ...], lon = [112.025, 112.075, 112.125, ...])? It looks like the 1 degree gridinfo is defined on lon = [-179.5, -178.5, -177.5, ...], lat = [89.5, 88.5, 87.5, ...], i.e. offset by half a degree.
  2. It seems most land quantities can use nearest neighbour interpolation but are there any exceptions? E.g. variables dependent on the area of the grid cell likely cannot be interpolated with nearest neighbour.
ccarouge commented 3 months ago

@SeanBryan51

Question 1: Place the centres to the offset point by half the resolution. This means we avoid having to deal with half gridcells in latitude at the poles, since the cell boundary lies on -90 and +90°. And that simplifies the regridding (see response to 2 below).

Question 2: Going with the answer of question 1, all the grid cells in the new grid at 0.05° will fit entirely within a grid cell at 1°. So we can use nearest neighbour for all interpolation. All the quantities are given per m2 when they depend on the area. Obviously, if a new grid cell straddled several grid cells that would be more difficult.

abhaasgoyal commented 3 months ago

Currently we have 4 functions to convert reading BIOS inputs in binary form to NetCDF:

  1. cable_bios_load_params - Sets soil parameters (soil) after calculations. NetCDF read in gridinfo file Assuming that we plan to read the gridinfo file gridinfo_AUST_005x005.nc, which is being generated with the following script:
    /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/gridinfo_inprogress/convert_gridinfo_Aust.sh

    We see that the script preprocesses the input doing the necessary calculations.

  2. cable_bios_load_fracC4 - Sets inVeg in LUC runs after calculations. NetCDF read in gridinfo, however it doesn’t seem to be processed based on the script.
  3. cable_bios_load_biome - Loads LUC_EXPT%biome. NetCDF read from LUC_Expt%ClimateFile. No additional calculations.
  4. cable_bios_load_climate_params - Set climate%modis_igbg and climate%AvgAnnMaxFAPAR after caclulations. NetCDF read in LUC_Expt%ClimateFile.

We plan to work with reading ancillary data purely from NetCDF files one function at a time considering precedence. Currently going for soil parameters in cable_bios_load_params.

To check precedence (in increasing order), currently, the order of reading parameters is:

  1. Input parameter values from namelist + grid info
  2. Restart file
  3. Met forcings file
  4. Ancillary data currently in binary form

Now, putting all ancillary variables into grid info is means that the metfile would have the highest precedence. We want to keep these values only when the parameters in met forcings are better. Hence, we decided on having an additional flag for reading met data or not.

Before further updating the issue, @har917 I had some questions:

Q1: There is a comment in convert_gridinfo_Aust.sh - #may need to add the IGBP major biome, fracC4, modis_igbp and AvAnnualMaxFAPAR layers, which applies to (2), (3), (4). Currently it is unclear whether these are required in main gridinfo file or as separate NetCDF file LUC_Expt%ClimateFile. Q2: Whether the versions in NetCDF files would be processed versions compared to binary ancillary data (hence no more calculations are needed within CABLE). Q3: If we were to implement the plan of action, the restart file values can always overwrite ancillary data, when in the original scenario ancillary data has the highest precedence. Is this behavior what we want? Paraphrasing @ccarouge - “Normally if people run a restart simulation, for it to really be a restart, it needs to ensure all the data is the same as in the initial simulation. Hence overwriting the data with the restart values. That said, all the data in the ancillary files we are working on here are constant for the run so it should be safe to assume that the restart and the ancillary data (step 4) have the same values. Hence the order of reading them should not be important.”

har917 commented 3 months ago

@abhaasgoyal @ccarouge

A couple of comments.

most importantly: I suspect that convertgridinfo_Aust.sh may have an error in it. Recent investigations have highlighted that BIOS uses a different gridinfo file gridinfo_CSIRO_CRU0.5x0.5_4tiles.nc sitting in the x45 g/data space (which is likely different to the gridinfo file referenced in that draft script).

On 1. BIOS requires some additional gridded inputs - it is a software engineering choice as to whether these should form part of a single BIOS_gridinfo file or as a separate netcdf input. My position is that it is better to combine into one for input management purposes (other applications of CABLE won't necessarily have to read in the additional layers so it shouldn't break them) - however this may come down to exactly where those inputs are read in by the code. Currently the separate files are read in at separate locations to the gridinfo - it may be better to retain that separation and retain separate input files.

On 2. I would hope that we can avoid having to have additional QA steps on the inputs inside CABLE. It creates confusion/lack of transparency - these additional QA steps are better placed as comments in the netcdf metadata in my mind*.

On 3. I don't have good visibility of what gets written into the restart file in terms of ancillary values. I wouldn't be surprised if none/not all of the BIOS gridded ancillaries make their way into the restart (NB there are gridded ancillaries and parameter values derived from gridded ancillaries to be aware of - especially in CASA). I think I would prefer if values in the restart were to take precedence over the ancillary data in the case of a difference (there are some use cases where it could be useful to go the other way round - think parameter estimation, predictability or sensitivity studies - but I think those can be handled easily outside as a preprocessing step on an existing restart) Perhaps the way forward - in the longer term - is to implement a check with a warning to log if a difference is found?

*Having said that we've gone and added some new ones recently due to path-of-least-resistance issues.

ccarouge commented 3 months ago

@har917 @abhaasgoyal

On 1:

Currently the separate files are read in at separate locations to the gridinfo - it may be better to retain that separation and retain separate input files.

What are you worried about? Or why do you need that capability? For the moment, we have identified the restart and the met file as potential alternate source for the data that are read after the gridinfo file but before the BIOS reading call. Ian, you agree that the restart should take precedence so changing the behaviour here would be ok. It leaves the met file. We are proposing to make reading the parameters from the met file optional.

I would like to try and read the gridinfo file in one location and see if we get the same values in the corresponding CABLE variables as when reading the BIOS files.

If we don't get the same values throughout the code then we know something else is overwriting these values and we can hunt down what is happening.

If we get the same values, then we don't need to complicate things.

On 2: great, we agree. All pre-processing in input file creation.

@SeanBryan51 Ian above is saying BIOS is using a gridinfo file at 0.05°, you talk of starting from a 1° gridinfo file. Need a check?

har917 commented 3 months ago

What are you worried about? Or why do you need that capability?

I don't think we need that capability per say. However, changing where/when the code is currently reading inputs risks mixing up the process of converting the inputs, and reading them in, and (in this case) porting them around the code base.

@SeanBryan51 For clarity the BIOS gridinfo file referenced is at 0.5 degree resolution - it maybe global (land north of >60S) - and has been set up to provide default initial conditions for a 4 tile version of CABLE (primary forest, secondary forest, grass, crop). The usual gridinfo file provides default initial conditions for 17 tiles (if memory serves).

SeanBryan51 commented 3 months ago

@abhaasgoyal here is the script I used to generate the 0.05 degree gridinfo: https://gist.github.com/SeanBryan51/f89a30ee1e22495965dccb8fdff0f8c4

The gridinfo file I used for the input is the same as what is used in the act9test-serial-no-luc namelist file, i.e. /g/data/rp23/data/no_provenance/gridinfo/gridinfo_CSIRO_CRU05x05_4tiles.nc.

ccarouge commented 3 months ago

@SeanBryan51 I had a look at the regridding script for gridinfo. It looks alright.

I've tried the script on the 0.5° gridinfo under /g/data/rp23/data. The grid files have the correct spacing between the centres and the corners are at the right places. The map file is harder to understand but at least the frac_* variables only have 0s and 1s in, that is what you want for a nearest neighbour interpolation.

I had a side-by-side look at the data from the original gridinfo and the new one. I selected the same region on the new one first and then plotted with ncview. All the variables seem to have the same values and same patterns. The patchfrac variable is extremely weird but that's true of the original data.

The files differ because the regridding adds:

SeanBryan51 commented 3 months ago

@ccarouge Thanks for trying out the regridding script.

I have tested CABLE (act9test-serial-no-luc) with the 0.05 degree gridinfo. Model output using the new gridinfo is bitwise identical with the output from the original configuration (provided we ignore the Area variable[^1]).

[^1]: It seems strange that there are differences in the Area variable since the underlying grid cell resolution for both runs should be at 0.05 degrees. Perhaps the Area variable in the model output does not accurately reflect the resolution of the model?

har917 commented 3 months ago

@ccarouge @SeanBryan51 - good to hear of progress.

The Area variable in CABLE output is known to be suspect - I certainly never use it and recalculate outside of the model. This is complicated by the need to handle single sites as well as gridded runs (including subsampling of grids - e.g. should a simulation that samples every second grid cell have the same Area as the full run or 4 times the Area?). Likely needs a separate issue.

ccarouge commented 2 months ago

@ccarouge Some of the reads are done into the LUC code, where do we want to put these layers? Identify what is read.

har917 commented 2 months ago

@ccarouge I think I need a bit more detail as to the question here. However, we have to be careful about interpreting routine names - this is because the PFT distribution and land use change code is interwoven (and somewhat oddly named) The reads (I think) you are talking about (C4frac, MVG) are done in the bios_cable_met_params and LUC_expt sections - this code sets up the model (specifically the PFT distribution and other parameters). It is necessary independently whether land use is called or not (though different bits get called with land use change on). MVG and C4frac are in effect ancillaries of equivalent type as the soil parameters.

Quite where the actual read should take place is a different question - I would be looking to see how TRENDY_POP does this (or similar) and follow the same pathway. This may require carrying some irrelevant layers (filled with zeros) around the code base for non-BIOS simulations - but that's probably better than having multiple reads of the same file.

ccarouge commented 2 months ago

Thanks Ian. Looking at the details here is what I have:

BIOS variable name BIOS reading routine CABLE variable POP-TRENDY variable name POP_TRENDY filename POP-TRENDY reading routine
MVG cable_bios_load_biome() LUC_EXPT%biome biome LUC_EXPT%ClimateFile READ_ClimateFile()
c4frac cable_bios_load_fracC4() no global variable. Local fracC4 variable used. Replaces LUC_EXPT%mtemp_min20 used in POP_TRENDY but not same unit/definition (more details at the end) mtemp_min20 LUC_EXPT%ClimateFile READ_ClimateFile()

So we would want these 2 BIOS layers to get into the ClimateFile but I'm guessing this file is created by CABLE in the "do climate" phase? Is that correct @har917 ? Or would we want a way to overwrite the data from the ClimateFile?

The handling of fracC4 (why the variable for the filename in the file is C4frac_file 😢 ) is also more problematic as it's not the same variable as what POP_TRENDY uses. I would think we could read fracC4 or mtemp_min20 depending if bios is on or off? The question is whether the ClimateFile should have both fields in or whether we should have 2 files: one for POP and one for BIOS?

more details on the differences between fracC4 and mtemp_min20

Both these variables are used to determine whether the grass at a point is a C3 grass or a C4 grass (they have different photosynthesis). BIOS is using fracC4 in this way: https://github.com/CABLE-LSM/CABLE/blob/7326db351e0f496dd484639c69d6dec59a0bfae3/offline/cable_LUC_EXPT.F90#L650-L654 while mtemp_min20 is used here: https://github.com/CABLE-LSM/CABLE/blob/7326db351e0f496dd484639c69d6dec59a0bfae3/offline/cable_LUC_EXPT.F90#L575-L579

SeanBryan51 commented 2 months ago

I'm looking into adding the BIOS ancillaries[^1] to the 0.05 degree gridinfo file and it looks like there is a slight mismatch in the lat lon grids between the gridinfo and the BIOS ancillaries:

  1. The grid in the BIOS ancillaries is not offset by 0.025 degrees (i.e. lat=[-43.55,-43.5,-43.45,...] and lon=[112.95,113,113.05,...]) where as the gridinfo is offset by 0.025 degrees (i.e. lat = [-10.025, -10.075, -10.125, ...], lon = [112.025, 112.075, 112.125, ...]).
  2. The lat lon grid of the BIOS ancillaries are defined on the domain: lat=[-43.55,-10.1] lon=[112.95,153.55] where as the gridinfo is defined on the domain lat=[-43.975,-10.025] lon=[112.025,153.925]. I.e. It looks like the BIOS ancillaries are missing data along the boundary of the Australia domain.

To combine the ancillary inputs with the gridinfo file, should we remap the BIOS ancillaries to the grid defined in the gridinfo file? In this case we require the BIOS ancillaries to be defined on the full Australia domain (i.e. latitude (degrees north) = -44.0, -10.0 and longitude (degrees east) = 112.0, 154.0). Alternatively, we could reduce the domain of the gridinfo so that all points in the BIOS ancillaries map to points in the gridinfo file and then do the remapping. What would be the best approach here?

[^1]: BIOS ancillaries are in NetCDF format and are found here: /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/australia_0.05/params/nc

SeanBryan51 commented 2 months ago

To combine the ancillary inputs with the gridinfo file, should we remap the BIOS ancillaries to the grid defined in the gridinfo file? In this case we require the BIOS ancillaries to be defined on the full Australia domain (i.e. latitude (degrees north) = -44.0, -10.0 and longitude (degrees east) = 112.0, 154.0).

I was overthinking it, the missing grid points in the BIOS ancillaries don't contain any land so we can set the value of each variable to its missing value over these grid points before remapping to the gridinfo grid.

har917 commented 2 months ago

@SeanBryan51 quick responses

  1. I suspect that we've picked up a difference between cell centres and cell corners here not a misalignment of the grid (and this may come from the original data sources). I suspect the easiest way to figure this out will be to compare coastlines (i.e. non-filled values) and then settle on one shifting of the data
  2. yes - the difference is likely to be at the boundaries and all (most) of these cells are going to be non-land.

We may also have to look into the binpackandsample routine to see if this does different things for different variables/input layers.

har917 commented 2 months ago

@ccarouge referring to ^^^^

You've picked up a fundamental difference in how POP_TRENDY and BIOS work - and it complicated because of the multi-stage aspects to POP runs. This likely requires a specific CASE() to be devised during this merge.

Since mtemp_min20 and POP_TRENDY biome are climate related these vars get put into the ClimateFile not in the gridinfo file.

I think my position on the way forward is (but I will have to check the sequencing) that

There is a bit of uncertainty in my head as to when these variables come from the ClimateFile and cable_bios_load_biome()/cable_bios_load_c4frac() vs when they come from one of the restart files (likely POPLUC). It could be that it's a case that i) read from ClimateFile or via MVG/C4frac then ii) if found in the ClimateFile overwrite.

ccarouge commented 2 months ago

Once we have a gridinfo file and the BIOS information on the same grid (without the half-grid cell shift), do we need to ensure all the data is using the same land mask?

This means a grid cell is land if both the original gridinfo and the BIOS variables have valid data for that point. That would mean the BIOS data would lose some islands and some precision in some caps. Do we want this to happen @har917 ?

@har917 Do you have some output from an Australian-wide BIOS run? Just to check what the coastline and islands look like. Or @AlisonBennett . This is just to see the output grid, we don't care about the variables and values in the file.

har917 commented 2 months ago

@ccarouge As far as I know we have never run BIOS at the full 0.05 degree resolution (we have run AWAP using most/all the same inputs). This is mostly because of resourcing/set up - in that I understand it's impossible to run BIOS on gadi with enough memory/run time.

From memory/discussions with Peter Briggs - the model is fairly ruthless in that it won't run if any of the inputs aren't valid (but I don;t know the detail as to how it is does this in practice). The main cause of concern wasn't islands/coasts but incland areas (lake Eyre) where the input soil ancillaries were/area incomplete.

We could try run to a run a do_climate section - that should read in the current ancillaries (and default initial conditions) to see what the effective resultant landmask looks like (but perhaps not immediately as there's lots going on and potential for confusion).

ccarouge commented 2 months ago

incland areas (lake Eyre) where the input soil ancillaries were/area incomplete @har917 I can see there are a lot of areas without data inland. What scenario would make sense for a BIOS configuration for these points:

  1. have them mask everywhere in all data and not produce simulation results for these points.
  2. use default data instead of BIOS-specific info for these points. The second case is potentially confusing as it would present the results as being from a BIOS simulation when, in fact, no BIOS-related information is used in the simulation for these points.
har917 commented 2 months ago

IF I remember correctly the current approach produces NaNs

I wouldn't use the default gridinfo - that will be inconsistent (especially if we get some soil parameters from BIOS and the other gridinfo). What maybe better to do is gap-fill using nearest neighbour from the BIOS data and somehow note which cells have been filled.

SeanBryan51 commented 2 months ago

Update regarding the BIOS gridinfo file:

To address the half grid cell misalignment between the BIOS and gridinfo grids, we decided not to shift the BIOS grid to the gridinfo grid because 1. this is prone to shifting geographical features which may cause issues when comparing to observations, and 2. shifting the BIOS grid will break bitwise equivalence in the CABLE model output as the model output is defined on the BIOS grid. Instead, we decided to shift the gridinfo grid to the BIOS grid (see latest revision of the regridding script). The resulting gridinfo file preserves bitwise reproducibility in the act9test-serial-no-luc configuration.

I am now in the process of adding the BIOS ancillaries to the gridinfo file as per Ian's script and cable_bios_met_obs_params.F90.

SeanBryan51 commented 2 months ago

Adding the BIOS derived bch variable to the gridinfo file (see here) breaks bitwise reproducibility in the act9test-serial-no-luc configuration. I'm currently investigating why this is happening.

From running a debugger, it looks like several variables are being initialised from soil parameters coming from the gridinfo file before soil parameters are overwritten by the BIOS derived parameters (see the derived_parameters function). It does not look like these variables are recalculated with BIOS overwritten soil parameters. @har917 is this the intended behaviour?

har917 commented 2 months ago

@SeanBryan51 @ccarouge @AlisonBennett

The derived parameters should be being assigned based on the underlying BIOS inputs - not the default gridinfo values (I think this means the order has been wrong in the current code).

Looking at the cable_bios_met_obs_params code again we assign BIOS values to %bch, %silt, %clay, %sand, %css, %hyds, %sfc, %ssat, %sucs , %swilt and %rhosoil

The key derived values will then be %hsbh, %pwb_min and %i2bp3 - these feed into soil moisture calculations unmodified - and the %_vec variables (which get used by the SLI module). (strictly) %cnsd would be affected but we're using `soilparmnew = .true.

%owetfac I think is overwritten before it is used but could also break bitwise comparability.

To assist in checking whether this is the cause of the bitwise comparability tests you could test (in the old code) by adding a call derived_parameters() at the end of the cable_bios_load_params subroutine (along with the other required USE statements necessary to get it to compile)

ccarouge commented 2 months ago

Open an issue about the bug in BIOS about calling derived_parameters after reading in the BIOS parameters.

ccarouge commented 2 months ago

We should try and quantify the impact of changing the ancillaries due to the derived_parameters bug.

SeanBryan51 commented 2 months ago

Update: I have managed to bitwise reproduce the output created with a gridinfo containing only the BIOS bch by updating the derived parameters which are dependent only on the BIOS bch variable and the original gridinfo parameters (see https://github.com/CABLE-LSM/CABLE/commit/c6fff2f60f63ef6837b85907c09fb6d84074e6d7). In summary: https://github.com/CABLE-LSM/CABLE/commit/c6fff2f60f63ef6837b85907c09fb6d84074e6d7 with original gridinfo == https://github.com/CABLE-LSM/CABLE/commit/7326db351e0f496dd484639c69d6dec59a0bfae3 with BIOS gridinfo (with bch only).

SeanBryan51 commented 1 month ago

Update: there was a bug in regrid_aus_05x05_to_005x005.sh where I was applying the wrong limits for the sucs variable. I have fixed this and I now have bitwise equivalence with https://github.com/CABLE-LSM/CABLE/commit/7326db351e0f496dd484639c69d6dec59a0bfae3 with BIOS gridinfo and https://github.com/CABLE-LSM/CABLE/commit/bfa87d36fa518139d9d80d7665ae48c7d0d23035 with original gridinfo. This verifies that calling derived_parameters after reading in the BIOS parameters has the same effect as using the BIOS gridinfo file.

ccarouge commented 1 month ago

gridinfo file is ready with the BIOS information in it. Created from the regrid_aus_05x05_to_005x005.sh script. Remove calls to read the BIOS information directly. Add metadata attributes.: ask Peter for information from the original process for provenance information.

Run Australia-wide at 0.25 resolution or a 1000 points simulation for 1 year ?? See how to do it.

SeanBryan51 commented 1 month ago

I did some quick testing of the BIOS gridinfo file for the 1000pts config with the BLAZE_9184 branch (commit https://github.com/CABLE-LSM/CABLE/commit/7dead5491f1f7a4822078a0d1db0ccad3dfb139a). CABLE errors due to changes in the number of patches and is incompatible with the existing restart files. My guess is this is likely due to the half grid-cell misalignment between the gridinfo grid and the BIOS grid resulting in ambiguous nearest neighbour interpolation of the gridinfo file within CABLE. It was probably pure luck that the shifted gridinfo file was bit reproducible for the act9 configuration.

I'm thinking the next steps around this would be to apply a fix to the nearest neighbour interpolation of the gridinfo file in CABLE such that we have bit reproducibility before and after for runs using the shifted gridinfo file.

Context: earlier on when I was looking into how best to shift the gridinfo file to the BIOS grid, I found running CABLE with a shifted gridinfo file (in arbitrary directions in latitude and longitude coordinates) broke the act9test-serial-no-luc configuration with the error Number of patches in restart file input/bios_cable_rst.nc does not equal to number in default/met file settings. (SUB load_parameters) Recommend running without restart file. This was because the BIOS grid contains points which are equidistant to multiple grid cells on the gridinfo grid which results in ambiguous nearest neighbor interpolation (nearest distance is dependent on the floating point precision of lat-lon coordinates) - this causes the total number of patches to change as grid points are initialised from slightly different grid cells on the gridinfo file. However, if I shifted the grid in such a way, I would reproduce the interpolation that was done internally within CABLE and not break the act9 configuration (as mentioned here).

har917 commented 1 month ago

It would be good to dig into this a little more deeply - is this because the half-grid move means that we get changes on the coast where we have meteorology and all soil parameters available? Or has something more fundamental happened that means were getting different number of patches for grid cells in the interior?

The first case should be relatively easy to handle since the input meteorology files are more extensive than actually provided into the binary files - if the latter then this is more problematic (and I can't think why it could happen)

The way to test would be to run without using the CABLE restart file - and let the model figure out the number of patches it wants to use and then compare.

ccarouge commented 1 month ago

@har917 Here is an example to explain what happens with the grids. The gridinfo at 0.5° has these grid cell centres for latitude (similar story with longitude so just showing one here): 87.75, 87.25, 86.75, 86.25, 85.75 In other words, the grid cell boundaries fall on "whole" latitudes and longitudes: 88.0,87.5, 87.0, 86.5, 86.0 etc.

The BIOS grid has the following grid centres: 87.0, 87.05, 87.1, 87.15, 87.2, 87.25 In other words, for every few grid cells, the centre of the BIOS grid cell falls right on the edge of a gridinfo grid cell. That means the centre of the BIOS grid cell is equidistant to 2 gridinfo cells (or 4 gridinfo grid cells when it falls on corner of the gridinfo grid cells).

Now the way this works in CABLE:

  1. CABLE reads in the BIOS met forcing and sets the grid using these. So the grid cells in the CABLE simulation are the same as the BIOS grid cells.
  2. CABLE reads in the gridinfo file at a different resolution. So CABLE will go and try to find the closest grid cell. But because of the shift in the cells centres, for a few of the cells, the centres are equidistant of several grid cells in gridinfo. There is no handling of this case in CABLE. We simply "use" the fact floating points can't be at equal distance in programming, the precision will always put one closer than the other.

We end up with a random choice of gridinfo grid cell for these BIOS grid cells. Some might take the point that is to the East or North-East or South or West etc.

When we regrid the gridinfo file onto the BIOS grid, we have to choose one direction to shift the map towards. It will be the same direction for the whole grid, not a random direction that is different for different points like in CABLE.

As a result, we end up with some points using the information from the gridinfo file from a neighbouring grid cell in the new setup compared to the old which can have a different number of tiles. And this can happen randomly throughout the whole map.

har917 commented 1 month ago

@ccarouge

which can have a different number of tiles.

For BIOS (i.e. with POP active) I don't think this is true - at least in the absence of land-use change. BIOS only ever has (up to) 3 tiles per grid cell and this decision (i.e. 2 or 3 tiles per grid cell) is independent of the soil ancillaries.

Coastal cells/soil ancillary gaps are a potential counter example - since if there are no valid soil ancillaries at the grid cell, BIOS will remove those tiles from the vector of land points even if there is valid meteorology available (or it will crash in a different way). Shifts in the gridinfo could then lead to a different number of tiles/land points.

However - if we are testing with a non-POP configuration then, I agree, shifts can change the number of tiles.

abhaasgoyal commented 3 weeks ago

Update: I ran act9test configuration script for 0.05° resolution gridinfo (gridinfo_CSIRO_CRU005x005_4tiles.nc), which should have the processed ancillary inputs since I used @SeanBryan51 's conversion script with --append-bios option. The tests were run in 2 versions of CABLE

  1. CABLE-POP_TRENDY as the base branch. Here, the input values read from the gridinfo are overwritten by the binary ancillary inputs loaded/processed in cable_bios_load_params
  2. BIOS_no_bios_input with the difference of cable_driver not calling cable_bios_load_params.

Normally, this should not create any difference in the results since the new gridinfo should have the same ancillary data after processing inputs from binary files in cable_bios_load_params. However bitwise comparison of the output files show difference in values for sucs (postive/negative sign):

ext_newblazempi_off/run/outputs/bios_out_cable_1700_1899.nc
DIFFER : VARIABLE : sucs : POSITION : [0,0] : VALUES : 0.5989 <> -0.5989
DIFFER : VARIABLE : sucs : POSITION : [0,1] : VALUES : 0.4092 <> -0.4092
DIFFER : VARIABLE : sucs : POSITION : [0,2] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [0,3] : VALUES : 0.4092 <> -0.4092
DIFFER : VARIABLE : sucs : POSITION : [0,4] : VALUES : 0.4092 <> -0.4092
DIFFER : VARIABLE : sucs : POSITION : [0,5] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [0,6] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [0,7] : VALUES : 0.4092 <> -0.4092
DIFFER : VARIABLE : sucs : POSITION : [0,8] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [1,2] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [1,5] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [1,8] : VALUES : 0.3795 <> -0.3795
ext_newblazempi_off/run/outputs/bios_out_cable_1900_2022.nc
DIFFER : VARIABLE : sucs : POSITION : [0,0] : VALUES : 0.5989 <> -0.5989
DIFFER : VARIABLE : sucs : POSITION : [0,1] : VALUES : 0.4092 <> -0.4092
DIFFER : VARIABLE : sucs : POSITION : [0,2] : VALUES : 0.3795 <> -0.3795
DIFFER : VARIABLE : sucs : POSITION : [0,3] : VALUES : 0.4092 <> -0.4092
...

We suspect that it may be a bug in BIOS due to write_default_params changing sucs sign , but cable_bios_load_params not doing something similar.

https://github.com/CABLE-LSM/CABLE/blob/e190b23491ed7aca47c28cad028ef8687a5fa5be/offline/cable_parameters.F90#L1343

Further investigation is needed on whether this is a bug in BIOS. Once this is fixed, we can have assurance on values contained in the new gridinfo file.


Addendum: I also ran the 1000pts configuration in a similar manner with the 2 branches of CABLE, however I am getting errors in a lot of variables (original as well as derived). Need to test further on reproducing the outputs, as well as finding the cause.

har917 commented 3 weeks ago

@abhaasgoyal I think this is a matter of definition - it appears that the gridinfo file has been set up to specify -sucs whereas the BIOS binary and the methods that specify these parameters via namelist give sucs itself. I didn't spot that change in sign in cable_parameters() when I was doing the early work to identify how to connect the BIOS binaries to the gridinfo that @SeanBryan51 based his work on.

All very confusing - especially when you bring the vectorised soil properties (i.e. the _vec variables) and the GW scheme.

I'm a bit surprised that the -ve sign didn't show up in other testing but that maybe because BIOS uses a set of science options where %sucs is always used within an ABS() function.

ccarouge commented 3 weeks ago

After checking, this is because the gridinfo file expects positive values for sucs which are then changed to negative in CABLE. But the BIOS input is already negative.

So, @abhaasgoyal is going to modify the creation script for gridinfo to swap the sign of the input before writing to gridinfo.

It seems like a lot of sign changes applied on top of each others but we can't remove that from CABLE as that would break things for all the other gridinfo files in use.

abhaasgoyal commented 2 weeks ago

Update: After updating the sucs variable, act9test configuration output is bitwise-reproducible in CABLE-POP_TRENDY vs BIOS_no_bios_input branch.

However, running the 1000pts configuration (see branches 1000pts-new-params and 1000pts-new-params-bios-inputs-tmp) again provides mismatch for binary inputs and derived variables in cable_bios_load_params. A pattern that I noticed was that the differences in most variables were in position [x, 0/1/2, 16/32/79/122]. Essentially, the differences were in z axis in position 16/32/79/122. An example is provided as follows:

DIFFER : VARIABLE : visAlbedo : POSITION : [0,0,16] : VALUES : 0.058107 <> 0.0578484
DIFFER : VARIABLE : visAlbedo : POSITION : [0,0,32] : VALUES : 0.0768954 <> 0.0811007
DIFFER : VARIABLE : visAlbedo : POSITION : [0,0,79] : VALUES : 0.0695602 <> 0.0667755
DIFFER : VARIABLE : visAlbedo : POSITION : [0,0,122] : VALUES : 0.0499809 <> 0.049981

I'll attempt to debug this on whether the ancillary inputs from binary files vs new gridinfo were processed to be the same, and where it differs in the pipeline. Meanwhile, @AlisonBennett if you could explain how the binary files for 1000pts parameters were generated (/g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/reccap1000pts/params/) from Australia wide config, that'd be great.

har917 commented 2 weeks ago

@abhaasgoyal Since these differences have shown up in an albedo field on the first time step and only in some of the 1000 points - I'm fairly confident that this has something to do with how the different approaches have established some of the parameters (i.e. there's been a slight effective grid shift when establishing the 0.05 degree gridinfo file). I would look at the soilalb in the first instance.

abhaasgoyal commented 2 weeks ago

@har917 We are trying to trace the error by comparing binary ancillary data in 1000pts configuration (/g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/reccap1000pts/params) to the new Australia-wide 0.05 degree gridinfo (gridinfo_csiro_cru005x005_4tiles.nc), and for that we needed to convert the 1000pts binaries to NetCDF. How would you convert the files in recap1000pts/params to NetCDF (because gdal_translate seems to need .flt files?

har917 commented 2 weeks ago

How would you convert the files in recap1000pts/params to NetCDF?

Basically we don't/wouldn't - if we (@AlisonBennett and @har917) were to need netCDF files we would go back to the source flt/hdr files that exist on CSIRO's HPC [the binaries are the end of a 1-way chain of processing]. If needed we can rsync the flt/hdr files across to rp23 (and hopefully somewhere is a record of exactly the gdal_translate call that was used)

AlisonBennett commented 2 weeks ago

To further @har917's comment, once the flt/hdr files are synced across to rp23 they can then be converted to netcdf using gdal_translate.


From: har917 @.> Sent: Monday, 16 September 2024 17:56 To: CABLE-LSM/CABLE @.> Cc: Bennett, Alison (Environment, Aspendale) @.>; Mention @.> Subject: Re: [CABLE-LSM/CABLE] Reformat BIOS ancillary information (Issue #281)

How would you convert the files in recap1000pts/params to NetCDF?

Basically we don't/wouldn't - if we @.***https://github.com/AlisonBennett and @har917https://github.com/har917) were to need netCDF files we would go back to the source flt/hdr files that exist on CSIRO's HPC [the binaries are the end of a 1-way chain of processing]. If needed we can rsync the flt/hdr files across to rp23 (and hopefully somewhere is a record of exactly the gdal_translate call that was used)

— Reply to this email directly, view it on GitHubhttps://github.com/CABLE-LSM/CABLE/issues/281#issuecomment-2352237887, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIL5VYX2P767F3CJV4DVX4DZW2FKFAVCNFSM6AAAAABHEZJNSKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNJSGIZTOOBYG4. You are receiving this because you were mentioned.Message ID: @.***>

AlisonBennett commented 2 weeks ago

Comment on how the .bin files for subdomains are created from the aust_0.05_pt .bin files.

We use a Fortran utility (created by Peter Briggs) called BinFileSubsetAndPack. A copy of this program is stored at: /g/data/rp23/experiments/2024-04-17_BIOS3_merge/BIOS3_forcing folder/utilities/BinFileSubsetAndPack

Within the sitelists subdirectory there is are csv files that contain the list of points for each of the domains.

abhaasgoyal commented 1 week ago

Update

@har917 @AlisonBennett

We found out that the latitudes/longitudes for the land points were being incorrectly set-up from the gridinfo at some points (mostly based on coastal areas). When we read the ancillary values from the gridinfo, there is an initial check for valid iveg and SoilTemp points before selecting ilat/ilon

https://github.com/CABLE-LSM/CABLE/blob/e190b23491ed7aca47c28cad028ef8687a5fa5be/offline/cable_parameters.F90#L960

We assume that binary ancillary locations were picked up without the additional check present above.

Now, when gridinfo parameters were downscaled to 0.05 degree resolution via nearest neighbour interpolation, the coastal edges still had a coarse structure, however the ancillary data already had values suited at finer points. This is validated by the difference between existing and non-existing points between maps of bch and iveg are shown below.

image (coloured points indicate existence of value in one parameter but not in the other.)

The mismatch of existing values in BIOS and non-BIOS parameters posed a problem in some points of 1000pts configuration.

Solution

Goal

Points in non-BIOS fields should exist iff they exist in BIOS fields (since their maps are already finetuned to 0.05 degree resolution including coastal regions).

These were the list of non-BIOS variables which were fixed.

points_to_be_changed = [
 ('Nfix', []),
 ('SoilOrder', []),
 ('albedo2', []),
 ('Pdust', []),
 ('Pwea', []), 
 ('isoil', []),
 ('cnsd', []),
 ('Ndep', []),
 # TODO: Potential bug in patchfrac
 ("patchfrac", [2]), # Only 0 and 1 have non-zero values
 ('iveg', [1]), # Only needed at depth 0
 ('SoilMoist', [12, 6]),
 ('SnowDepth', [12]),
 ('Albedo', [3]),
 ('SoilTemp', [12, 6]),
 ('LAI', [12])
 ]

Algorithm

Assumption: All BIOS parameters have valid points in the same longitude/latitude. For reference, I took valid values in bch.

Update non-BIOS mapping in 2 stages:

  1. For each point which exists in BIOS mapping but doesn't in non-BIOS mapping, fill with existing values in nearest (euclidean distance) neighbour.
  2. Remove values which exists in non-BIOS param but don't in BIOS mapping.

A sample transformation for SoilTemp is shown below image

I will test the updated version gridinfo in 1000pts configuration.

abhaasgoyal commented 1 week ago

@har917 @AlisonBennett @ccarouge One weird thing I found out was that patchfrac mostly had values between [0, 1], however there is a small portion where it seems to be -9999. Is this expected behaviour (these values are mostly removed when applying the algorithm above since they don't exist in land)?

image