CABLE-LSM / CABLE

Home to the CABLE land surface model and its documentation
https://cable.readthedocs.io/en/latest/
Other
12 stars 6 forks source link

Reformat BIOS met forcing input to netcdf #322

Open ccarouge opened 5 months ago

ccarouge commented 5 months ago

We want a version of the BIOS configuration that will use the met forcing in netcdf format instead of the current binary format.

This will be worked on from the CABLE-POP_TRENDY branch, using the configuration act9test-serial-no-luc.

ccarouge commented 5 months ago

Outdated (To start working on this, @AlisonBennett needs to copy the met forcing data for 1900-1902 in rp23.)

See update below.

SeanBryan51 commented 5 months ago

Reply to @har917

Sean,

Can I check what you need from us regarding BIOS meteorology files you would like access to? I got a bit confused as to where we landed yesterday.

At the moment I/we could easily

  • copy some of the flt/hdr files from our archive across to rp23 (this is an rsync activity) – which could then be converted to netcdf. Which years would you like?
  • copy our archive of 1985-2014 in netcdf format across to rp23

Both are on the same 0.05 degree grid.

Personally I suspect that (for the purposes of testing) you could use the 1985-2014 netcdf and change the cable_user%startyear and %endyear accordingly – instead of 1900-1902. It is slightly less hassle to copy the netcdf archive than the flt/hdr – but happy to go with what you need. The netcdfs are larger if that’s a concern.

cheers, Ian

I don't see any issues with copying the archive of 1985-2014 in netcdf format across to rp23. Even better that it is available in NetCDF format.

@Whyborn has volunteered to look into the BIOS forcing - do you have anything to add?

Whyborn commented 5 months ago

I think we can start with just the NetCDF archive of 1985-2014 and check that it's in a format compatible with the Met forcing routines associated with POP_TRENDY.

har917 commented 5 months ago

Post-processed AGCD (plus McVicar wind) data are now in the BIOS3_forcing/netcdf/ directory in the appropriate directory in rp23.

I would not be surprised if there needs to be some tweaks to the metadata, dimensions, variable names, time axis to make these consistent with CRUJRA/TRENDY. Note that BIOS also runs with a slightly different set of met forcing to TRENDY.

ccarouge commented 3 months ago

Need to change the code to read in this netcdf data based on previous work for CRU

ccarouge commented 2 months ago

Next day and previous day values, in BIOS, are taken at the current day instead of the correct next/previous day. For the moment, we are going to reproduce this weird behaviour (as it may be there for a reason). It might be this way in the code because we are using different values at different times in the day (9am and 3pm) compared to CRU.

AlisonBennett commented 1 month ago

@Whyborn has found differences between the AGCD Binary and netCDF data in some variables, not all points, acttest9 data.

Using binary data from the b2311 run.

Could either be differences in the data or in the landmask.

Whyborn commented 1 month ago

I have found some differences between the Met forcing in binary form compared to NetCDF form. The test I am using is the act9test. The netCDF files I'm using are located at /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/netcdf_1985-2014 and the binary files I'm using for this test are b2311 files i.e. /g/data/rp23/experiments/2024-04-17_BIOS3-merge/BIOS3_forcing/acttest9/met/*_b2311.nc. Some of the points are off by a few percent, while others are the same.

I'm going to inspect the points around the land mask, which may tell me whether the issue is due to ambiguous nearest neighbour interpolation (i.e. original dataset defined at corners of cells of new dataset) or perhaps due to different versions of the original data they were generated from.

Whyborn commented 1 month ago

It does not seem to be a nearest neighbour interpolation issue, but in this exercise I think I found the actual issue- the order in which the points appear the binary is not the same. The binary inputs are generated by running along the latitudes in descending order, while the NetCDF latitudes are in ascending order.

AlisonBennett commented 1 month ago

This could certainly solve the problem. Does the sum of the points for the two dataset agree per timestep?


From: Lachlan Whyborn @.> Sent: Tuesday, 15 October 2024 17:38 To: CABLE-LSM/CABLE @.> Cc: Bennett, Alison (Environment, Aspendale) @.>; Mention @.> Subject: Re: [CABLE-LSM/CABLE] Reformat BIOS met forcing input to netcdf (Issue #322)

It does not seem to be a nearest neighbour interpolation issue, but in this exercise I think I found the actual issue- the order in which the points appear the binary is not the same. The binary inputs are generated by running along the latitudes in descending order, while the NetCDF latitudes are in ascending order.

— Reply to this email directly, view it on GitHubhttps://github.com/CABLE-LSM/CABLE/issues/322#issuecomment-2413017955, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIL5VYUBQKKNYIAUGNLEIHTZ3SZ7TAVCNFSM6AAAAABJ3EW23GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMJTGAYTOOJVGU. You are receiving this because you were mentioned.Message ID: @.***>

Whyborn commented 1 month ago

Haven't checked the sum, but I doubt it since the met/ancillary pairs will be incorrect. After re-generating the met data and land mask with latitude inverted, I can now match the met values at a given date.

However, I've also found that the reported recycling start date recycle_met_startdate in the BIOS code does not seem to match what the MetDate variable contains- with the default start year of 1981, the MetDate variable reports the year to be 1980 for the first run year, and this MetDate variable appears to be the one used to access the correct binary information. Comparisons with the NetCDF data show that MetDate does correctly reflect the data being retrieved from the binary.

Whyborn commented 1 month ago

There is also a "problem" (at least I think it's a bug, but perhaps intended) that the leap year calculation is based on the driver simulation year, rather than the recycled year used for the met data. For example, when setting the cable_user%year_start = 1860 and the BIOS recycle_met_start_date = 1991 (which means it actually starts in 1990?), the first year is treated as a leap year and runs for 366 days, so what the CABLE driver think is the last year of 1860 (and therefore a met year of 1990 due to recycling), the BIOS reads the met data from the first day of 1991.

har917 commented 1 month ago

@Whyborn The intent is/was that the combination of %year_start and recycle_met_start_date etc. leads to the identification of a year in the met file and then this gets used as is. The code should be identifying whether met used is a leap year or not (not whether %year_start would indicate whether it's a leap year.

This could be something that I and @AlisonBennett introduced in error. We did change the (hardwired) recycle_met_start_year (from 1951?) to 1980 (to align with the availability of the climate projection information) - and we did change a MOD to a MODULO (because we were getting -ve values for one of the variables which was then leading the model to try and pick up met from a year before the start of the input file).

Whyborn commented 1 month ago

Ah yep, I see this syear which is imported from another module and then written over. Is there a good reason for this syear to exist within the BIOS code? A similar thing exists in the CRU code, where the "reference year" for the recycling is hard set to 1501 for unknown reasons (i.e. MetYear = RecycleStartYear + MOD(CurYear - 1501, RecyclePeriod) . Is there a good reason for these references to be non-zero? I think it would be much clearer if the reference year was always 0, so we just have RecycleStarYear + MOD(CurYear, RecyclePeriod) everywhere.

Whyborn commented 1 month ago

There is the is the second question of leap years. Currently, the leap year checking is done in the main driver, by checking the current year. But if the driver's current year is a leap year, it doesn't mean the year used for the met forcing is also a leap year. I was thinking I think it would be wise to put some minor constraints on the choices of RecycleStartYear and RecyclePeriod to ensure that the leap years in the driver and the recycling period always match up, but that's actually quite restrictive for the 100 and 400 year rules.

har917 commented 1 month ago

My suspicion on the hard coded 1501 is that that was the start year of a version of the CRU meteorology - and linked to a quirk with the MOD function.

MOD(-ve number, period) gives a -ve number - and so if you're not careful you can end up with MetYear lying outside the start of the available meteorology. Including the 1501 year in here avoids that problem (if the start year of the meteorology is at or before 1501). This is why @AlisonBennett switched to MODULOsince we were paying around with meteorology with different start dates and wanted to avoid having to recompile all the time.

Whyborn commented 1 month ago

I do think using MODULO is a good idea, even if we switch to just using year 0 as the reference year. Perhaps there will come a time when people want to do some Paleo things, and this pre-empts that for no downside I can think of. I think the recycled met should be selected by MetYear = RecycledStartYear + MODULO(DriverYear, RecyclePeriod).

mcuntz commented 1 month ago

I have a related question (since a while but did not know where to ask it): In cable_driver for spinup or transient runs, MetYear is calculated as (line 585ff in main and 643ff in CABLE-POP_Trendy:

MetYear = site%spinstartyear + &
                      MOD(CurYear- &
                      (site%spinstartyear-(site%spinendyear-site%spinstartyear +1)*100), &
                      (site%spinendyear-site%spinstartyear +1))

Where is the 100 coming from and why?

Whyborn commented 1 month ago

If something in the core routine is confusing you as a primary user of the code, I think it's worth making a separate issue about it. Seems every type of met forcing has it's own way of determining the spin-up period which is not clear. I've made an issue discussing this problem on it's own (and related problem regarding leap years) here

har917 commented 1 month ago

@juergenknauer do you have any insight to the question from @mcuntz ?

More generally this whole topic comes about because we mix up code/processing required to set the configuration (i.e. the details of the experiment) with the code required to read the input. Some of this is likely because of the route to implementation (e.g. anything to do with the binary files).

juergenknauer commented 1 month ago

@mcuntz, @har917 I don't know why there is a multiplication with 100 in there. It does give the same results with and without that multiplier for the numbers I have tested.

That part of the code affected the site-level setup that Vanessa (and I) used. That didn't involve any binary files.

Whyborn commented 1 month ago

I have a related question- do the choices of cable_user%YearStart and cable_user%YearEnd have any real meaning during the spin-up stages? As far as I'm aware, the only time-dependent input is the met forcing (and dynamic LUC when active). It looks to me like the only effects YearStart and YearEnd have on the met recycling is changing the point within the recycling period the met starts.

Am I wrong in thinking that for a spin-up stage with a met recycling period of 20 years, there would be no difference in results between

cable_user%YearStart=1860
cable_user%YearEnd=1879

and

cable_user%YearStart=1980
cable_user%YearEnd=1999

bar the values on the time axis?

har917 commented 1 month ago

@Whyborn You are correct for these particular cases that there won't be any difference - however if you changed the length of meteorology available to use for the spin up and/or change the met recycling period then you will get different answers.

Scientifically you are supposed to assess the sensitivity of your spun-up model to these details but we rarely do so.

Whyborn commented 1 month ago

In that case, it makes a lot of sense to me to set cable_user%YearStart and cable_user%YearEnd to match the recycling period. It's much more intuitive, and we dodge the problem of mismatched leap years. Further, you no longer get the "unexpected" discontinuities in the weather. When you reach the end of a spin-up cycle, it is expected that there may be sharp changes in the forcing, but it's unless you have intimate knowledge of the internals of the code, you wouldn't expect this to happen in the middle of a spin-up cycle.

har917 commented 1 month ago

I've always thought that the way to do things is to specify a start_date, an end_date and a number of repeats - which is basically what you are saying. The difficulty is the time dimensions in the output - unless you run with each repeat as a separate job you end up with the time output jumping around and repeating which makes later analysis (necessary if only to check that the spin up has worked) difficult.

Whyborn commented 1 month ago

My idea is similar, except rather than specifying number_of_repeats, you specify some equilibrium criteria for the stage e.g. SoilTempDifference, which I believe is how things work now? So the only difference between my idea of the ideal situation and the current situation is that start_date and end_date actually match up with the spin-up period used. There's no reason why your outputs could not contain 2 series for each variable, for the most and second most recent cycles of spin-up, so you can inspect the "quality" of your equilibrium.

In any case, I think this is going a fair way beyond the scope of the original issue. I think the way I will test that the NetCDF BIOS is being read and processed in the same way as the binary is manufacture a short spin-up that avoids any leap years, and check that for binary equivalence.

Whyborn commented 1 month ago

Using a spin-up period from 1981 to 1983 (a period without leap years), I have confirmed that the outputs from the weather generator are the same i.e. the contents of the met derived type are the same for both branches. However, there are some differences, at least in the GPP and water/energy balances after the first year. I will do a deeper dive into the outputs using daily writing frequency to find the source of the differences.

Whyborn commented 4 weeks ago

I think I have found the source of the differences in outputs- there are a few select days in the short-wave radiation dataset that are different between the NetCDF and the binary inputs. I've included a plot of NetCDF - Binary for each of the variables from 1985 to 2000 for the first point in the ACT9 test case. All other variables are identical. The differences in the shortwave radiation are significant where they occur. Similar behaviour occurs across all the points in the test case, but it's not necessarily the same time indices which vary at each point. The first difference occurs during 1990, which is why I didn't pick it up in earlier inspections of the source data.

The specific datasets used for the shortwave radiation are:

The other radiation dataset which exists under acttest9 only contains a spin-up period from 1960-1991, so could not have been used as the source for the NetCDF data as it spans 1985-2014.

InputDifferences1985-2000

AlisonBennett commented 3 weeks ago

Are the differences where values in the netCDF are <0.1 or thereabouts? We did modify some of the rsds data when converting from the AGCD cogs archive for the most recent set of data (ie. file endings with b2405) . This modification set any values that were very close to or below 0 to 0 to avoid negative rsds (because it's an impossible value). The result would impact the binary files but not affect the source netCDF I think - but I will have to dig into it further to confirm.


From: Lachlan Whyborn @.> Sent: Friday, 1 November 2024 15:43 To: CABLE-LSM/CABLE @.> Cc: Bennett, Alison (Environment, Aspendale) @.>; Mention @.> Subject: Re: [CABLE-LSM/CABLE] Reformat BIOS met forcing input to netcdf (Issue #322)

I think I have found the source of the differences in outputs- there are a few select days in the short-wave radiation dataset that are different between the NetCDF and the binary inputs. I've included a plot of NetCDF - Binary for each of the variables from 1985 to 2000 for the first point in the ACT9 test case. All other variables are identical. The differences in the shortwave radiation are significant where they occur. Similar behaviour occurs across all the points in the test case, but it's not necessarily the same time indices which vary at each point. The specific datasets used for the shortwave radiation are:

The other radiation dataset which exists under acttest9 only contains a spin-up period from 1960-1991, so could not have been used as the source for the NetCDF data as it spans 1985-2014.

InputDifferences1985-2000.png (view on web)https://github.com/user-attachments/assets/2a8d24a7-cb4c-476f-83d1-18b273c20ef3

— Reply to this email directly, view it on GitHubhttps://github.com/CABLE-LSM/CABLE/issues/322#issuecomment-2451284200, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIL5VYQYP5UPSBDJ4MMIFTDZ6MBIJAVCNFSM6AAAAABJ3EW23GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINJRGI4DIMRQGA. You are receiving this because you were mentioned.Message ID: @.***>

Whyborn commented 3 weeks ago

The biggest difference is about 1.5. It looks like this is the difference- there are actually instances of the netCDF radiation being negative. I think the cut-off used for the binary is actually 1.13(?), based off the plot I've added below (note x-axis is days since the start of 1985).

RadiationPt1

It looks to me like there are two difference methods used in the creation of both the binary and netCDF files. The period prior to 1990 are what looks like monthly averaged values applied to the whole month, and values after this time daily averaged? Do you know why this is so?

Whyborn commented 3 weeks ago

Also noticed another bug in the BIOS code, in the previous day max temperature (exists both in POP_TRENDY and NESP2pt9_BLAZE branches). The previous day's max temperature that is passed to the weather generator is set to the existing value of the day's max temperature before updated from the binary. This is strictly incorrect due to the first day of running, in which case the existing day's max temperature is 0.0 so the previous day's max temperature used in the weather generator is 0. This is also inconsistent with the rest of the previous/next day's values being set to the current day's values (due to potential weather generator issues?). See the code here.

AlisonBennett commented 3 weeks ago

I checked the code - you are right, the lower value we set for acceptable radiaton was 1.13 which is exactly what you are seeing in the plot. So now we can confirm the differences that you found.

For the difference in calculation method pre- and post-1990, my guess is this is something to do with a change in how the BOM provided the data. I'm guessing that pre-1990 it was monthly values whereas after that we got daily values. If you need me to confirm I can check with Peter, our post-retirement fellow who knows all about the history of these data.

For the bug you just found - that probably needs a new issue. I'm not sure how to handle this sort of bug - how do you initialise the value when you have no existing value to pass through?


From: Lachlan Whyborn @.> Sent: Monday, 4 November 2024 14:50 To: CABLE-LSM/CABLE @.> Cc: Bennett, Alison (Environment, Aspendale) @.>; Mention @.> Subject: Re: [CABLE-LSM/CABLE] Reformat BIOS met forcing input to netcdf (Issue #322)

The biggest difference is about 1.5. It looks like this is the difference- there are actually instances of the netCDF radiation being negative. I think the cut-off used for the binary is actually 1.13(?), based off the plot I've added below (note x-axis is days since the start of 1985).

RadiationPt1.png (view on web)https://github.com/user-attachments/assets/4dd29f00-dc3f-4c8e-b72b-30992dd3b45e

It looks to me like there are two difference methods used in the creation of both the binary and netCDF files. The period prior to 1990 are what looks like monthly averaged values applied to the whole month, and values after this time daily averaged? Do you know why this is so?

— Reply to this email directly, view it on GitHubhttps://github.com/CABLE-LSM/CABLE/issues/322#issuecomment-2453779046, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIL5VYWN6COTZTCVDRXXBFLZ63VKHAVCNFSM6AAAAABJ3EW23GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDINJTG43TSMBUGY. You are receiving this because you were mentioned.Message ID: @.***>

Whyborn commented 3 weeks ago

I don't think it's that important precisely how the data comes about, as long as I can use to confirm that my implementation isn't having an unexpected effect on the results, which I have (with the caveat that I modified the existing BIOS code to use the current day's maximum temperature as the previous day's maximum, in accordance with other next/previous day values).

You just retrieve the data from the previous day via indexing, rather than relying on existing piece of data. As long as you have an indexing routine that retrieves data from a given day, it's trivial to retrieve the next/previous day of data. Just need to be careful when at the extremities of the time period of the data. This happened in the existing CRU code- when retrieving the previous day's temperature in 01/01/1901, which was the first year in the dataset, they just re-used the data from 01/01/1901.

In summary, I have achieved bitwise equivalence between the existing BIOS routines and the new input routines, with the caveats:

har917 commented 3 weeks ago

Hej (from Sweden)

A couple of comments - @AlisonBennett it's probably worth reaching out to Peter for some history on the processing of the shortwave radiation. There's a document somewhere that outlines all of Peter's investigations over time.

yes - there is definitely a lower limit of 1.13 MJ/day that is applied somewhere in the processing from source (geotiffs from the AGCD archive) and the binary files. There are also QA applied by the Bureau prior to us getting the data - and these have varied over time.

Prior to 1990 we don't have AGCD solar radiation information (as it's not provided by the Bureau). What we use is a monthly climatology based on the 1990-2010(?) period instead (another reason to eventually shift from AGCD to BARRA-2).

For the WG%MaxTempDayPrev issue - this is inevitable with any meteorological forcing data set. The way forward is to follow the CRU read, i.e. to use the first available day's max temp as maxtempdayprev if needed. (ps there maybe an equivalent issue with mintempdaynext) Coding wise it could be something like

Whyborn commented 3 weeks ago

This is all workable- I can just use the period prior to 1990 to do bitwise comparisons to ensure the mathematics is identical between branches, the science discussion around the validity of said inputs is a later discussion.

On the previous/next day issues, I don't see why the problem is inevitable. Of course there are difficulties if you're starting at the very day of the dataset/running up to the end of the dataset, but as long as you have an established practice for handling that situation (the current approach is to "sit" at the bounds of the data- so if you try to look earlier than the start of the dataset, jsut take the first day, and the last day past the end of your dataset). As long as you have a routine that correctly retrieves data for a given day, there's no reason for the first step to be a problem.

As bit of pseudo-code that reflects how the new routines work:

FUNCTION get_met_data_from_day(Dataset, Year, Day)
! A routine which uses the metadata from dataset to retrieve the record for a given year and day of year
...
END FUNCTION get_met_data_from_day

TempMaxDay = get_met_data_from_day(TempMaxDataset, CurrentYear, CurrentDay)
TempMaxDayPrev = get_met_data_from_day(TempMaxDataset, CurrentYear, CurrentDay-1)
TempMinDay = get_met_data_from_day(TempMinDataset, CurrentYear, CurrentDay)
TempMinDayNext = get_met_data_from_day(TempMinDataset, CurrentYear, CurrentDay+1)

As long as your get_met_data_from_day routine handles those edge cases I mentioned correctly (and of course handle Day-+1 correctly at start/end of the year), then the process is completely agnostic to which day of the run it is.

ccarouge commented 1 week ago

We have reproducibility between binary and netcdf. Except, lat/lon are different (floating point precision) because they are calculated with binary files while they are read in netcdf.

Changed the selection of prev/next day on the first/last day of simulation.

Leap years: when recycling meteorology, we should run with 365 days calendar no matter what the input met. is.