assimilating satellite solar-induced fluorescence (SIF) into CLM4.5

timhoar commented 4 years ago

Ed note: This issue was originally reported 26 Nov 2014 and is being manually ported over to GitHub.

Dear Dr. Hoar,

My name is Xi Yang. I am currently working with Dr. Jung-Eun Lee on assimilating satellite solar-induced fluorescence (SIF) into CLM4.5 to see if we can improve the estimation of terrestrial carbon and water fluxes. We have already written a module that is embedded in CLM to produce SIF (thus SIF is produced as a state variable like photosynthesis). We are wondering if we can get your help on the data assimilation. Our goal is:

1) Perform an OSSE. Simply run the model to harvest some 'truth', and see if we can improve the model estimation using DART.

2) Assimilate real observations of SIF from satellite (in netcdf format).

I am in the process of 1). I have went through the codes in DART/lanai/models/clm. However I still have one question: according to section 17 of the tutorial, I need to use create_obs_sequence and create_fixed_network_sequence to create obs_seq.in for the assimilation process. Do you have suggestions on how to directly take the SIF value produced by CLM ('truth') and put them into the obs_seq.in file? Thank you!

Have a nice thanksgiving!

Best, -Xi

timhoar commented 4 years ago

Tim Hoar added a comment - 26/Nov/14 8:45 AM Hello Dr. Yang;

The outline for running an OSSE (which we allso call a 'perfect model' experiment) is described in: http://www.image.ucar.edu/DAReS/DART/DART_Observations.php#obs_seq_osse You should really run this for the lorenz_63 model ... with the exception of advancing the model for CLM, it is EXACTLY the same process.

After you understand the process, there will be some modifications you need because DART does not know anything about SIF observations - yet.

In order to run create_obs_sequence, you need to add the SIF observation support to DART. This happens in two places - the core routines have to know what an SIF observation is, and then CLM needs to know how to create an SIF observation at the location required by the observation (the forward observation operator).

DART has a concept of observation 'types' and 'kinds' that is explained in http://www.image.ucar.edu/DAReS/DART/DART_Observations.php#adding_types (the very bottom section), which references the documentation for obs_def_mod.f90 - https://proxy.subversion.ucar.edu/DAReS/DART/trunk/obs_def/obs_def_mod.html Fundamentally, there may be multiple sources of SIF observations (instruments), but the model only has the one representation of SIF. So DART relates the specific type of observation (e.g. RADIOSONDE_TEMPERATURE) to a generic kind (TEMPERATURE). The relationship is defined in DEFAULT_obs_def_mod.F90 and DEFAULT_obs_kind_mod.F90 Simply doing this and putting the specific type in input.nml&obs_kind_nml:assimilate_these_types will allow you to run create_obs_sequence and create_fixed_network_sequence.

We STRONGLY suggest that you start with a single observation in a known location. Do not try to define the entire observation stream you intend to use for you research.

The program perfect_model_obs reads the (empty) observation file and applies the forward observation operator to the CLM state and populates the observation file with observation values. Here's where I need more information. I need to know if you have SIF as a variable in CLM that you can write to the restart (or history) file. If you do, then the forward observation operator is simply a call to model_interpolate() ... and things are very easy.

Be aware that the most commonly ignored part of assimilation with CLM is that there are a lot of quantities that are 'state'. Letting the assimilation impact a single variable is generally insufficient. For example, brightness temperature depends on soil temperature, soil moisture, snow cover, leaf area index ... etc. Changing only one of these via assimilation is probably not letting the observation impact the state as much as it could, and may actually do harm by destroying the model 'relationship' between these quantities.

I am here all day today ... if you would like to talk more about it, please feel free to give me a call.

Tim

timhoar commented 4 years ago

Tim Hoar added a comment - 26/Nov/14 3:28 PM Xi Yang 2:15 PM (59 minutes ago)

to me, Jung-Eun, Jeffrey, Andrew Hi Tim,

Thank you for the detailed and prompt response! It is very useful. I think the next step for me is to follow your suggestions and start with Lorenz 63. I have one more question. When you say "start with a single observation at a known location", do you mean something like the observation of day 200 of year 2004 in Harvard Forest? Do I need to run a single point CLM?

Regarding to your question, yes SIF is produced as a state variable like photosynthesis (FPSN) in the restart and history files. Seems it will be straightforward – use model_interpolate() to get the modeled output. I will keep you updated. Thanks!

Thank you again and happy thanksgiving!

timhoar commented 4 years ago

Tim Hoar added a comment - 26/Nov/14 3:29 PM You're welcome.

You can run single point CLM or the 'full-blown' CLM - does not matter.

By single observation, I mean only one SIF observation location - For comparison, some satellites return 'observations' every 25km over their swath and result in many thousands of observation locations.

Just use 1.

You can put it in the Harvard forest (turns out that is normally where I put mine!) as long as your CLM domain covers the Harvard forest.

Good luck –

Tim

P.S. I can already tell it won't take you very long at all to run the lorenz examples. I don't know how much Ensemble DA theory you know, but if you like, you can work through the DART tutorial for a grasp of how it works in DART. Might be useful.

Tim Hoar thoar@ucar.edu 3:28 PM (0 minutes ago)

to Xi, Jung-Eun, Jeffrey, Andrew One more thing ... I am in the final stages of adding support for CLM4.5 to the DART/trunk (not the lanai release) The trunk version also has a much clearer way of specifying what variable comes from what file (some CLM variables in the restart file, some variables exist in the history file ... etc.). So you could explicitly state to get the SIF from a history file but update ?leaf carbon? in the restart file. Stuff like that. I will keep you posted as to the progress.

timhoar commented 4 years ago

Xi Yang via gmail.com Jan 12th

to me Hi Tim,

It is great to meet you in AGU. I have been reading DART tutorials and working on making OSSE work. I ran into a problem and wondering if you know where the issue is.

My apologies if the question seems naive. I used Matlab and IDL a lot but have little experience in fortran, some experience with shell scripts.

In short, I got the error that "DART should not be trying to advance CLM" when running perfect_model_obs.

Here is a longer description:

I basically followed the steps for Lorenz_63, and also the codes in lanai/models/clm/shell_scripts, particularly perfect_model.csh.

First I created the TYPE and KIND for solar-induced fluorescence (FSIF, KIND_FSIF).

Then I put FSIF into model_mod.f90, to have model_interpolate to calculate FSIF (using get_grid_vertval). After this I ran quickbuild.csh.

I then run ./create_obs_sequence; ./create_fixed_network_seq, and clm_to_dart (to use my clm restart file which is the output of a multi-instance run of CLM w/ SIF as output in history file).

Then I ran perfect_model_obs, and got the error message above.

I attached the output for your reference. I am not sure if you can also see my folders on yellowstone?

Looking forward to your reply!

timhoar commented 4 years ago

CESM is such a complicated beast that DART does not try to advance the model. This is different than with almost every other model.

CESM has an ensemble manager and a restart mechanism. So - what we do is insert a few lines into the CESM run script to interrupt the 'normal' sequence of events and run DART (either perfect_model_obs or filter) 'end-to-end' for a single assimilation cycle.

We carefully chop up the observation sequence files into many small files where each file contains just the observations for a single assimilation timestep. The filename is constructed with the assimilation date/time that matches the timestamps from CESM.

Essentially, CESM runs & writes out restarts / DART reads those restarts, modifiles them AND EXITS / and the CESM postrun commands are then resumed.

We have instructions and examples in the models/clm/shell_scripts directory.

Setting up CESM is no trivial task. My set of notes ultimately turned into a script: CESM1_1_1setup[pmo,hybrid] Depending on if you are using the trunk version of DART or the lanai release, you may also have CESM1_2_1setup[pmo,hybrid] scripts.

In short, what you did is exactly correct for almost every model. The CESM models are too complicated to be controlled by DART, so we use the normal CESM control mechanisms and run DART as a single step.

What I did not ask is ... Are you familiar with CESM in general - outside the DART context? Are you trying to learn CESM and DART at the same time?

timhoar commented 4 years ago

Tim Hoar added a comment - 15/Jan/15 1:09 PM Xi Yang via gmail.com 3:10 PM (21 hours ago)

to me Hi Tim,

Thank you for these comments! They are indeed really useful!

I figured out the problem. What I originally did was also the "manual" version of CESM___hybrid. I first run a multi-instance CESM (no DART involved), and used the restart and history files of one instance as the restart and history files (in the input.nml). Then I followed perfect_model.csh: I first manually created obs_seq file and then run clm_to_dart and perfect_model_obs.

The reason for the problem is that the assimilation should be limited to 1 day – instead in the namelist I put a few more days. In this way, DART has to advance the model but it is forbid in the model_mod.f90.

Your email perfectly explained the issue I had. Now I understand I need to create separate obs_seq files for each assimilation cycle – like what you did for the eddy flux data.

Could you please check the steps below to see if I understand it correctly?

Say if I want to do an OSSE for the time-period between 2004-01-02 and 2004-01-10 for Harvard Forest, and the assimilation cycle is one day. We've modified some code of CLM so it produces solar-induced fluorescence (SIF) in the history file.

1) First I do a multi-instance run of CESM for a year (2003-01-01 to 2004-01-01) 2) Configure CESM1_2_1_setup_pmo to run CESM for one day (2004-01-02), with the refcase from 1). 3) use ./create_obs_sequence and ./create_fixed_network_seq to create a "notebook" (obs_seq file) for nine day. Each day in a separate file. For example, obs_seq.20040102 4) In perfect_model.csh, modify codes to link obs_seq.in to obs_seq.20040102 5) Configure CESM_DART_CONFIG 6) To do this iteratively for the rest 8 days, set CONTINUE_RUN to true?

Is there a way to batch create obs_seq files?

Thank you so much for the help!

timhoar commented 4 years ago

Xi Yang via gmail.com Jan 14 (6 days ago)

to me Hi Tim,

Thank you for these comments! They are indeed really useful!

I figured out the problem. What I originally did was also the "manual" version of CESM___hybrid. I first run a multi-instance CESM (no DART involved), and used the restart and history files of one instance as the restart and history files (in the input.nml). Then I followed perfect_model.csh: I first manually created obs_seq file and then run clm_to_dart and perfect_model_obs.

The reason for the problem is that the assimilation should be limited to 1 day – instead in the namelist I put a few more days. In this way, DART has to advance the model but it is forbid in the model_mod.f90.

Your email perfectly explained the issue I had. Now I understand I need to create separate obs_seq files for each assimilation cycle – like what you did for the eddy flux data.

Could you please check the steps below to see if I understand it correctly?

Say if I want to do an OSSE for the time-period between 2004-01-02 and 2004-01-10 for Harvard Forest, and the assimilation cycle is one day. We've modified some code of CLM so it produces solar-induced fluorescence (SIF) in the history file.

1) First I do a multi-instance run of CESM for a year (2003-01-01 to 2004-01-01) 2) Configure CESM1_2_1_setup_pmo to run CESM for one day (2004-01-02), with the refcase from 1). 3) use ./create_obs_sequence and ./create_fixed_network_seq to create a "notebook" (obs_seq file) for nine day. Each day in a separate file. For example, obs_seq.20040102 4) In perfect_model.csh, modify codes to link obs_seq.in to obs_seq.20040102 5) Configure CESM_DART_CONFIG 6) To do this iteratively for the rest 8 days, set CONTINUE_RUN to true?

Is there a way to batch create obs_seq files?

Thank you so much for the help!

timhoar commented 4 years ago

to Xi

clm/shell_scripts/makedaily.sh is a script that makes temporal 'daily' subsets from a list of observation sequence files. There are two versions of this script - one that splits everything +/- 12 hours from the time that is used for the filename – and one that has a 'day' that matches the contents of the CLM history file output.

Depending on what branch/version you have, the script you have may be perfectly appropriate. The script is intended to be run in the shell_scripts directory, there is a namelist template in ../work that gets iteratively copied and modified for each new day.

Become familiar with the script and let me know if the version you have will work for you. If it does not, please give me a call and we can talk about where to go from here. I don't want to get into a long email if it is not warranted.

Good luck – Tim

timhoar commented 4 years ago

Hope you are enjoying a great summer! My advisor Jung-Eun has approved my visit. I am wondering if I could confirm with you the time frame of my visit to NCAR using the RSVP program. The application asks for the timeframe of the visit. Since I am pretty flexible in terms of timing, I'd be happy to come in a time when you are not busy.

If the RSVP application does not work, I will try applying other programs.

I made some progress on single point data assimilation @ HF. I am currently using the lanai version of DART. I am working on the OSSE but I found that after the assimilation, there is no difference between analysis and forecast. I checked the DART website on this topic (http://www.image.ucar.edu/DAReS/DART/DART_Documentation.php#DidItWork), and found that the reason of no impact could be that there is no spread in ensemble members. However, after I followed the instruction (set start_from_restart = .false., and single_restart_file_in = .true.), I got the error like:

'routine: open_restart_read 0: message: Problem opening file filter_ics

0: message: ... OPEN status was 2'

The case is under "/glade/p/work/xiyang/cases/HF_CLM45_20ensemble", and an example of error message can be found in "cesm.stdout.972091"

If phone call is more convenient, I'd be happy to make a phone call at your convenience. Thank you!

Above is the short version of the issue, and below is the detailed version.

I was assimilating solar-induced fluorescence (FSIF) and photosynthesis (FPSN), both of which are outputs in the history files. Plant physiology says that FSIF and FPSN are linked with a couple of variables in the restart files (T_VEG, vegetation temperature; fabi and fabd, fraction of absorbed PAR; and elai, leaf area index). So I put those variables in model_nml, clm_state_variables in the hope that any changes in FSIF will impact those variables.

I then did a perfect model simulation (case name: HF_CLM45_perfectmodel_dartlanai) and harvested the truth. Then I did a 20 ensemble run and get the prior, posterior etc. But the final examination using plot_evolution.m on the output file shows no change. Please see attached figure.

So I looked up in the DART website on this topic (http://www.image.ucar.edu/DAReS/DART/DART_Documentation.php#DidItWork), and found that the reason of no impact could be that there is no spread in ensemble members. I did the procedures mentioned above but encountered this issue.

All the best, -Xi

timhoar commented 4 years ago

On Friday the 16th, Xi and I had a Skype session. He was still not getting any difference in the priors and posteriors, and very little, if any, change in some of his state variables.

Several things: 1) He is running a version of CLM that uses the CLM history file for the forward obs operator - so there cannot be a difference between the prior and posterior observation values. 2) Some of the state variables were different, but not all of them. I am trying to check to see if the stream files all have the same values for some (incoming solar radiation, for example) ...

His case is in /glade/p/work/xiyang/cases/Ha1_CLM45_40ensemble with the expected /glade/scratch/xiyang ...

I quickly hacked the bits he used to support his SIF obs into my 'trunk' version and made him some executables to try.

to me Hi Tim,

I ran a few new tests with the trunk version copied from your directory. There is a glitch when I started the perfectmodel (/glade/scratch/xiyang/Ha1_CLM45_trunk_perfectmodel/run; /glade/p/work/xiyang/cases/Ha1_CLM45_trunk_perfectmodel). I got the following error when I ran the perfect model case:

"ERROR FROM: routine: static_init_model

message: (2d) unknown first Dimension name lndgrid while trying to create metadata for FSIF"

This error can also be found in: /glade/p/work/xiyang/cases/Ha1_CLM45_trunk_perfectmodel/cesm.stdout.477718

Looking at the code model_mod.f90. This error should come from line 986. It looks like that there is no CASE "lndgrid" thus the code didn't know what to do. FSIF for this single case, has a dimension of (time, lndgird).

Should I modify model_mod.f90 to include the case with lndgrid?

I can also do skype if it is more convenient for you.

Thank you! -Xi

timhoar commented 4 years ago

Tim Hoar thoar@ucar.edu Oct 26 (4 days ago)

to Xi Xi,

I have been digging around your Ha1_CLM45_40ensemble results from 2010-07-xx and have a few things I don't understand - still.

it seems all the ensemble member have identical values for tlai - so it makes perfect sense to me that the assimilation cannot 'move' them. The algorithm has no impact when the prior ensemble spread is zero.

elai is the same, of the 18 PFTs, only 4 have non-zero values (2,8,14,16), but all the ensemble members have the same value (the min/max differ by 1E-15, which is the limit of precision for 64 bit reals)

Then I looked at T_VEG and the ensemble members have different values - so the assimilation can work - and there IS a difference between the prior and posterior. What I find weird is that tlai and elai only have 4 PFTs with non-zero values, but T_VEG has 5 PFTs with non-zero values!?

How is that possible? perhaps you can describe to me how the initial ensemble was created?

We should set up another skype session for this week to discuss this.

I also tried to see if the 'new' results made sense - the ones you just created today: Ha1_CLM45BGC_trunk_40ensemble_assimilate

and they look much more consistent - although I am still confused by why some variables have different PFTs than others

timhoar commented 4 years ago

Xi Yang AttachmentsOct 26 (4 days ago)

to me Hi Tim,

Thanks for digging into the files. Yes I am available Tuesday 11:00 am - 2:00 pm; 2:30pm - 4:00pm; Wed 1pm-4pm; Thurs. 10:30pm - 4:00pm. What time frame works for you?

Below is the detailed response to your email. We can also talk about it during the skype meeting.

Oh I found to make the version of DART works there are two minor revisions needed: 1) I need to also copy model_mod_check.f90 for your directory; 2) in shell_scripts/assimilate.csh, at line 138 that the observation file should be clm_obs_seq.${LND_DATE_EXT}.perfect

For the old Ha1_CLM45_40ensemble. I used I1PTCLM45 compset. The way I did it, as I realized today, might be incorrect. So basically I did a single case spinup from 2004 to 2010 based on the interpinic result from "/glade/p/work/xiyang/inputdata/ccsm4_init/I2000CN_f09_g16_c100503". Then from there I did a perfect model run based on the CAM output 0023 (randomly chosen as the truth).

Using the same init (interpinic result), I spun up the 40 ensembles from 2004 to 2010, and did assimilation from 2010-07-01.

OK this is my old way of doing it. Today I changed a bit for the new tests. I realized I need to use CLM45 BGC (ICRUCLM45BGC is the compset). First I used the spun-up files came with CLM45 release (/glade/p/cesmdata/cseg/inputdata/lnd/clm2/initdata/clmi.BCN.2000-01-01_0.9x1.25_gx1v6_simyr2000_c100303.nc). Use interpinic to create an initial condition for Harvard Forest.

Then I did a short spinup for 10 years using 40 ensembles with CAM forcing from Andy.

After the spinup run, I did a perfect model run using instance 0023 from 2010-07-01 to 2010-07-30.

I then did an assimilation for the same period.

The results from obs_diag_output.nc (/glade/u/home/xiyang/DART/trunk/models/clm/work/obs_diag_output.nc) shows no change in rmse. The innovation files however showed some changes. The changes in fabd, and fabi are too small, in part because the daily variability of these values are quite small. Attached is a figure showing fabd on y axis, FPSN (photosynthesis) on x-axis. Almost NO correlation between these two. In theory:

FPSN = fabd (or fabi for shaded leaves) PAR light use efficiency FSIF = fabd PAR SIFefficiency

The variability of fabd, light use efficiency, and SIFefficiency are several orders of magnitude smaller than the variability of PAR. Consider that fabd only varies 0.01 while at the same time PAR varies from 50 - 1000. So I need to find something that relates directly to PAR. I found variables like:

fabd_sun_z(pft, levcan)

fabd_sun_z:long_name = "absorbed sunlit leaf direct PAR (per unit lai+sai) for canopy layer" ;

fabd_sha_z(pft, levcan);fabi_sun_z(pft, levcan);fabi_sha_z(pft, levcan)

I will think about ways to include these variable in DART.

timhoar commented 4 years ago

Xi Yang AttachmentsOct 28 (2 days ago)

to me, Jung-Eun Hi Tim,

I think it works! As I mentioned in the last email, Jung-Eun suggested that FPSN and FSIF should be directly controlled by fabd_sun_z(pft, levcan), fabd_sha_z(pft,levcan), fabi_sun_z(pft, levcan), fabi_sha_z(pft,levcan).

So today I modified model_mod.f90 to include levcan, in the same way that you add numrad. And please see attached two figures. The assimilation did impact both FSIF and FPSN.

The cases are "Ha1_CLM45BGC_trunk_40ensemble_assimilate_test2" and "Ha1_CLM45BGC_trunk_perfectmodel_test2".

If it looks correct with you, I will do a longer assimilation.

Thank you!

timhoar commented 4 years ago

Hope that you had a restful break and your new year is off a great start!

I am wondering if you can help me with the question about ensemble model spread in the CLM/DART runs. I took a look at the slides you made for the NCAR data assimilation workshop. The ensemble spread (for example, on page 39) is comfortably large so that the assimilation results are obvious.

When I did my CLM DART multi-ensemble (40 ensembles) runs, the problem I had is that the ensemble spread of photosynthesis is quite small. Please see attached figure. The ensemble spread is quite small that assimilating fluorescence has small impact on fluorescence (and fluorescence itself), as the spread is small to begin with. BTW, I tried adaptive inflation – a small inflation factor does not change the spread much, but a large inflation factor will cause issue in the model.

The model (at a single point) was spunup for 200 years using CAM/DART atmospheric forcing. Then 40 ensembles are driven by 40 ensemble atm forcing.

So my question is that whether there are other ways to increase the model spread? How did you do in your model runs? Thanks a lot!

Looking forward to your reply!

Best, -Xi

The first question is how long you have run the model for with the ensemble of drivers prior to attempting the assimilation? This is a topic that requires further work to obtain a defensible approach, but I typically "spin" the model with the ensemble of climates for anything from 5 to 50 years to obtain larger initial spread.

The alternative way to obtain/maintain spread is to use the inflation - either fixed or adaptive - within DART. Have you done this? I have found this is required when using real MODIS observations.

Thanks for the prompt reply!

I have run the model at a single point in Harvard Forest with 40 ensemble atmospheric forcing for 200 years before the assimilation. Thus the initial conditions for each ensemble member should be different. Then during the assimilation, the 40 ensembles were driven by 40 ensemble atm forcing.

I've also tried adaptive inflation on the prior with different levels of inflation factors. One with 1.03 and the other 1.5. Please see attached figures. This is 10 days run and we are looking at leafn. Clearly there are some inflation going on there. 1.5 case might have blown up the ensemble, which is not what we want (is that correct?)

The issue with the ensemble is that the spread does not reflect the real uncertainties in the land model (e.g., like in Parazoo et al. GCB. 2014). I have some idea that might work. I will test it and keep you all updated.

This is the setting in input.nml on inflation (it is in /glade/p/work/xiyang/cases/Ha1_CLM45BGC_trunk_40ensemble_assimilate_inflation for your reference). I only changed the inf_initial and inf_flavor. All other paramters are default.

inf_flavor = 2, 0
inf_initial_from_restart = .false., .false.
inf_sd_initial_from_restart = .false., .false.
inf_output_restart = .true., .true.
inf_deterministic = .true., .true.
inf_in_file_name = 'prior_inflate_ics', 'post_inflate_ics'
inf_out_file_name = 'prior_inflate_restart', 'post_inflate_restart'
inf_diag_file_name = 'prior_inflate_diag', 'post_inflate_diag'
inf_initial = 1.03, 1.0
inf_sd_initial = 0.6, 0.6
inf_damping = 0.9, 0.9
inf_lower_bound = 1.0, 1.0
inf_upper_bound = 1000.0, 1000.0
inf_sd_lower_bound = 0.6, 0.6

My understanding is that by setting inf_sd_initial to a positive number, it is adaptive inflation. However, I agree that the inflation factor did not change during the 10 days' test – as you can see from the attached figure for the both 1.03 and 1.5 cases. The inflation factors are 1.025 and 1.45 all the time. Did I do something wrong? Should I change something else?

The CLM variables impacted by the assimilation here are leafn and leafc, which are prognostic variables that affect the LAI and Vcmax, which affect photosynthesis and transpiration, and also SIF.

No observation has been rejected. I will make a few figures later regarding the relationship between SIF and GPP etc.

Thanks for the help! I am running more tests, and maybe if you all have time next week we can have a skype meeting?

The plots you show indicate that the inflation is not changing through time. The namelists you are using explain why.

inf_initial_from_restart = .false., .false.
inf_sd_initial_from_restart = .false., .false.

These are the correct settings FOR THE FIRST ASSIMILATION CYCLE ONLY. After that, the inflation values need to be read from the restart files. As it stands, with these values set to .false. the single inflation value from the namelist is constantly being used. Since you are using prior inflation, only the first entry needs to be .true. ...

So - AFTER the first step, you need to change the namelist to:

inf_initial_from_restart = .true., .false.
inf_sd_initial_from_restart = .true., .false.

Alternatively -

Since you have inflation files already ... you can rename one of them to be "prior_inflate_ics" and keep inf_initial_from_restart, inf_sd_initial_from_restart set to .true. for the duration of the experiment. The DART scripting should take the output inflation file from the previous cycle and rename it to be used as the input for the subsequent cycle.

I have also looked at the plots you sent . The assimilation will have more of an impact if SIF and leaf[c,n] are correlated. The scatterplots on page 4 indicate a very low correlation. Are there other CLM variables that are more correlated with photosynthesis?

Looking at page 6 ... the free run looks REALLY GOOD ?!

At this point, I am out of my element. If SIF is strongly dependent on shortwave radiation - do we have enough (any?) variability in the shortwave radiation forcing field? If all the ensemble members are getting the same shortwave forcing, then it is not surprising that the free run looks good. There's not much to be done about that. It looks to me like the system is nearly entirely determined by the shortwave forcing.

Thank you for the reply! They are really helpful!

As regard to the inflation, now I think I can correctly implement adaptive inflation. I will give it a shot this afternoon.

Jung-Eun (cc'ed here) and I have been discussing about the atmospheric forcing. Attached is a figure of the atmospheric forcing (DART/CAM ensembles) at Harvard Forest. As you can see, the spread of shortwave radiation is small.

The top three uncertainties in land surface models are 1) atmospheric forcing; 2) structure (missing processes); and 3) parameter (like Vcmax – it is pretty much a fixed value in CLM. It is related to leaf nitrogen, but in the code it is not related to leafn at all, so the spread in leafn won't affect Vcmax at all). (Huntzinger et al)

Would a larger spread in atmospheric forcing help? I was thinking about instead of using the ensemble forcing from one single year, I use atmospheric forcing from different years – that will increase the spread of shortwave radiation. For example, instead of using all 40 ensemble forcing from year 2010, we use 4 ensemble from year 2001, 4 from 2002, 4 for 2003 ... etc. The seasonal pattern will still be the same, but the day-to-day variations are quite large. This is actually similar to what some groups are doing: when they do DA, the prior is from multiyear mean (Parazoo et al., 2014).

And you are right that the system is pretty much determined by shortwave radiation, as currently there are small spread in leafc that can cause a large spread in LAI during the spring and fall, and no spread in Vcmax at all. Those two are closely related to photosynthesis – though it should be noted that they are not linearly correlated. I am working on 1) increase the spread of leafc in the initial conditions (maybe use initial conditions from restart files at different years?); 2) increase the spread of leafn, and link leafn with Vcmax.

If you have any suggestion on the plan, please let me know! Thank you!

I would like to see the results of the inflation test before we talk about shuffling the forcing. There is actually more spread in the shortwave than I was expecting. Granted, some times have very little spread, but overall ... I don't know what a realistic spread should look like for shortwave. That's not my area of expertise.

Hi Tim,

I successfully implemented the adaptive inflation. I started with an inflation of 1.03, and this inflation factor changes during the assimilation test that lasts 30 days. Attached please see the figure of the time evolution of the inflation factor – it first goes down and then starts to come back up in the end of the run. I am testing higher inflation factors such as 1.5.

The spread of leafc is maintained. Please see the second figure. Blue lines are the ensemble members after the assimilation. There is no obvious convergence towards to the truth: likely because there is no leaf yet. I will run the model longer.

The spread of leafc, we think, is small to begin with. Thus the freerun is quite close to the truth already...

I will update you with more results, meanwhile please let me know if you have suggestions. Thank you!

I am happy that you are getting inflation values that change over time. It will be interesting to see what happens where the plants have really started to grow. You should read the Tellus paper on adaptive inflation - there is a relationship between the ensemble spread and the observation error variance that is important.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus A, 61, 72-83. doi: 10.1111/j.1600-0870.2008.00361.x

NCAR / DART

assimilating satellite solar-induced fluorescence (SIF) into CLM4.5 #40