OGGM / oggm

Open Global Glacier Model
http://oggm.org
BSD 3-Clause "New" or "Revised" License

Running from GCM #469

Open anoukvlug opened 6 years ago

anoukvlug commented 6 years ago

After some offline discussions with @fmaussion and @nchampollion about how OGGM can be forced with different GCMs, I thought it would be good to open an issue about this.

Currently, temperature and precipitation from CESM simulations can be used to force OGGM. Different GCMs structure their output slightly differently, so the process_cesm_data function currently cannot process files from other GCMs. There are a couple of ways to make it possible to use other GCMs in OGGM.

Here is a list of some options (there are probably more) to continue the conversation:

  1. Make the process_cesm_data function more flexible. This would require quite a few keyword arguments in the function.

  2. Add different functions for the different GCMs.

  3. Write documentation on how people can write their own function to process their input data.

  4. Make a general process_gcm_data function, with documentation of what the input should look like. This could then include some documentation on how to make basic changes to the input (netCDF) files using NCO tools (e.g. summing different types of precipitation to get a file with the total precipitation, or changing variable names).
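To make the kind of standardization mentioned in option 4 more concrete, here is a minimal sketch in plain Python. It is only an illustration: the function `standardize_gcm_record`, the lookup tuples, and the output keys `temp` and `prcp` are hypothetical and not an existing OGGM interface, and the variable names (`TREFHT`, `PRECC`, `PRECL`, `tas`, `pr`) are just examples of how models may label their output. Plain lists stand in for netCDF variables.

```python
# Hypothetical lookup tables from model-specific variable names to a
# common convention; none of these names are prescribed by OGGM.
TEMP_NAMES = ("TREFHT", "tas")
PRECIP_COMPONENTS = (("PRECC", "PRECL"), ("pr",))


def standardize_gcm_record(record):
    """Return a dict with 'temp' and 'prcp' lists from a model-specific dict."""
    temp_key = next((k for k in TEMP_NAMES if k in record), None)
    if temp_key is None:
        raise KeyError("no known temperature variable found")

    for components in PRECIP_COMPONENTS:
        if all(k in record for k in components):
            # Sum the precipitation types element-wise to get the total.
            prcp = [sum(vals) for vals in zip(*(record[k] for k in components))]
            break
    else:
        raise KeyError("no known precipitation variable(s) found")

    return {"temp": list(record[temp_key]), "prcp": prcp}
```

The same renaming and summing could equally be done on the files themselves before they reach OGGM, which is what the NCO suggestion amounts to.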

Disadvantages of the different options:

  1. There might be many things that differ between the models. Making one function that can handle all GCMs might be hard, and I am afraid that it could be prone to bugs.

  2. This adds a lot of code repetition to the model.

  3. This makes people repeat what other people have already done.

  4. It leaves the problem of differences in how the GCMs structure their output with the user.

fmaussion commented 6 years ago

Very good points! I will spend the weekend at the ECMWF hackathon, and I might come back with some ideas. Let's keep in touch!

nchampollion commented 6 years ago

Yes, good points! I'll also come back to this issue later, but my first thought is that a mix of options 1 and 3 seems reasonable. Option 2 is not feasible since the GCM landscape changes over time, and option 4 is probably the best solution but would require a lot of time and be much more complex.

fmaussion commented 6 years ago

One of the first things I am thinking about changing is the computation of the anomalies. Currently the anomalies are computed by process_cesm_data, but I would like to have a task specifically for this.

The reasoning is that this part of the code is very specific to our way of computing the mass balance and is also the same for any kind of data we are using. The step that is difficult to standardize is the input (gridded data -> time series at the glacier location): once this is done (i.e. by an external tool, as Hugues did), OGGM should provide the rest of the tools.

fmaussion commented 6 years ago

After giving this some more thought, I think we should not rush into "very general solutions" for now, simply because we don't yet know what such a solution would look like. I asked Ben to provide me with some CMIP data so that I can get an idea of what they look like. In the meantime (and this might change!!!), I suggest the following:

@anoukvlug , thoughts?

anoukvlug commented 6 years ago

@fmaussion, I agree with you that it would be good to split the function into two tasks and that it should be documented how users can use other data. I only have some remaining questions:

fmaussion commented 6 years ago

What is a climate_io?

Currently the process_cesm function is available in the climate.py module. I would like to make a separate climate_io.py module for this function and all the others that will follow. IO stands for input/output. Another name could be climate_backends.py, but I don't know if it's more expressive. Do you have an idea?

Could you elaborate on your third point? I am not sure what you mean by computing the anomalies at the mass-balance model level.

Currently the process_cesm function does two things: (i) extracting the time series and (ii) computing the anomalies to CRU during a reference period.

Point (ii) is very general and not specific to any climate dataset. The PastMassBalanceModel could easily do it: when reading the climate data from any file, it would start by computing the anomalies and then use the anomaly data as forcing.

In the end the output will be exactly the same; only the place where the operations happen will be different. Furthermore, it would make it quite easy to test different reference periods for the anomalies.
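For readers who want to see what step (ii) amounts to, here is a rough sketch of an anomaly computation relative to a reference period. It assumes additive anomalies per calendar month, which is a common choice for temperature (precipitation is often handled differently), and the function names and arguments are hypothetical, not the actual OGGM code. `baseline_clim` stands for the twelve monthly means of the baseline dataset over the same reference period.

```python
def monthly_climatology(years, months, values, ref_start, ref_end):
    """Mean value per calendar month over the reference period (inclusive)."""
    sums = [0.0] * 12
    counts = [0] * 12
    for y, m, v in zip(years, months, values):
        if ref_start <= y <= ref_end:
            sums[m - 1] += v
            counts[m - 1] += 1
    if 0 in counts:
        raise ValueError("reference period does not cover every calendar month")
    return [s / c for s, c in zip(sums, counts)]


def apply_anomalies(years, months, gcm_values, baseline_clim, ref_start, ref_end):
    """Additive GCM anomalies, added onto a baseline monthly climatology."""
    gcm_clim = monthly_climatology(years, months, gcm_values, ref_start, ref_end)
    return [v - gcm_clim[m - 1] + baseline_clim[m - 1]
            for m, v in zip(months, gcm_values)]
```

Trying another reference period then only means calling `apply_anomalies` with different `ref_start` and `ref_end` values, provided `baseline_clim` is recomputed for the same period.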

Is it clearer now?

anoukvlug commented 6 years ago

Yes, it is clear now. Thanks for explaining.

So far the only other names I thought of were climate_prepro.py or gcm_prepro.py. I think climate_io.py could also work; I just didn't know before that it stands for input/output.

I think that I would prefer to keep the anomaly calculation outside of the PastMassBalanceModel and have the anomalies calculated in a separate function. My main concern is the following: currently I create my cesm_data files outside OGGM, because I use the ensemble mean of the CESM-LME as climatology. Now I can easily use these files in the PastMassBalanceModel. I am not sure if that will still be possible when the calculation of the anomalies is included in the PastMassBalanceModel. A minor concern is the computation time. I guess it takes quite some time to calculate the anomalies. Once you have the climate data you can reuse it and not redo this step more often than needed.

fmaussion commented 6 years ago

Now it's my turn to not fully understand ;-) My motivation for computing the anomalies in the past-MB model was to increase flexibility, not to decrease it ;-). A simple workaround would be to make the computation of the anomalies optional, i.e. controlled by a keyword argument. Then it would be possible to drive the model with whatever the user wants. The only drawback of this method is how to store the climate data, but here I have an idea: we could store the climate time series together with the model output or something.

The computation time is irrelevant here, because the anomalies are quickly computed and have to be computed only once, at initialization.
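As a rough illustration of the keyword idea, the sketch below shows a toy model class that computes anomalies once in its constructor unless told otherwise. The class `SketchPastMassBalance` and the keyword `compute_anomalies` are hypothetical names chosen for this example, and a single mean over the reference values is used instead of a real monthly climatology to keep it short.

```python
class SketchPastMassBalance:
    """Hypothetical stand-in for a climate-driven mass-balance model."""

    def __init__(self, temps, ref_temps=None, compute_anomalies=True):
        # `temps`: the forcing temperature series read from a climate file.
        # `ref_temps`: values of the same series within the reference period.
        if compute_anomalies:
            if not ref_temps:
                raise ValueError("ref_temps is required to compute anomalies")
            ref_mean = sum(ref_temps) / len(ref_temps)
            # Done once here, so later calls only read the stored series.
            self.forcing = [t - ref_mean for t in temps]
        else:
            # The user supplies data that is already in the desired form.
            self.forcing = list(temps)

    def get_forcing(self, index):
        return self.forcing[index]
```

With `compute_anomalies=False`, externally prepared files (for example ones based on an ensemble-mean climatology) would pass through untouched.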

anoukvlug commented 6 years ago

That takes away my concerns :) Still, I do not fully understand the benefits of having this as a part of the PastMassBalanceModel. It somehow doesn't feel intuitive to me, but maybe that is just me being conservative regarding code changes ;) To me it seems like the same flexibility can be achieved with a separate function for calculating the anomalies. When including this functionality in the PastMassBalanceModel it should also be added to the ConstantMassBalanceModel and RandomMassBalanceModel.

Indeed, the climate could be saved with the model output. It could be nice to have them saved together, but I am not sure how easy it is to save yearly and monthly data together.

fmaussion commented 6 years ago

Good discussions! Thanks for keeping this up.

When including this functionality in the PastMassBalanceModel it should also be added to the ConstantMassBalanceModel and RandomMassBalanceModel.

This would happen automatically because ConstantMassBalanceModel and RandomMassBalanceModel both use PastMassBalanceModel internally. We will have to add a keyword though.
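The "automatic" part can be pictured with a small composition sketch. The classes below are hypothetical toy stand-ins, not the real OGGM classes: the constant model simply wraps the past model and forwards any extra keyword to it.

```python
class SketchPast:
    """Toy past-climate model; optionally converts the series to anomalies."""

    def __init__(self, temps, compute_anomalies=True):
        mean = sum(temps) / len(temps)
        self.temps = [t - mean for t in temps] if compute_anomalies else list(temps)

    def get_temp(self, year_index):
        return self.temps[year_index]


class SketchConstant:
    """Averages the wrapped model over a fixed window of year indices."""

    def __init__(self, temps, window, **kwargs):
        # Extra keywords such as `compute_anomalies` are forwarded unchanged.
        self.past = SketchPast(temps, **kwargs)
        self.window = window

    def get_temp(self):
        vals = [self.past.get_temp(i) for i in self.window]
        return sum(vals) / len(vals)
```

Any change in how the wrapped model prepares its data is inherited by the wrapper; only the keyword has to be passed through.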

However, I am still open to suggestions (and of course I might change my opinion completely when trying to implement it!). What would your proposed function look like? And where would the anomaly data be stored, and how will you tell the mass-balance model to read it?

anoukvlug commented 6 years ago

It is indeed good to discuss :) I am starting to see now that moving the anomaly calculation to the PastMassBalanceModel would work.

Still, I think it would be easiest to store the data in the same way as it is stored now, so no changes would be needed in the mass-balance model. There would be functions that extract the time series in climate_io. Then I am not sure what is best to do afterwards. I thought of a couple of different things and was wondering if the following structure would work:

A climate extraction function that feeds into the function that calculates the anomalies.

climate = climate_io.gcm_x(all the input it needs)
gcm_data(gdir, climate=climate, start_reference_period=1961, end_ref_period=1990, filesuffix='', etc.)

Another option would be that the function in climate_io extracts the climate data, standardizes it, and saves it. The anomaly function would then use this file (or these files) and save the result again.
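A very rough sketch of this second option, with a file written between the two steps, could look like the following. Everything here is hypothetical: the function names, the input keys `years` and `temperature`, and the use of JSON, which only keeps the example free of dependencies (the real files would be netCDF).

```python
import json


def extract_and_save(raw, path):
    """Step 1 (hypothetical): write model-specific data in a standard layout."""
    standard = {"year": list(raw["years"]), "temp": list(raw["temperature"])}
    with open(path, "w") as f:
        json.dump(standard, f)


def compute_anomalies_and_save(in_path, out_path, ref_start, ref_end):
    """Step 2 (hypothetical): read the standard file and write anomalies."""
    with open(in_path) as f:
        data = json.load(f)
    ref = [t for y, t in zip(data["year"], data["temp"])
           if ref_start <= y <= ref_end]
    if not ref:
        raise ValueError("no data in the reference period")
    ref_mean = sum(ref) / len(ref)
    data["temp"] = [t - ref_mean for t in data["temp"]]
    with open(out_path, "w") as f:
        json.dump(data, f)
```

Only step 1 needs to know about a particular GCM; step 2 works on the standardized file whatever its origin, and its output can be reused across runs.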

sumnonpuella commented 6 years ago

@anoukvlug - I've been working on adapting OGGM to work with CCSM3 data (mostly adapting process_cesm_data), which fits in well with the larger goal of running OGGM with GCM data. Is this something you're still working on? If so, I'd love to get involved in the development of a broader GCM processing function, or even just contribute my notes on using different netCDF input file formats.

fmaussion commented 5 years ago

@sumnonpuella sorry, I forgot about this one.

This is high on our priority list; we will discuss it soon.

anoukvlug commented 5 years ago

@sumnonpuella, I am sorry for not replying earlier. (I was away on an expedition without proper internet access.) It is nice to see that you got involved in this issue in the meantime. To answer your question, I will try to make some progress on this issue during the OGGM hack day (today), though normally I am mostly working on running OGGM with the CESM climate as input.