Implementation of CLARA simulator

salomoneliassonSMHI commented 3 years ago

Hi,

I have been tasked with implementing the CLARA simulator (Eliasson et al., 2020) in COSP, if possible.

The offline version of the simulator that we built is very similar to the MODIS simulator.

The main difference from the MODIS simulator is how the cloud mask is simulated. The cloud mask relies on auxiliary data giving the probability of detection (POD) as a function of optical depth, geographical location, and whether or not the subcolumn is sunlit. A subcolumn is considered cloudy when a random number generated for that subcolumn is smaller than the POD read from the auxiliary data. The background to this approach is explained in the paper describing the simulator.
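A minimal sketch of the detection step described above, assuming the POD has already been reduced to a lookup in optical depth for one grid box (so the geographical dependence is implicit in which table is used). The bin edges, POD values and function names are hypothetical placeholders, not the CLARA auxiliary data or code, and treating cloud-free subcolumns as never detected is also an assumption.

```python
import bisect
import random

# Hypothetical lower edges of optical-depth bins and POD per bin for ONE
# grid box. The last bin is open-ended (tau >= 10). Real values would come
# from the CLARA auxiliary POD data.
TAU_EDGES = [0.0, 0.1, 0.3, 1.0, 3.0, 10.0]
POD_DAY = [0.05, 0.30, 0.65, 0.90, 0.98, 1.00]
POD_NIGHT = [0.02, 0.20, 0.50, 0.80, 0.95, 1.00]


def lookup_pod(tau, sunlit):
    """Return the POD of the optical-depth bin containing tau."""
    table = POD_DAY if sunlit else POD_NIGHT
    # bisect_right - 1 is the index of the last lower edge <= tau
    idx = bisect.bisect_right(TAU_EDGES, tau) - 1
    idx = max(0, min(idx, len(table) - 1))
    return table[idx]


def simulate_cloud_mask(subcolumn_tau, sunlit, rng):
    """Flag a subcolumn cloudy when a uniform random number is below its POD."""
    mask = []
    for tau in subcolumn_tau:
        if tau <= 0.0:
            mask.append(False)  # assumption: cloud-free subcolumns are never flagged
        else:
            mask.append(rng.random() < lookup_pod(tau, sunlit))
    return mask


rng = random.Random(42)
print(simulate_cloud_mask([0.0, 0.05, 0.5, 2.0, 25.0], sunlit=True, rng=rng))
```

Passing the random generator in explicitly keeps the mask reproducible for a given seed, which is convenient when comparing runs.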

Our implementation does not require any model fields beyond those already required by the existing COSP simulators.

Reference: Eliasson, S., Karlsson, K.-G., and Willén, U.: A simulator for the CLARA-A2 cloud climate data record and its application to assess EC-Earth polar cloudiness, Geosci. Model Dev., 13, 297–314, https://doi.org/10.5194/gmd-13-297-2020, 2020.

salomoneliassonSMHI commented 2 years ago

Hi,

As was made clear at the COSP PMC meeting on June 17, the probability of detection (POD) used to simulate the CLARA cloud mask should not depend on latitude/longitude but instead on other surface and atmospheric variables available to COSP.

For this, I will recompile the POD statistics as functions of the relevant input variables to COSP. However, assuming we cannot use the input variables to RTTOV, it seems the only potentially useful parameters for POD are sunlit, land/sea mask, surface elevation, temperature (surface and layered), layered specific humidity (which I can convert to TCWV), and the optical input, emissivity at 11 microns.
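Converting layered specific humidity to total column water vapour (TCWV) is a vertical integral over pressure, TCWV = (1/g) ∫ q dp. A minimal sketch, assuming layer-mean q in kg/kg and interface pressures in Pa; it does not assume COSP's actual variable names or level ordering.

```python
G = 9.80665  # standard gravity, m s-2


def tcwv(q_layers, p_interfaces):
    """Total column water vapour in kg m-2 (numerically equal to mm).

    q_layers     -- layer-mean specific humidity, kg/kg, one value per layer
    p_interfaces -- pressure at the layer interfaces, Pa; len(q_layers) + 1
                    values, in either top-down or bottom-up order
    """
    if len(p_interfaces) != len(q_layers) + 1:
        raise ValueError("need one more interface pressure than layers")
    total = 0.0
    for k, q in enumerate(q_layers):
        dp = abs(p_interfaces[k + 1] - p_interfaces[k])  # layer thickness in Pa
        total += q * dp
    return total / G


# Two layers between the surface (1000 hPa) and the top of the atmosphere
print(tcwv([0.01, 0.001], [100000.0, 50000.0, 0.0]))
```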

I suspect this is not enough, though. Flags for sea ice and snow may be useful. Are there any other surface parameters I could use, e.g., land use or vegetation? I suppose one problem is that these may be defined differently in different models.

Also, I do not fully understand why latitude, longitude and time are not inputs to COSP. Is this a technical or a philosophical choice?

Thanks in advance for your help

Salomon

RobertPincus commented 2 years ago

Dear Salomon -

Input fields may be shared among simulators, so there’s no reason you can’t use some of the fields requested by RTTOV. As a general principle, you’ll want to add code in COSP_SIMULATOR() to verify that all the required inputs are present if the CLARA simulator is to be called (which users communicate by allocating space for one or more output variables from the simulator). The comment in the source code is there because many fields are used only by RTTOV and need not be supplied if RTTOV is not going to be called. I see now that we were not nearly as careful when integrating the RTTOV simulator as we are being with CLARA. Whether that’s a blessing or a curse I’ll leave to you to decide :). If you do decide to use some of the RTTOV inputs, you might update the comments to make this clear.
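COSP itself is written in Fortran; the pattern described above (run the CLARA simulator only when one of its outputs has been allocated, and in that case insist that its inputs are present) can be sketched in Python. All field names below are hypothetical and are not taken from the COSP source.

```python
# Hypothetical output and input names, for illustration only.
CLARA_OUTPUTS = ("clara_cloud_fraction", "clara_cloud_top_pressure")
CLARA_REQUIRED_INPUTS = ("tau_subcolumn", "sunlit", "land_mask", "skin_temperature")


def clara_requested(outputs):
    """The simulator runs if any of its output fields has been allocated (is not None)."""
    return any(outputs.get(name) is not None for name in CLARA_OUTPUTS)


def check_clara_inputs(inputs, outputs):
    """Return the names of missing inputs; empty if none are missing or CLARA is not requested."""
    if not clara_requested(outputs):
        return []
    return [name for name in CLARA_REQUIRED_INPUTS if inputs.get(name) is None]


missing = check_clara_inputs({"tau_subcolumn": [1.0]}, {"clara_cloud_fraction": []})
if missing:
    print("CLARA simulator requested but inputs are missing:", missing)
```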

The desire to have inputs be in purely optical terms is “philosophical”, although a better term might be “precise”, for the reasons you mention. Describing the atmosphere and surface in terms of optical properties (spectrally dependent emissivity, albedo, etc.) is unambiguous. COSP has no idea how models represent sea ice or land surfaces, but to interpret remote sensing measurements what one needs to know are the optical properties. Are you confident that you won’t be able to get accurate PODs using co-variations of surface temperature, emissivity, and albedo?

Robert Pincus https://crew.ldeo.columbia.edu/people/robert-pincus


salomoneliassonSMHI commented 2 years ago

Dear Robert,

Thank you for your quick answer. Since I can technically use latitude and longitude in COSP, I will probably implement the CLARA simulator using the lat/long-based POD as a placeholder until we can replace it with a POD dataset based on surface and atmospheric properties.

However, some of the advantages of using a POD dataset based only on cloud optical depth and geographical coordinates, POD(long, lat, tau), are that:

It represents the total POD, i.e., it accounts for all variables that may affect the skill of the cloud retrieval at a given location and optical depth.
It does not depend on any other input data. By contrast, a POD expressed as a function of co-varying parameters may add a layer of uncertainty to model comparisons, since the POD distribution will vary from model to model.
It avoids a further risk of the alternative, which comes from the ancillary data needed to define such a POD dataset. Most of the ancillary model input to CLARA-A3 is from ERA5, and even though this dataset is excellent, the surface temperature, for instance, will still be 'SKT according to ERA5' and not the actual surface temperature, so any ERA5 biases would be built into the POD dataset. We are unsure whether we can adequately describe POD using SKT, emissivity, albedo, and other co-variables, although we have not yet tried this approach.

On the other hand, there are some downsides to a lat/long-based POD dataset. For example, surface and atmospheric conditions may have changed climatically in some areas during the period that the CLARA-Ax climate data record represents. There may also be problems related to how representative the data are: the POD data derive from collocations with CALIPSO from 2006-2015.

If I may, what are the main reasons for avoiding a long/lat-based POD dataset from a modelling perspective?

Cheers, Salomon

kgkarl commented 2 years ago

Hi! This is Karl-Göran Karlsson, responsible for the development of the CLARA data record in the EUMETSAT CMSAF project. It took a while to convince the CMSAF people that developing COSP simulators for CMSAF data records would be a good idea, so I am glad that we have finally reached the point where such a simulator could be implemented. However, some things in the most recent discussions here have prompted me to post this long statement (sorry for all the words!).

The recommendation to express the probability of detection of clouds (POD) for the CLARA simulator as a function of optical parameters, such as solar zenith angles, viewing angles, surface albedo, surface temperature and surface emissivity, instead of geographic position (latitude and longitude), has led to an internal debate here at SMHI. The discussion is very specific and concerns how the CLARA simulator should describe the achieved cloud detection performance. However, it also turns out to be closely linked to core questions such as

“What’s the purpose of a satellite simulator?”

and

“What is the simulator supposed to do when mimicking a satellite-derived data record?”

In my opinion, what we have proposed (i.e., the latitude-longitude approach) is a pioneering attempt that better describes the cloud detection performance of a satellite data record. Note that any scientist using a COSP simulator in CMIP (or other) studies to compare model results with satellite-derived results should also consider how well the reference satellite data record describes real cloud conditions (e.g., based on earlier validation studies). As far as I understand, the only way the COSP simulators have assisted in this particular task has been to use a constant cutoff value for clouds with an optical thickness less than 0.3 (at least true for the ISCCP and MODIS simulators). That is all (please correct me if I am wrong). All other corrections and adaptations ensure that the simulated clouds based on the model atmosphere emulate those observed from the space platform, given the viewing geometry, solar illumination and radiation aspects, as closely as possible. Therefore, when using COSP simulations, results from existing validation studies should be considered in an additional analysis step before drawing final conclusions from the simulated and observed cloud information.

For these reasons, the CLARA simulator introduces a POD dependence into the simulator results, giving the probability of detecting a cloud of a certain vertically integrated optical thickness at a specific geographic location. Thus, the new method should help modellers judge the credibility of their reference satellite-derived data record, even though it is based exclusively on cloud information provided by the CALIPSO-CALIOP sensor (currently the best reference, I think).

I hope we can all agree that this could be an interesting and useful addition to the simulator concept. However, the proposed way of implementing it is problematic. It seems the COSP community prefers us to express PODs as a function of optical parameters rather than of geographical coordinates, as we have done to date. This raises two serious questions or problems for us:

It would force us to go back and express our PODs as functions of several parameters for which we do not have access to the absolute truth. For example, we only have access to estimates valid for longer periods (e.g., months) for surface emissivities and surface reflectances, which means that we will be very sensitive to changes occurring on shorter time scales (e.g., snowfall events or melting). Furthermore, even a fundamental parameter such as surface temperature from reanalysis datasets has clear limitations in critical situations (e.g., very cold wintertime conditions). In short, we are not sure whether we will get results similar to the safer solution based on the general POD performance at a specific location, derived from all available matchups with CALIPSO observations.
If we were to parameterise our PODs, the simulator would be able to change the CLARA PODs (for a particular geographic location) whenever the model-simulated atmosphere deviates from the atmospheric conditions that prevailed when we originally estimated the CLARA PODs. But wouldn't this be counterproductive if the purpose of a COSP simulator is to facilitate comparison with existing satellite-derived climate datasets? In other words, the results from this type of simulator should then not be compared with climate datasets for the current climate, but with results from a hypothetical climate corresponding to the one described by the climate model. Thus, I cannot see how this kind of simulator facilitates comparison with the existing CLARA data record.

In conclusion, we are worried that changing to a parameterised version of the CLARA PODs is a huge task with an uncertain outcome, and that the actual purpose of the simulator might be compromised in a way that makes it harder to compare model-generated results with the original CLARA data record.

I ask you to consider these aspects further. For us, continuing to express PODs as functions of geographic position is a way to stay true to our original CLARA validation results. The reasons why we should not use positional information have not been clearly explained to us. The positional information itself, i.e., the relation between geographic position and model grid point, must be available somewhere, but maybe there are technical limitations that we do not fully understand. I hope it will be possible to find a solution that does not require large structural changes to the code.

/Karl-Göran

RobertPincus commented 2 years ago

Dear Karl-Göran -

It's nice to hear from you, and thanks for expressing your concerns. Taking the liberty of speaking for the COSP committee: the use of probabilities of detection in the simulator seems like a very interesting approach, and one we're excited to see explored in COSP. As Salomon knows, it is indeed possible to use geographic information in the simulators, and it sounds as if that will be your first step.

The question of how best to express a dependency in a simulator (in your case, the dependence of the probability of detection) does indeed depend on what one hopes to achieve with the results from the simulator, which is often, though not only, the comparison of a model against an observational data set. Salomon explained that your PODs in the observational processing were tabulated with respect to geographic location, so it's clear why this is attractive.

What may be less clear is why, from a modeling side, this isn't ideal, and so why you might aspire to refine the approach in future iterations. At least two practical points came up in the COSP committee (perhaps others will chime in):

Models may well have a different spatial resolution than does the observational data set. If the model's resolution is higher than the observations, should the larger-scale PODs be used regardless of the smaller-scale conditions? If the model resolution is coarser than the observations, does one average the PODs? Sample them?
One wants to be able to separate biases in cloud predictions from other biases in the model; referencing results to geography potentially means that biases in one field can be aliased into another. Imagine a region in which clouds are always thin enough that the POD has a big impact. Imagine also that clouds are more easily detected over surface type A (maybe bare ground) than over surface type B (say, snow or land ice). If the observations are made in a world in which the surface type is always A but a model mistakenly calls it type B, cloud occurrence from the simulator will be wrong for reasons that have to do not with the clouds but with the surface.
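The arithmetic behind this example is simple: for a given true cloud occurrence, the expected fraction of scenes flagged cloudy is the occurrence times the POD. A minimal worked sketch, assuming a POD that is looked up from the model's surface type and two made-up POD values for thin clouds:

```python
# Hypothetical PODs for thin clouds over the two surface types in the example
POD_BY_SURFACE = {"A": 0.8, "B": 0.4}


def expected_detected_fraction(cloud_fraction, surface_type):
    """Expected fraction of scenes flagged cloudy: true occurrence times POD."""
    return cloud_fraction * POD_BY_SURFACE[surface_type]


true_cloud_fraction = 0.5  # suppose the model's clouds are exactly right
observed = expected_detected_fraction(true_cloud_fraction, "A")   # real surface is A
simulated = expected_detected_fraction(true_cloud_fraction, "B")  # model calls it B
print(f"observed {observed:.2f}, simulated {simulated:.2f}, bias {simulated - observed:+.2f}")
# prints: observed 0.40, simulated 0.20, bias -0.20
```

Even though the clouds are identical in this construction, the whole difference comes from which surface type the POD is conditioned on.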

These are two practical reasons why it seems more robust to specify simulator behavior as a function of conditions, and why this behavior was built into COSP 2 (although the RTTOV simulator doesn't follow this guideline). The decision builds on the conceptual understanding that the behavior of retrievals doesn't depend explicitly on location.

It is true, as you say, that an attempt to reverse-engineer the parametric dependence of your PODs may be less effective than expressing them geographically. It would be interesting to see how much less effective given Salomon's ideas about what variables would be good predictors. Were the differences to be large you would have learned something interesting.

Robert

kgkarl commented 2 years ago

Dear Robert,

Thanks for providing your thoughts about the pros and cons regarding parametric vs positional expressions of the PODs. I'll try to be short in this reply.

Regarding point 1, I don't see why the CLARA simulator is different in this respect. As you write, "Models may well have a different spatial resolution than does the observational data set". This is true for any data set you try to simulate, and you simply have to do what you can to compensate for it. So, yes, the alternatives are averaging or sampling to get closer to the observations.
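The two options for a model grid coarser than the POD grid can be sketched as follows, assuming the POD has already been reduced to a single 2-D field (one optical-depth bin, one illumination state) whose dimensions are exact multiples of the coarsening factor. The function name is hypothetical. Simple averaging treats all fine cells as equal-area, which is not true on a regular latitude-longitude grid; an area-weighted mean would be needed in practice.

```python
def coarsen_pod(pod_fine, factor, method="average"):
    """Map a fine-resolution POD grid onto a grid `factor` times coarser.

    pod_fine -- 2-D list (rows x cols) of POD values; both dimensions must be
                exact multiples of `factor`
    method   -- "average": unweighted mean POD over each block of fine cells
                "sample":  POD of the cell at offset factor // 2 within the
                           block (the centre cell when factor is odd)
    """
    nrows, ncols = len(pod_fine), len(pod_fine[0])
    if nrows % factor or ncols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    coarse = []
    for i0 in range(0, nrows, factor):
        row = []
        for j0 in range(0, ncols, factor):
            if method == "average":
                block = [pod_fine[i][j]
                         for i in range(i0, i0 + factor)
                         for j in range(j0, j0 + factor)]
                row.append(sum(block) / len(block))
            elif method == "sample":
                row.append(pod_fine[i0 + factor // 2][j0 + factor // 2])
            else:
                raise ValueError(f"unknown method: {method}")
        coarse.append(row)
    return coarse


fine = [[0.25, 0.5, 1.0, 1.0],
        [0.75, 0.25, 1.0, 1.0]]
print(coarsen_pod(fine, 2, "average"), coarsen_pod(fine, 2, "sample"))
```

Where the POD varies within a coarse cell (the left block here), the two methods disagree, which is the ambiguity raised in point 1.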

Point 2 is more closely linked to my main objection to the parameterised approach: in a climate simulation, should you really also try to simulate how a specific observational climate data set would behave in a (potentially) changed climate? This is what happens if you link the PODs to other model parameters (which may differ from those of the climate you are supposed to simulate). I think this goes too far and does not help you interpret the results. In the end, the idea should be to compare simulations with the observational data set, shouldn't it? So why introduce something that implies you no longer trust the reference observational data set?

Your example of a surface that has changed from, e.g., bare ground to land ice can in fact also be used as an argument against a parameterised approach. If this change is wrong (i.e., it did not actually happen during the CLARA observational period), you are likely to see a discrepancy in the simulated cloudiness at these positions if you use the positional approach to describe POD. I would say that this gives you an important indication that something is wrong in your simulation. Exactly what is wrong has to be deduced from further studies. But if you use a parameterised approach, you may (if you are unlucky) change the POD in such a way that this discrepancy is no longer seen. Thus, you risk losing the direct relation to the observed CLARA climatology, which will not help you at all when you later compare with that climatology. This is my main objection to going this way. Do you do something similar in any other COSP simulator, i.e., change the behaviour of the reference dataset because of a potentially changed simulated climate? In my opinion this should never be done.

However, I am not completely rejecting the idea of trying to express PODs in an alternative way, but my reasons for doing so differ from the ones you have proposed. Our problem is that the validation using CALIPSO only covers about 10 years (2006-2015) of the full CLARA climatology (1982-2015, or 1979-2020 in the upcoming third version, CLARA-A3). Consequently, it is uncertain whether our derived PODs are representative of the full CLARA period. If we are able to describe our PODs decently in a parameterised way, we may get a better view of whether the PODs are valid for the full period. If they are not, we will have to use a more sophisticated, time-dependent description of the PODs.

I hope I have made my view even clearer with this post.

Best regards

Karl-Göran

RobertPincus commented 2 years ago

Dear Karl-Göran, dear Salomon -

A reminder that these email threads appear on the public COSP Github site where most readers don't speak Swedish.

I wonder if the disagreement here comes about because of somewhat different ideas about the goals of a simulator. I understand your primary motivation as "comparison to a particular set of observations", specifically the CLARA dataset. From the modeling side, COSP is considered a way to "predict the observations that would be made given a (climate or weather) simulation." Comparison to observations in the present day is one important goal, but COSP has been used, for example, to predict when signals of climate change will be detectable. Decisions that tie a simulator to the present day restrict the scope in which the output may be useful.

The encouragement to make the PODs functions of parameters rather than location follows from this thinking, which is adopted by all simulators within COSP with the exception of RTTOV. The decision affects the synthetic observations from each model but not any observational datasets.

We could continue to work through some of the technical issues but they may not be important. To reiterate, the COSP committee is willing to include an implementation of the CLARA simulator in which cloud detection is tied to geography, but we encourage you to consider more flexible approaches in future.

Robert

salomoneliassonSMHI commented 2 years ago

Hi Robert,

Thank you for your answer again, and sorry for accidentally sending my email meant for Karl-Göran to the entire COSP community... Thanks also for the example where a study used simulations in the future.

The first implementation of PODs in the CLARA simulator will be based on latitude/longitude, as we have discussed. Still, we will consider reconfiguring the POD tables to be based on atmospheric states when we generate POD data for the upcoming release of the CLARA-A3 dataset. Hopefully, we can achieve equivalence between the two types of POD distributions.

Cheers, Salomon

alejandrobodas commented 2 years ago

Dear All,

Apologies for my late contribution to the discussion; I've been away for an extended period of time. Thanks for the interesting discussion, it's been very useful for understanding the reasons behind defining the POD distributions as a function of geographical location. I won't add anything new to the discussion, but I would like to echo Robert's point about the importance of considering a broader motivation for the simulator. Having PODs based on radiative properties and/or atmospheric state will expand the range of applications of the CLARA simulator, even if the interpretation of the results is more uncertain or ambiguous in some cases. Salomon, I'm happy to hear that you are willing to explore alternatives in the future. Bearing this in mind, I think it would be useful to implement the simulator in a way that is future-ready, i.e., allowing for a flexible definition of the POD distributions (or with a design that makes this easy to do in the future), even if only a single definition of the POD distributions is available in this initial implementation.
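One possible shape for such a future-ready design, sketched in Python under stated assumptions: the cloud-mask step only calls a generic `pod(tau, state)` method, so a geographic table and a parameterised table can be swapped without touching it. The class names, the three optical-depth bins, the 10-degree boxes and the surface classes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple


@dataclass
class ColumnState:
    """Per-column inputs a POD definition may use (hypothetical field set)."""
    lat: float
    lon: float
    sunlit: bool
    surface_type: str  # e.g. "ocean", "land", "snow_ice"


class PODDefinition(Protocol):
    def pod(self, tau: float, state: ColumnState) -> float: ...


def _tau_bin(tau: float) -> int:
    """Hypothetical three-bin optical-depth classification."""
    return 0 if tau < 0.3 else (1 if tau < 3.0 else 2)


class GeographicPOD:
    """POD looked up by (lat, lon) box and illumination, as in a first implementation."""

    def __init__(self, table: Dict[Tuple[int, int, bool], Tuple[float, float, float]],
                 box_deg: float = 10.0):
        self.table = table
        self.box_deg = box_deg

    def pod(self, tau: float, state: ColumnState) -> float:
        key = (int(state.lat // self.box_deg), int(state.lon // self.box_deg), state.sunlit)
        return self.table[key][_tau_bin(tau)]


class SurfaceTypePOD:
    """POD looked up by surface type and illumination, a parameterised alternative."""

    def __init__(self, table: Dict[Tuple[str, bool], Tuple[float, float, float]]):
        self.table = table

    def pod(self, tau: float, state: ColumnState) -> float:
        return self.table[(state.surface_type, state.sunlit)][_tau_bin(tau)]


def detect(tau: float, state: ColumnState, pod_def: PODDefinition, u: float) -> bool:
    """Cloud-mask step, independent of how the POD is defined; u is a U(0, 1) draw."""
    return tau > 0.0 and u < pod_def.pod(tau, state)
```

Using a structural `Protocol` means a new POD definition only has to provide a matching `pod` method; nothing in `detect` needs to change.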

Regards, Alejandro

kgkarl commented 2 years ago

Dear all,

Thank you, Robert, for clearly identifying the reason for the frustration and debate lately. I think you are absolutely correct. It is all about what you want to achieve with the simulator. For me (responsible for the CLARA data record), I certainly want the simulator to be a tool to assist modellers in the comparison with CLARA. But I can understand that you may also want to see the simulator as a tool for enabling a modification (or prediction, if that is a better word) of how satellite-based observations may also change in the future when simulating a different climate. My concern here is that it could be a problem to achieve both goals with one and the same implementation.

The idea of introducing a dynamically changing probability of detection for a particular simulator and observational data record must be quite new to COSP. At least, I do not know of any other simulator that tries to do that. You write "... adopted by all simulators within COSP ..." but in this case I don't think that is entirely true. So far, simulations of how data records may change have been done without touching a data record's ability to detect clouds, I think. So we have a completely new situation to relate to, and this discussion is therefore important. Is it really possible to manage both goals with a single implementation? I agree that a fixed POD tied to lat/lon is not appropriate if you want to simulate the evolution of the data record in a changed climate. But it is, at the same time, the safest way to compare with the actual CLARA data record, since it stays true to the original validation results.

Switching to a parameterised POD approach is clearly the preferred way to go for the future "predicting mode", but I am still very sceptical about whether it is a good solution for the basic "inter-comparison mode", where you just want to compare with an existing data record of the observed climate. Your example of potentially changed surface characteristics in the simulation highlights this, and I claim that it forces you to compare "apples with pears". If you instead relied only on what the validation data had previously told you (from the lat/lon description of POD), I think the comparison would still be valid, since we would then be comparing just the actual cloud information from the two data sources, without taking into account changes in model parameters that are not cloud parameters. If we then get a discrepancy, further studies have to reveal why, and they may well conclude that much of it (but probably not all) is linked to changed surface conditions. I fear that these discrepancies from the current climate may be seriously changed, or even disappear, if you use a parameterised representation when comparing with an existing climate data record. Then I wonder: what is the use of this comparison?

So, this is very much a philosophical discussion. We have no previous experience of a simulator that simulates a changing cloud detection efficiency, and that is why this discussion is important.

If you require us to follow the parameterised approach exclusively, you really put us to the test. The reason is that we are not absolutely sure that we have enough data, or even data that are good enough, to develop this parameterisation. Only the future (and access to available resources) can tell. From this perspective too, I hope that you can allow us to start with the safer lat/lon approach.

We foresee a better situation in a couple of years, after we have also introduced VIIRS data into the CLARA dataset. The ISCCP team and NOAA are currently creating what is called the VGAC data set. This is nothing other than the AVHRR GAC dataset simulated from high-resolution VIIRS data. It will be used not only by the CMSAF but also for extensions of the ISCCP and PATMOS-x datasets. We will then adapt our methods to VGAC data and add them to the CLARA data record (to become CLARA-A3.5, with a tentative release in 2025). The good thing is that this will allow global matchups between VIIRS and CALIPSO for the full period from 2012 (launch of Suomi-NPP) to 2023 (i.e., if CALIPSO survives that long). This would increase the available number of global matchups by a factor of 2 (maybe even up to a factor of 4 if we relax the allowed time difference from 2.5 minutes to perhaps 5 minutes). I think this also improves the prospects for developing a parameterised approach, since we are currently stretched to the limits of the current global matchup dataset.

In conclusion, we will make an attempt to see if we can find a parameterised solution but it could be that the prospect for this will be better if we wait a few years to get more matchups to "play with". Until then I hope that the position-based POD approach can still be a useful start.

Best regards

Karl-Göran

klein21 commented 2 years ago

Dear Karl-Göran and Salomon,

Thanks very much for all your posts, from which I have learned new things, and for your willingness to discuss the relevant issues and potentially consider some changes in the future. I certainly hope you will consider the changes, as they would help with the decadal detection of cloud trends, some of which will arise from the very significant decreases in sea-ice and snow cover that have happened in recent decades and are expected to continue.

From my viewpoint, I suspect that the POD is primarily a function of the surface type (perhaps through surface reflectance) and of the fact that clouds may differ significantly between open ocean, ice-free land, and snow/ice-covered surfaces, and not of latitude and longitude per se. Certainly when I look at Figure 1 of your paper (10.5194/gmd-13-297-2020), I would think that knowing the surface reflectance (which would distinguish between open ocean, ice-free land, and snow/ice-covered surfaces) would explain a very large fraction of the variance in POD across the globe. (Of course, you need different PODs for day and night, which is acceptable to specify in the simulator since that corresponds to whether visible imagery is used to detect clouds.)
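One way to test this expectation against the collocation data would be to ask what fraction of the spatial variance in POD a surface-type classification explains. A minimal sketch, assuming a POD value and a surface class are available per grid box (the data below are made up); grouping on (class, sunlit) tuples would add the day/night split.

```python
from collections import defaultdict


def variance_explained(pod_values, groups):
    """Fraction of the variance in POD explained by a categorical predictor.

    Returns the between-group sum of squares divided by the total sum of
    squares (often called eta squared): 1.0 means the class alone
    determines POD, 0.0 means it carries no information about POD.
    """
    if len(pod_values) != len(groups) or not pod_values:
        raise ValueError("need equal-length, non-empty inputs")
    mean = sum(pod_values) / len(pod_values)
    total_ss = sum((v - mean) ** 2 for v in pod_values)
    if total_ss == 0.0:
        raise ValueError("POD values have zero variance")
    by_group = defaultdict(list)
    for value, group in zip(pod_values, groups):
        by_group[group].append(value)
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - mean) ** 2 for vals in by_group.values()
    )
    return between_ss / total_ss


# Made-up POD values per grid box, with a surface class per box
pods = [0.95, 0.93, 0.85, 0.80, 0.55, 0.60]
classes = ["ocean", "ocean", "land", "land", "snow_ice", "snow_ice"]
print(f"variance explained by surface class: {variance_explained(pods, classes):.2f}")
```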

Thus I am optimistic that a POD written in terms of these variables would be successful.

Also it is worth saying that we have always viewed COSP as answering the question “What would the satellite have retrieved for clouds, if the real world had the clouds (and other boundary conditions) of the model?” From this point, if one believes that the primary physical reason determining the POD has to do with the surface type via surface reflectance (as cloud detection is well-known to be sensitive to surface reflectance – to list one physical explanation), then it would be preferable to express POD in terms of these variables.

In the end, I think this really comes down to how confident you are about the underlying physical reasons determining the POD. Maybe I am too confident in thinking it probable that the POD is related to physical variables, such as surface reflectance, that a model has. And maybe you think there is a physical reason why latitude and longitude determine POD. I am aware that latitude would be a proxy for the solar zenith angle, which might affect cloud retrievals, although latitude alone doesn’t seem to be the leading factor explaining the spatial variance in POD when I look at Figure 1. Also, I can’t think of a physical reason why longitude itself would affect the POD. But perhaps you can?

Steve

From: Karl-Göran Karlsson @.> Reply-To: "CFMIP/COSPv2.0" @.> Date: Wednesday, September 1, 2021 at 5:02 AM To: "CFMIP/COSPv2.0" @.> Cc: Subscribed @.> Subject: Re: [CFMIP/COSPv2.0] Implementation of CLARA simulator (#59)

Dear all,

Thank you, Robert, for having clearly identified the reason for the frustration and debate lately. I think you are absolutely correct. It is all about what you want to achieve with the simulator. For me (responsible for the CLARA data record), I certainly want the simulator to be a tool to assist modellers in the comparison with CLARA. But I can understand that you may also want to see the simulator as a tool for enabling a modification (or prediction, if that is a better word) of how also satelite-based observations may change in the future when simulating a different climate. My concern here is that it could be a problem to achieve both goals with one and the same implementation.

The idea of introducing a dynamically changing probability of detection for a certain simulator and observation data record must be pretty new to COSP. At least, I do not know if you have any other simulator that tries doing that. You write ".. adopted by all simulators within COSP..." but in this case I don't think it is entirely true. So far, the simulations of how data records may change is done but without touching a data records ability to detect clouds, I think. So, we have a completely new situation to relate to. Consequently, this discussion is important. Is it really possible to manage both goals with just one single implementation? I can agree to the view that the concept with a fixed POD tied to lat/lon is not appropriate if you want to simulate the evolution of the data record in a changed climate. But it is at the same time the safest way to deal with the task of comparing with the actual CLARA data record since it would then be true to original validation results.

To switch to a parameterised POD approach is clearly a preferred way to go for the future "predicting mode" but I am still very sceptical to if this is a good solution for the basic "inter-comparison mode" where you just want to compare with a existing data record of the observed climate. Your example with potentially changed surface characteristics in the simulation highlights this and I claim that this forces you to actually compare "apples with pears". If you instead would rely on just what the validation data previously had told you (from the lat/lon description of POD) I would still think that a comparison would be valid since we are then just comparing the actual cloud information from the two data sources and not taking into account any other changes of model parameters which are not cloud parameters. If we then get a discrepancy, further studies have to reveal why and they may very well then lead to the conclusion that a lot of it (but probably not all) is linked to changed surface conditions. I fear that these discrepancies from the current climate may be seriously changed or even disappear if you go for a parameterised representation when comparing with an existing climate data record. Then I wonder, what's then the use of this comparison?

So, this is largely a philosophical discussion. We have no previous experience of a simulator that simulates a changing cloud detection efficiency, and that is why this discussion is important.

If you require us to follow the parameterised approach exclusively, you really put us to the test, because we are not absolutely sure that we have enough data, or data of sufficient quality, to develop this parameterisation. Only the future (and access to resources) can tell. From this perspective, too, I hope you can allow us to start with the safer lat/lon approach.

We foresee a better situation a couple of years from now, after we have also introduced VIIRS data into the CLARA dataset. The ISCCP team and NOAA are currently creating what is called the VGAC dataset, which is nothing other than the AVHRR GAC dataset simulated from high-resolution VIIRS data. It will be used not only by the CMSAF but also to extend the ISCCP and PATMOS-x datasets. We will then adapt our methods to VGAC data and add them to the CLARA data record (to become CLARA-A3.5, with a tentative release in 2025). The good thing is that this will allow global matchups between VIIRS and CALIPSO for the full period from 2012 (launch of Suomi-NPP) to 2023 (provided CALIPSO survives that long). This would increase the number of available global matchups by a factor of 2 (perhaps even up to a factor of 4 if we relax the allowed time difference from 2.5 minutes to perhaps 5 minutes). I think this also improves the prospects for developing a parameterised approach, since we are currently stretched to the limits of what the current global matchup dataset contains.

In conclusion, we will try to find a parameterised solution, but the prospects for this may be better if we wait a few years to get more matchups to "play with". Until then, I hope the position-based POD approach can still be a useful start.

Best regards

Karl-Göran


kgkarl commented 2 years ago

Dear Steve,

Thanks for your contribution to this discussion.

I do understand the arguments for moving away from the position-based (latitude/longitude) description of POD, but I think I need to explain further why we cannot completely drop the connection to the lat/lon context.

Our estimation of PODs relies on matchups between AVHRR and CALIPSO observations where the satellite orbits cross, i.e., at simultaneous nadir observations (SNOs). These SNOs therefore occur at specific geographical positions (lat/lons). The trick is that for AVHRR-carrying satellites in approximately the same orbit as CALIPSO (i.e., an afternoon orbit with an Equator crossing time near 1:30 pm), matchups can be obtained globally (excluding only the areas very close to the poles). This is what we did for NOAA-18 and NOAA-19 in the period 2006-2015 (before serious orbital drift separated these satellites from the CALIPSO orbit). All results were then compiled and gridded to form a global description of PODs, as described in the following paper: https://amt.copernicus.org/articles/11/633/2018/
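The gridding step can be pictured as counting, per grid cell, how many CALIPSO-cloudy matchups the AVHRR cloud mask also flagged as cloudy. The sketch below is a simplified illustration only: the tuple layout `(lat, lon, calipso_cloudy, avhrr_cloudy)`, the 10-degree cells and `min_samples` are assumptions, and the POD used by the CLARA simulator also depends on optical depth and on whether the scene is sunlit (as described at the start of this thread), which this sketch leaves out.

```python
from collections import defaultdict

CELL_DEG = 10.0  # hypothetical grid spacing in degrees


def cell_index(lat, lon):
    """(lat, lon) -> (row, col) of a regular grid cell; lat = 90 goes in the top row."""
    n_lat = int(180 / CELL_DEG)
    return min(int((lat + 90.0) // CELL_DEG), n_lat - 1), int((lon % 360.0) // CELL_DEG)


def grid_pod(matchups, min_samples=1):
    """POD per cell = AVHRR-cloudy / CALIPSO-cloudy over the collocated samples.

    Each matchup is a hypothetical (lat, lon, calipso_cloudy, avhrr_cloudy) tuple.
    Cells with fewer than min_samples CALIPSO-cloudy matchups are left out.
    """
    cloudy = defaultdict(int)    # CALIPSO-cloudy matchups per cell
    detected = defaultdict(int)  # ... of which AVHRR also flagged cloudy
    for lat, lon, calipso_cloudy, avhrr_cloudy in matchups:
        if not calipso_cloudy:
            continue  # POD is conditional on CALIPSO seeing a cloud
        key = cell_index(lat, lon)
        cloudy[key] += 1
        detected[key] += int(avhrr_cloudy)
    return {key: detected[key] / n for key, n in cloudy.items() if n >= min_samples}


matchups = [
    (75.0, 10.0, True, True),
    (75.0, 12.0, True, False),
    (75.0, 15.0, True, True),
    (75.0, 18.0, False, False),  # clear in CALIPSO: does not enter the POD
    (-5.0, 200.0, True, True),
]
print(grid_pod(matchups))  # {(16, 1): 0.6666666666666666, (8, 20): 1.0}
```

In practice a minimum sample count per cell matters, since sparsely sampled cells give noisy PODs; that is one reason more matchups would help.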

So, the baseline for our description of PODs is this dataset on a geographical grid. This is the "truth" we must always relate to (e.g., when comparing CLARA-A2 results with any model simulation of cloudiness for the same period).

Interpreting the PODs as functions of surface (and/or other) parameters is interesting, and certainly the only option if we want to predict how PODs might change in the future (as simulated by climate models). But it has both pros and cons. If we notice a change in simulated cloudiness (relative to the cloud climatology of the original CLARA-A2 dataset), we must first understand what comes from changing PODs and what comes from real changes in cloudiness. This makes it harder to tell whether cloudiness has really changed. On the other hand, it also offers a tool to investigate how much of the change is related simply to, e.g., changes in surface properties.
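A simple bookkeeping identity shows why the two effects have to be separated. Assume, purely for illustration, that the simulated cloud fraction is C_s = P C, where C is the model's true cloud fraction and P an effective POD (false detections ignored). For two periods 1 and 2, with the period means P̄ = (P_1 + P_2)/2 and C̄ = (C_1 + C_2)/2, the change splits exactly into a cloud term and a POD term:

```latex
\Delta C_s = P_2 C_2 - P_1 C_1 = \bar{P}\,\Delta C + \bar{C}\,\Delta P,
\qquad \Delta C = C_2 - C_1, \quad \Delta P = P_2 - P_1 .
```

With hypothetical numbers P_1 = 0.6, C_1 = 0.70, P_2 = 0.8 and C_2 = 0.75, the simulated cloud fraction rises from 0.42 to 0.60. Only 0.7 × 0.05 = 0.035 of that 0.18 increase comes from the cloud change, while 0.725 × 0.2 = 0.145 comes from the higher POD.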

I think it is a bit optimistic to believe that most of the POD variation is linked to surface reflectance in the visible channels. In my opinion, the strength of the CLARA-A2 dataset lies largely in the AVHRR shortwave infrared channel at 3.7 micron (AVHRR channel 3b) and in how its information relates to infrared channels 4 and 5 (the split-window channels at 11-12 microns). For example, the 3.7 micron channel is crucial for detecting clouds over snow and ice surfaces. That is why I believe CLARA-A2 could be very useful for studies of the Arctic and Antarctic regions during the polar summer, whereas the information during the polar winter is rather poor (as shown by low PODs in the polar winter). In short, I would expect PODs to correlate better with surface temperature and 3.7 micron surface emissivity than with surface reflectance in the visible AVHRR channels.

Another obstacle is that the CLARA-A2 cloud detection method itself already uses some critical auxiliary datasets, such as surface emissivity (from MODIS) and surface temperature (from ERA-Interim). Relating PODs to the same kinds of datasets therefore does not give a completely independent and objective view of how PODs vary. How to handle this is not trivial.

Finally, I want to return to the fundamental simulator statement: "What would the satellite have retrieved for clouds, if the real world had the clouds (and other boundary conditions) of the model?" This is, of course, well known to me, even if the parenthesis about boundary conditions has rarely been emphasised before. Simulator formulations have mostly been dominated by the viewing geometry from the space platform plus the vertical profiles of various atmospheric parameters, haven't they? Or are there simulators that also address changes in, e.g., surface conditions? Please tell me if so. To me, the CLARA simulator seems to be the first simulator of a passive-imager dataset that also addresses changing surface conditions. As far as I know, neither the ISCCP nor the MODIS simulator does anything about potential changes in cloud detection ability. For example, I would think that the MODIS simulator should also behave in a similar way to the CLARA dataset (although with slightly higher PODs). We actually made a study some years ago showing the same kind of POD variation for MODIS as for the AVHRR methods over the Arctic region (see https://acp.copernicus.org/preprints/9/16755/2009/acpd-9-16755-2009.pdf). So perhaps the CLARA PODs could in some way be adapted for other datasets as well (mainly MODIS, PATMOS-x or ESA-CLOUD-CCI). For ISCCP, however, I don't know. The situation is quite different for the bi-spectral VIS-IR methods, and I haven't seen any similar evaluation of ISCCP results based on intercomparisons with CALIPSO. Someone should do that, I think (or maybe I am overlooking existing studies?).

You mentioned that you expect changes in cloudiness due to decreasing ice in the Arctic and that this would be an interesting topic to study in the near future. In fact, we have already seen this happening in the CLARA-A2 dataset during the last decade of the CLARA-A2 period (mentioned in https://link.springer.com/chapter/10.1007%2F978-3-030-33566-3_5 and also verified by other studies based on MODIS and CALIPSO observations). More specifically, an increase in low cloud amount is already seen over Arctic areas that were previously ice covered during the polar summer. We are about to release a third edition of CLARA (to be named CLARA-A3) next spring, covering the period 1979-2020 (42 years). I expect we will have even better chances of describing the evolution of the Arctic region with that dataset.

In conclusion, we will implement a latitude/longitude-dependent POD as a first step and then explore the opportunity to introduce a parameterised POD later on. I hope we can find something useful in the end.

Best regards

Karl-Göran

klein21 commented 2 years ago

Dear Karl-Göran,

Thanks for your response, which helped educate me, particularly about where you think the information affecting the CLARA POD comes from.

Just a few short responses:

I don't recall any study computing the POD for ISCCP cloud detection using CALIPSO, but I don't follow this closely, so there may well be one. I was encouraged that your own published results nicely suggest that an optical depth of 0.3 is a reasonable ball-park estimate of where POD ~ 0.5 for passive satellite retrievals.
In general, the passive satellite simulators (ISCCP, MODIS, MISR) have not been used for polar regions with ice/snow surfaces, so the treatment of POD has remained fairly simple (e.g., a fixed optical depth threshold), which I think has worked OK. COSP users who wanted to study polar clouds have always compared with CALIPSO observations, which I think you acknowledge is the best we can currently do.
If sea-ice/snow extent changes at a given location, I think you must ask how much of the reported CLARA cloud change is real versus simply the result of a change in POD, given the first-order dependence of POD on surface type seen in your data. That would be an additional complication to consider.
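The difference between a fixed optical depth threshold and a probabilistic POD can be sketched as follows. This is an illustration under assumptions: the threshold of 0.3 and the choice of POD = 0.5 at that optical depth follow the ball-park figure in this comment, while the logistic shape in log(optical depth) and its `width` parameter are hypothetical and not taken from CLARA or any COSP simulator.

```python
import math


def pod_fixed_threshold(tau, tau_min=0.3):
    """Deterministic detection: cloudy if and only if tau exceeds tau_min."""
    return 1.0 if tau > tau_min else 0.0


def pod_logistic(tau, tau50=0.3, width=0.5):
    """Hypothetical smooth POD: exactly 0.5 at tau50, rising with log(tau)."""
    if tau <= 0.0:
        return 0.0
    x = math.log(tau / tau50) / width
    # Evaluate the logistic in a form that cannot overflow math.exp.
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expected_cloud_fraction(taus, pod):
    """Expected fraction of subcolumns flagged cloudy for a given POD function."""
    return sum(pod(t) for t in taus) / len(taus)


taus = [0.0, 0.1, 0.3, 1.0, 10.0]  # example subcolumn optical depths
print(expected_cloud_fraction(taus, pod_fixed_threshold))  # 0.4
print(expected_cloud_fraction(taus, pod_logistic))  # about 0.503
```

With these example values the smooth curve counts thin clouds partially and misses a few thicker ones, so the two approaches differ both in the total and in which clouds are counted.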

Steve

From: Karl-Göran Karlsson
Reply-To: "CFMIP/COSPv2.0"
Date: Tuesday, September 14, 2021 at 1:16 AM
To: "CFMIP/COSPv2.0"
Cc: "Klein, Stephen A.", Comment
Subject: Re: [CFMIP/COSPv2.0] Implementation of CLARA simulator (#59)


kgkarl commented 2 years ago

Dear Steve,

Thanks for your last comments which I fully support except for one thing.

It is about what to do in the polar regions:

I don't fully agree that CALIPSO data are always the best we can use here, simply because AVHRR (and MODIS) observe the poles on every orbit, while CALIPSO cannot observe conditions poleward of about 82 degrees latitude. The inner Arctic and inner Antarctic regions are therefore not covered by CALIPSO. So I really believe that data from passive imagers (AVHRR, MODIS, etc.) can be useful and provide complementary information, especially when PODs are reasonably high (for example, in the polar summer). However, the polar winter is still quite a challenge for all passive imagers.
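The size of this gap can be estimated with a spherical-Earth approximation: the fraction of a sphere's surface poleward of latitude φ, counting both hemispheres, is 1 - sin(φ). The short sketch below assumes a mean Earth radius of 6371 km and takes the ~82 degree limit from the comment above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation (assumption)
EARTH_AREA_KM2 = 4.0 * math.pi * EARTH_RADIUS_KM**2


def fraction_poleward(lat_deg):
    """Fraction of the globe with |latitude| >= lat_deg (both hemispheres combined)."""
    return 1.0 - math.sin(math.radians(lat_deg))


frac = fraction_poleward(82.0)
print(f"{100 * frac:.2f}% of the globe, about {frac * EARTH_AREA_KM2 / 1e6:.1f} million km2")
# 0.97% of the globe, about 5.0 million km2
```

That is roughly 1% of the Earth's surface, split between the two polar caps: small globally, but it is exactly the inner Arctic and inner Antarctic discussed here.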

Karl-Göran
