cf-convention / discuss

A forum for any discussion about interpretation, clarification, and proposals for changes or extensions to the CF conventions.

question about standard names - atmosphere thickness #361

Closed jwrotny74 closed 1 month ago

jwrotny74 commented 3 years ago

Hello CF group,

I have a general question about standard names for the CF board, and was hoping that someone could comment on it. I'm adding this question to the general discussion section because I am not proposing a new standard name at this time.

Our team produces satellite data products and one of these products is the retrieved atmospheric optical depth. However, depending on the path that the retrieval takes, the retrieved optical depth will be either a cloud optical depth or an aerosol optical depth. Essentially, the path of the retrieval is determined by the particular observation. So, the optical depth output to our netCDF files will be a combination of cloud optical depths and aerosol optical depths over the grid variable output to the file, meaning that different physical quantities are output over the 2-d variable, though both are fundamentally optical thicknesses.

I see that there are standard names for both quantities that we output (atmosphere_optical_thickness_due_to_aerosol and atmosphere_optical_thickness_due_to_cloud). My question is, given our particular output, is it appropriate to list both standard names in our netCDF files, or is this not accepted by the CF conventions? Or, conversely, does it make sense to propose a new standard name that would apply to our product?

Thank you.

Jonathan Wrotny

DocOtak commented 3 years ago

Hi @jwrotny74

Based on your description and the current standard names on the list I think you do indeed need a new standard name to capture this variable.

With the caveat that this is outside my normal field, the following might work: atmosphere_optical_thickness_due_to_cloud_and_aerosol_particles

There is also precedent for a plain name with no "due_to", where in the definition it is assumed to be due to "all processes" atmosphere_optical_thickness

I'd hesitate with that one given other processes which might change the optical depth, e.g. absorption/scattering due to the atmospheric gases themselves.

jwrotny74 commented 3 years ago

Hello @DocOtak,

Thank you for your comments. You make some good suggestions. My one concern with your first proposed name is that it seems to imply that the optical thickness is due to both clouds and aerosols. For our particular data product (which will be reported on a 2-d grid array over the Earth), some of the values in the grid will be optical thickness due to aerosols while others will be optical thickness due to clouds. There are no values that will be thickness due to both aerosols and clouds. Is this type of nuance in the product (effectively reporting two different quantities) consistent with the standard name you are proposing? I just wasn't sure if there is a precedent for how to create a standard name for products that report two different quantities.

-Jonathan

davidhassell commented 3 years ago

Hello @jwrotny74,

Is the expectation that the user will know, or be able to work out, which locations store thickness due to aerosols, and which store thickness due to clouds? Or does that not matter?

Thanks, David

DocOtak commented 3 years ago

Hm, might need to wait for some of the other CF gurus to chime in on this one...

Ah, looks like @davidhassell is on it already.

jwrotny74 commented 3 years ago

Hi @davidhassell,

I suspect some data users would be interested in the optical thickness data and knowing if it is due to clouds or aerosols. However, they won't be able to know if it is due to clouds or aerosols unless they obtain another file of cloud data which they could then use to learn more about the optical thickness. The cloud data is contained in a separate file from the one that contains the optical thickness data, and the user would have to combine data from both files to determine the source of the optical thickness.

-Jonathan

taylor13 commented 3 years ago

I think it would help to clarify a bit more here the observation you are reporting. When you say that the "retrieved optical depth will be either a cloud optical depth or an aerosol optical depth", do you mean that if the retrieval is along a path that is free of clouds, then the measurement reflects the aerosol loading, but when cloud is present, the cloud dominates (over aerosols), and the measurement reflects primarily the optical depth of the cloud?

If that is the case, then I think the standard name "atmosphere_optical_thickness" would be appropriate, although some explanatory note might be needed to indicate that the retrieval is determined by aerosols and clouds, not the gaseous constituents of the atmosphere (which also generally affect optical thickness).

I'm not a real expert here, so others should chime in.

jwrotny74 commented 3 years ago

Hello @taylor13,

Thanks for your message.

Regarding our optical depth observation, it is a bit different from what you describe. When the retrieval path is free of clouds, the outputted optical depth is due only to aerosols. When the retrieval path contains clouds, the optical depth is due only to clouds (so, no aerosol attenuation). Hence, there are technically two different quantities that can be output, which creates a bifurcation in the product. What I am not sure of is whether this situation has been or can be handled through a standard name. I'm fine with the generic standard name "atmosphere_optical_thickness" if there is a way to handle this bifurcation in the product.

larsbarring commented 3 years ago

While I am not at all an expert on this subject, I get the impression from your description that including an ancillary variable giving the cloud mask for every retrieval time would handle the bifurcation. The standard name cloud_binary_mask already exists, which in combination with the generic standard name atmosphere_optical_thickness would provide the necessary information.
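The layout suggested above can be sketched as follows. This is only an illustration, assuming both variables live in the same file: the variable names `optical_thickness` and `cloud_mask`, the dimension names, and the `to_cdl` helper are hypothetical, and `atmosphere_optical_thickness` is the generic name proposed in this thread, so it should be checked against the current standard name table before use.

```python
# Hypothetical variable and dimension names; only the attribute names and
# the standard names quoted in this thread are taken from the discussion.
variables = {
    "optical_thickness": {
        "type": "float",
        "dims": ("time", "lat", "lon"),
        "attrs": {
            # Generic name proposed in the thread; verify it exists in the table.
            "standard_name": "atmosphere_optical_thickness",
            "units": "1",
            # Points at the variable that says which retrieval path was used.
            "ancillary_variables": "cloud_mask",
        },
    },
    "cloud_mask": {
        "type": "byte",
        "dims": ("time", "lat", "lon"),
        "attrs": {
            "standard_name": "cloud_binary_mask",
            "units": "1",
        },
    },
}


def to_cdl(variables):
    """Render the variable declarations as CDL-style text."""
    lines = ["variables:"]
    for name, spec in variables.items():
        lines.append(f"  {spec['type']} {name}({', '.join(spec['dims'])}) ;")
        for key, value in spec["attrs"].items():
            lines.append(f'    {name}:{key} = "{value}" ;')
    return "\n".join(lines)


print(to_cdl(variables))
```

The printed text shows the data variable naming the mask through `ancillary_variables`, which is how a reader of the file would learn where to look for the cloud/clear distinction.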

jwrotny74 commented 3 years ago

Hello @larsbarring,

Thanks for your comments. This is a great suggestion, and I think it could work. However, a possible issue is that the cloud mask data variable is not included in the same file as the optical depth variable. The cloud mask data is in a separate file. Is there any way to handle this using the ancillary_variables attribute?

zklaus commented 3 years ago

@jwrotny74, sounds to me like you are looking for external_variables, though perhaps others can advise on the correct usage.

davidhassell commented 3 years ago

Hello,

I think Lars' suggestion using ancillary variables sounds good; and Klaus's suggestion of using external_variables seems logical to me.

However, the use of external_variables is currently limited to cell measure variables only. In the original TRAC discussion that introduced external_variables, the principle employed was to only allow external variables to contain "supplemental" rather than "essential" data. At the time the use case was only for cell measures, so only these were allowed to be external.

Without having yet thought too deeply about all of the different uses of ancillary data, I would say that data in ancillary variables would also be "supplemental", and so to allow them to be referenced by external variables seems reasonable.

Thanks, David

zklaus commented 3 years ago

Thanks, @davidhassell, I was wondering about that. The CF conventions state at the moment:

2.6.3. External Variables

The global external_variables attribute is a blank-separated list of the names of variables which are named by attributes in the file but which are not present in the file. These variables are to be found in other files (called "external files") but CF does not provide conventions for identifying the files concerned. The only attribute for which CF standardises the use of external variables is cell_measures.

Which to me reads as if you could use external_variables for whatever you want, but that it is standardized for cell_measures only. If this is not correct, we might want to reword that last sentence a little bit.
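To make the quoted definition concrete, here is a minimal sketch of how the `external_variables` list could be derived: collect the names referenced by attributes and keep those that are not present in the file. The variable names are hypothetical, and including `ancillary_variables` in the scan reflects the broader reading discussed here, not something CF standardises at the time of this thread.

```python
def referenced_names(attrs):
    """Variable names referenced by cell_measures and ancillary_variables."""
    names = []
    # cell_measures holds "measure: variable" pairs, e.g. "area: areacella".
    tokens = attrs.get("cell_measures", "").split()
    names.extend(tok for tok in tokens if not tok.endswith(":"))
    # ancillary_variables is a blank-separated list of variable names.
    # Treating these as possibly external is an assumption, see the lead-in.
    names.extend(attrs.get("ancillary_variables", "").split())
    return names


def external_variables(file_variables):
    """Blank-separated list of referenced variables absent from the file."""
    missing = []
    for attrs in file_variables.values():
        for name in referenced_names(attrs):
            if name not in file_variables and name not in missing:
                missing.append(name)
    return " ".join(missing)


# Hypothetical file content: one data variable, no mask or cell area in the file.
file_variables = {
    "optical_thickness": {
        "cell_measures": "area: areacella",
        "ancillary_variables": "cloud_mask",
    },
}

print(external_variables(file_variables))  # areacella cloud_mask
```

As the quoted text says, CF does not say how to locate the external file, so the resulting list only tells a reader that the named variables exist somewhere else.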

larsbarring commented 3 years ago

There was a discussion on this back in 2017, see this email thread. In there, external variables in comments and other non-controlled attributes were mentioned, and the last comment was a suggestion to be restrictive regarding when/where to allow external variables. I do not think there was a clear conclusion in the thread.

JonathanGregory commented 3 years ago

Dear Jonathan @jwrotny74

When the retrieval path is free of clouds, the outputted optical depth is due only to aerosols. When the retrieval path contains clouds, the optical depth is due only to clouds (so, no aerosol attenuation). Hence, there are technically two different quantities that can be output

I'm sorry if it throws a spanner in the works, but if I understand this correctly I am not sure it's right to give it a standard name. In the second case, do you mean that the optical depth takes only the clouds into account, and ignores the aerosol attenuation (rather than that the aerosol attenuation is a relatively small part of it)? If so, this variable would seem to be a mixture of two quantities that ought to have different standard names.

Best wishes

Jonathan

taylor13 commented 3 years ago

I had the same thought (see https://github.com/cf-convention/discuss/issues/361). If the variable reported is truly "binary", it would seem that the measuring instrument would have to determine whether or not clouds were present and, if they were, somehow filter out any effects of aerosols. That is a very complicated process. Is there some publication or documentation that describes what the instrument measures?

jwrotny74 commented 3 years ago

Hello @JonathanGregory and @taylor13,

I'll reply to your comments first since I think understanding the product is the first step...

First, the optical depth product that we are reporting is truly a 'binary' product, meaning that it is the aerosol optical depth for clear-sky conditions and the cloud optical depth for cloudy conditions. The optical depth retrieval in our algorithm knows when clouds are present based on a cloud mask input, which it uses to set the 'clear-sky' or 'cloudy' path. For the 'clear-sky' retrieval path, the cloud optical depth is set to 0 and the retrieved quantity is the aerosol optical depth. For the 'cloudy' retrieval path, the aerosol optical depth is set to a small constant value and the retrieved quantity is the cloud optical depth. So, both aerosol and cloud attenuation are accounted for in our retrieval, but only one optical depth is retrieved depending on the path.

So, our reported quantity will be two different quantities, and it seems like one standard name might not be appropriate, as Jonathan Gregory has suggested.

taylor13 commented 3 years ago

Thanks @jwrotny74, your description is clear and understandable. I now agree with you and Jonathan that it will be impossible to assign a single standard name (and CF currently permits only one name per variable). On the other hand, I suppose we could consider something rather ambiguous such as atmosphere_optical_thickness_due_to_aerosol_or_due_to_cloud, but it doesn't really sit right with me (though I'm not sure why).

jwrotny74 commented 3 years ago

Thanks @taylor13. Yes, it was my assumption that only one standard name could be assigned to a particular data variable. I am open to other options such as what you have suggested, but I agree it seems a bit strange. I'll leave it to others to offer their ideas or perspectives.

JonathanGregory commented 3 years ago

Dear Jonathan @jwrotny74

Thanks for your explanation of the quantity. So actually you're not ignoring the aerosol when there is cloud, but assigning a nominal value to it. In that case maybe it would be reasonable to regard this quantity as an estimate of the optical thickness, taking everything into account, that could have a standard name of plain atmosphere_optical_thickness. I don't see that one in the list at present, but it would make sense as a quantity, wouldn't it?

You are evaluating the quantity in two different ways, depending on the conditions. Perhaps a good CF way to describe this would be with an ancillary variable with flag_values and flag_meanings (CF section 3.5). The existing cloud mask variable, if it is a binary quantity, could serve as a flag variable just by adding those attributes to it. That way no special interpretation would be needed, because flag variables are self-explanatory.
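A minimal sketch of how a reader could interpret such a flag variable, assuming a two-valued mask. The `flag_meanings` strings and the sample mask values are hypothetical; only the pairing of `flag_values` with the blank-separated `flag_meanings` follows the CF flag convention.

```python
# Hypothetical attributes on the cloud mask variable.
flag_values = (0, 1)
flag_meanings = "clear_sky_aerosol_retrieval cloudy_sky_cloud_retrieval"


def flag_lookup(flag_values, flag_meanings):
    """Map each flag value to its meaning (meanings are blank-separated)."""
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings must have equal length")
    return dict(zip(flag_values, meanings))


def describe(mask, lookup):
    """Replace every mask value in a 2-d grid by its flag meaning."""
    return [[lookup[value] for value in row] for row in mask]


lookup = flag_lookup(flag_values, flag_meanings)
cloud_mask = [[0, 1], [1, 0]]  # made-up 2x2 grid
for row in describe(cloud_mask, lookup):
    print(row)
```

With this in place, each grid cell of the optical thickness variable can be labelled by the retrieval path that produced it, without any product-specific knowledge.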

Then that brings us back to the problem of ancillary variables in external files. I agree with others that this ought to be possible, although not allowed at present.

Best wishes

Jonathan

jwrotny74 commented 3 years ago

Hello @JonathanGregory,

Thanks for your comments. So, yes, our retrieval does not ignore aerosols when there are clouds. A constant value is assumed (or possibly more than one for different surfaces), then the retrieval obtains the cloud optical depth and this cloud optical depth is what we report in our files. You bring up a good point in that we could regard the reported cloud optical depth as a total optical depth if we consider the constant, aerosol optical depth as part of the reported quantity. But, can we regard a reported quantity in a file as something else, in general, even if we know how to obtain the more general quantity from the reported quantity and some other information (in our case, a constant aerosol optical depth)? If we could do that, then using a standard name of atmosphere_optical_thickness does seem like it would fit.

About the external cloud mask variable, I do believe that the cloud mask file already uses the flag_meanings and flag_values attributes for the variable, but I would need to double-check.

-Jonathan

JonathanGregory commented 3 years ago

Dear Jonathan @jwrotny74

In general we shouldn't label something as something else, indeed, but I think you could argue that this quantity is an estimate of atmosphere_optical_thickness. It's not the most accurate or best way to estimate it, but it's a possibility. Is that the intention of constructing it like this? Would you think it reasonable for this field to be compared with an estimate of atmosphere_optical_thickness from a different source or method?

Best wishes

Jonathan

jwrotny74 commented 3 years ago

Hello Jonathan @JonathanGregory

Sorry for the slow reply. Our team was discussing this topic before I replied.

I think that our team is going to argue that the product that we are reporting is NOT a good estimate of the total atmospheric optical thickness. There are simply too many quantities missing from our reported product to realistically call what we are reporting a total optical thickness (including Rayleigh scattering, contributions from gases, etc.). I understand the spirit of what you are suggesting, in that the estimate does not have to be perfect, but in our case, we think our product doesn't take into account enough of the contributions to optical depth to make it representative. To answer your other question, no, we wouldn't compare our product with other estimates of total atmosphere optical depth.

I think we will have to forego a standard name for our product unless there is some other way to handle it.

JonathanGregory commented 3 years ago

Dear Jonathan @jwrotny74

Thanks for your careful thought. In that case, a possibility would be to provide the data in two complementary data variables with appropriate and different standard names. Would that be reasonable?
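As a rough sketch of what that split could look like, assuming the cloud mask is available when the file is written: each cell goes to one variable and is marked missing in the other. The fill value, the sample numbers and the function name are hypothetical; the two output grids would carry the standard names mentioned at the top of this thread, atmosphere_optical_thickness_due_to_aerosol and atmosphere_optical_thickness_due_to_cloud.

```python
FILL = -9999.0  # hypothetical _FillValue for cells that belong to the other variable


def split_by_cloud_mask(optical_depth, cloud_mask, fill=FILL):
    """Split one combined 2-d grid into aerosol-only and cloud-only grids."""
    aerosol, cloud = [], []
    for od_row, mask_row in zip(optical_depth, cloud_mask):
        # Mask value 1 means the cloudy retrieval path was taken.
        aerosol.append([fill if m else od for od, m in zip(od_row, mask_row)])
        cloud.append([od if m else fill for od, m in zip(od_row, mask_row)])
    return aerosol, cloud


# Made-up 2x2 example: small values are clear-sky (aerosol), large are cloudy.
optical_depth = [[0.12, 8.5], [15.0, 0.30]]
cloud_mask = [[0, 1], [1, 0]]

aerosol_od, cloud_od = split_by_cloud_mask(optical_depth, cloud_mask)
print(aerosol_od)
print(cloud_od)
```

The two grids are complementary: every cell holds a real value in exactly one of them, so no external mask is needed to tell the quantities apart.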

Best wishes

Jonathan

jwrotny74 commented 3 years ago

Hello Jonathan @JonathanGregory

Thanks again for this suggestion. It is a good one, but I think we will probably not go this route for the time being. I will keep it in mind, though, because things might change as the product eventually becomes operational.

Sincerely,

Jonathan

JonathanGregory commented 3 years ago

OK, fair enough. Thanks for the discussion, Jonathan @jwrotny74.

jwrotny74 commented 3 years ago

Thanks to you also, Jonathan @JonathanGregory, and to the others who chimed in.