GEOS-ESM / GOCART

GOCART Aerosol model including process library and framework interfaces (MAPL, NUOPC, and CCPP)
Apache License 2.0

Missing values and MAPL #252

Open mpagowski opened 1 year ago

mpagowski commented 1 year ago

NOAA's GBBEPx wildfire files (at 0.1deg x 0.1deg resolution) have missing values. How does MAPL treat such values in interpolations?

mathomp4 commented 1 year ago

@mpagowski Let me ping @bena-nasa and @atrayano to answer this. I'm sure we handle it "correctly" but I'm not sure of the specifics.

bena-nasa commented 1 year ago

Are these files read via ExtData? Please let me know what version of MAPL you are using. The short answer is that even though MAPL can respect missing values when doing, say, spatial interpolation, it doesn't matter if the application code that ultimately uses the data doesn't respect them.

MAPL has an internal "MAPL undefined" constant, MAPL_UNDEF, that we use at various places in the MAPL code to protect against operations on undefined points. It is set to a value of 1.0e15.

I don't know what version of MAPL you are using, but in newer versions of the generation 2 ExtData we check the missing value defined in the file. If it does not match the "MAPL undefined" value, then when we read the array from the NetCDF file we set the points that have the file missing value to the "MAPL undefined" one. Any point in our code base that respects this will then be aware of it. For example, we respect it when regridding from the grid of the original file to the application grid: if a target cell has contributions from any points that are "MAPL undefined", we do not include them in the calculation of the target value. If all contributing points are "MAPL undefined", then the target cell is "MAPL undefined".
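
The behaviour described above can be illustrated with a small NumPy sketch. This is not MAPL's regridding code. It assumes a simple block average from a fine grid to a coarse one, and the function names and the file fill value of -999 are hypothetical. It only shows the two rules: undefined points are left out of the average, and a target cell with no defined contributions stays undefined.

```python
import numpy as np

MAPL_UNDEF = 1.0e15  # value quoted in this thread


def to_mapl_undef(field, file_fill):
    """Copy the field and replace the file's missing value with MAPL_UNDEF."""
    out = np.array(field, dtype=np.float64)
    out[out == file_fill] = MAPL_UNDEF
    return out


def coarsen_mean(field, factor):
    """Average factor x factor blocks, skipping MAPL_UNDEF points.

    Illustrative stand-in for a regridder: a target cell whose
    contributions are all undefined is itself set to MAPL_UNDEF.
    """
    ny, nx = field.shape
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    valid = blocks != MAPL_UNDEF
    counts = valid.sum(axis=(1, 3))
    sums = np.where(valid, blocks, 0.0).sum(axis=(1, 3))
    result = np.full(counts.shape, MAPL_UNDEF)
    # Divide only where at least one defined point contributed.
    np.divide(sums, counts, out=result, where=counts > 0)
    return result


# Hypothetical 2 x 4 field with a file fill value of -999.
raw = [[1.0, 3.0, -999.0, -999.0],
       [5.0, -999.0, -999.0, -999.0]]
coarse = coarsen_mean(to_mapl_undef(raw, -999.0), factor=2)
print(coarse)
```

The left target cell averages only its three defined points, while the right target cell, which has no defined points, remains MAPL_UNDEF.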

Of course, if the code or component (outside of MAPL) that ultimately uses these arrays does not protect against doing operations at points that are "MAPL undefined", we cannot control that; users can do what they want with arrays in their code.
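
As a rough illustration of why unprotected operations are a problem, here is a minimal NumPy sketch with made-up emission values. The array and its numbers are hypothetical and are not taken from GOCART or GBBEPx.

```python
import numpy as np

MAPL_UNDEF = 1.0e15  # value quoted in this thread

# Hypothetical emissions for four cells; the third one is undefined.
emissions = np.array([0.2, 0.0, MAPL_UNDEF, 0.5])

# Code that is unaware of the sentinel treats it as a real number,
# so the total is dominated by 1.0e15.
naive_total = emissions.sum()

# Code that protects against the sentinel has to mask it explicitly.
protected_total = emissions[emissions != MAPL_UNDEF].sum()

print(naive_total, protected_total)
```

Every array operation downstream would need a guard like the second one, which is the burden described in the rest of this comment.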

In general our emissions files do not have missing values, because how would you handle them in all the components that may use the data? It would be a nightmare. Rather, if there are no emissions, the values are simply 0, since GOCART is a huge code base and does not check for any sort of missing value when doing array-level operations. Not to mention that protecting every array operation against a missing value would probably destroy performance.

So the answer is that, depending on the version of ExtData you are using, we may respect the file-defined missing value and set points that are "missing" to our own internal missing value. This is respected at points in the MAPL code base, but when you get to the application code all bets are off, since we have no control over how the code developer chose to use arrays.

So the ultimate answer is that if your input files have missing values, even if MAPL respects them, GOCART does not.

At that point we have 3 options.

  1. Reprocess those files so that, rather than having missing values, they are just set to, say, 0 at those points.
  2. If the above is not an option but you are using a version of ExtData that respects the file missing value, then the arrays would have to be intercepted after the fact in the application code so that any MAPL_UNDEFs can be set to 0 or protected against.
  3. If you are using an older version of MAPL where ExtData did not respect the file-supplied missing value (I would need to check the version you are using), it simply made no accommodation for it, and we would need to come up with a custom solution.

To me, of all 3, option 1 seems by far the easiest: it would be a very trivial Python script to read in and rewrite the file, replacing anything that has a missing value with 0. Heck, maybe even NCO or some other utility could do this.
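
A reprocessing script of the kind mentioned here might look roughly like the sketch below. It assumes the third-party netCDF4 Python package and that the file declares its missing value through the standard `_FillValue` or `missing_value` attribute, so that netCDF4 returns masked arrays on read. The function names, file paths and variable names are hypothetical, not taken from the GBBEPx files.

```python
import shutil

import numpy as np


def zero_fill(data):
    """Return a plain float array with masked and NaN points set to 0."""
    arr = np.ma.filled(data, 0.0)          # masked points -> 0
    arr = np.array(arr, dtype=np.float64)  # work on a copy
    arr[np.isnan(arr)] = 0.0               # NaN points -> 0
    return arr


def rewrite_with_zeros(src_path, dst_path, var_names):
    """Copy src_path to dst_path and zero-fill the named variables.

    Sketch only: netCDF4 is imported lazily so that zero_fill can be
    used without it, and var_names must be supplied by the caller.
    """
    import netCDF4  # third-party package, assumed to be installed

    shutil.copyfile(src_path, dst_path)
    with netCDF4.Dataset(dst_path, "r+") as ds:
        for name in var_names:
            var = ds.variables[name]
            # With default settings, var[:] masks points equal to the
            # file's declared missing value.
            var[:] = zero_fill(var[:])


# Quick check of the helper on a hypothetical masked array.
sample = np.ma.masked_values([1.0, -999.0, 2.5], -999.0)
print(zero_fill(sample))
```

A call would then look like `rewrite_with_zeros("in.nc", "out.nc", ["some_emission_var"])`, with all three arguments being placeholders. The original file is left untouched, so the zero-filled copy can be produced as a separate workflow step.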

mpagowski commented 1 year ago

Thanks, that answers my question. It is about NOAA's ExtData for wildfires, which come at 0.1deg x 0.1deg resolution and contain very numerous missing values, both in places where wildfires should not exist and in places where they exist but retrievals are obscured. I can see that the safest way to deal with this is to convert all of those to 0s, though that may not be strictly correct, as it will decrease emissions where wildfires are burning. Any comments from NOAA participants/others would be welcome.
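
To make the concern about decreased emissions concrete, here is a tiny worked example with made-up numbers. It assumes four fine-grid cells feeding one model cell, two of them obscured, and a hypothetical fill value of -999.

```python
import numpy as np

# Hypothetical retrievals for four fine-grid cells inside one model cell;
# the two -999 entries stand for obscured retrievals.
retrievals = np.ma.masked_values([12.0, -999.0, -999.0, 4.0], -999.0)

# Averaging only the defined points: (12 + 4) / 2
mean_skipping_missing = float(retrievals.mean())

# Averaging after converting missing points to 0: (12 + 0 + 0 + 4) / 4
mean_zero_filled = float(retrievals.filled(0.0).mean())

print(mean_skipping_missing, mean_zero_filled)
```

In this example the zero-filled mean is half of the mean over defined points, which is the kind of low bias that can arise where fires are burning under obscured retrievals.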

bena-nasa commented 1 year ago

It doesn't matter what is most "correct". What matters is where this data is used: is it going to be ingested and used by an application (I assume GOCART) that does operations on floating point arrays filled from this data? At that point you simply cannot have "missing values" unless EVERY array-level operation that may use this data somehow knows to protect/not use points that are missing. That's not realistic, nor is it how GOCART (assuming that is the use case) is implemented.

mpagowski commented 1 year ago

Yes, it is GOCART, and it is currently failing, so the data needs to be preprocessed (i.e. set to 0s) until a solution to distinguish zero emissions (like ocean) from obscured retrievals is found.

bena-nasa commented 1 year ago

Yes, sorry, but by far and away the simplest (and really only) solution would be to take the existing files and make new versions that have the "undef" points replaced with 0, and just use those. That should be easy enough to do. GOCART simply gets arrays that represent emissions. What would it even mean to have "undefined" emissions? Either a cell has something or it doesn't, in which case it's 0; treating those points as no emissions seems perfectly logical. We are doing floating point math on arrays, so it needs valid, full arrays. I'm not sure how any code could use files that have missing values unless it had a special accommodation for that at the Fortran or C array level, which would be bad for performance and vectorization.

mpagowski commented 1 year ago

Yes, that is true, but we don't control production of the files.

bbakernoaa commented 1 year ago

@mpagowski That is partially true. Yes, NESDIS controls the creation of the native files, but we can have a preprocessor in the workflow to "fix" the files.

mpagowski commented 1 year ago

Yes, we are currently fixing the files for our NRT runs. But the problem with converting all "missing" to 0s remains.

bena-nasa commented 1 year ago

Ok, sounds like you can fix this in your workflow. That is, there exists a spot in the workflow where you can take the file(s) as produced by NESDIS, use some sort of utility to make new file(s) from the originals that have missing values replaced by 0, and then feed those files to GOCART. What is the "problem with converting all "missing" to 0s"? Are you asking how one can do this?

mpagowski commented 1 year ago

No, we are set. We are already converting all missing values to 0s and will let NESDIS deal with missing values in places where retrievals are obscured.
