NOAA-PMEL / LAS

Live Access Server
https://ferret.pmel.noaa.gov/LAS/
The Unlicense

SOCAT: negative/undefined GVCO2 and D2L values #1453

Closed karlmsmith closed 6 years ago

karlmsmith commented 6 years ago

Reported by @karlmsmith on 16 Nov 2012 00:06 UTC Many GlobalView CO2 values in the data were something like -100000. These were the result of ingesting Ferret's default missing value of -1.0E+34, except that the data type description caused the value to be truncated.

Any negative values are now changed to NULL. When ingesting new data, any negative computed GVCO2 values are assigned as NULL.
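The rule described above can be sketched as follows. This is only an illustration in Python, not the actual ingest code: the function name `null_if_negative` is hypothetical, the sample values are made up, and `None` stands in for a database NULL.

```python
from typing import Optional


def null_if_negative(value: Optional[float]) -> Optional[float]:
    """Return None (standing in for SQL NULL) for missing or negative values."""
    if value is None or value < 0.0:
        return None
    return value


# Hypothetical sample: a truncated missing-value flag next to plausible values.
computed = [-100000.0, 385.2, None, 0.0]
cleaned = [null_if_negative(v) for v in computed]
```

Zero is kept because only negative values are treated as invalid here.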

But looking at the computation of GVCO2, I do not think it makes sense that there would be cruise locations where GVCO2 is undefined. But I may be wrong on that. It uses data defined on time and latitude only, expands that data to all longitudes, then uses samplexyt. I modified the script, changing the synthesized longitude axis from /modulo=360 /unitslongitude /edges /X=0:360:180 to /units=degrees_east /edges /X=-360:540:60, but it did not seem to affect the results of the latest updated cruises.

Migrated-From: http://dunkel.pmel.noaa.gov/trac/las/ticket/1447

karlmsmith commented 6 years ago

Attachment from @karlmsmith on 19 Nov 2012 21:02 UTC Cruise ID/Expocodes of cruises containing data points with undefined distance-to-land values. nulld2l_cruises.txt

karlmsmith commented 6 years ago

Comment by @karlmsmith on 19 Nov 2012 21:00 UTC Also found that some of the distance-to-land (D2L) values had this error. Reset those less than -100 to NULL on Nov 19, 2012. The remaining values are non-negative. This change will appear in the next update. The Nov 15, 2012 update checked for these undefined values in D2L as well as GVCO2 and set them to NULL for those updated cruises.

This still raises the question of why we are getting undefined D2L values. Some appear very near (at?) the north pole, so these may be expected and will not matter since there is no Coastal overwrite of Arctic regions. But not all are in the Arctic region. (Maybe near the anti-meridian?)

Will attach a list of cruises that have undefined D2L (currently huge negative, soon to be NULL) values.

karlmsmith commented 6 years ago

Comment by @karlmsmith on 19 Nov 2012 22:46 UTC Actually, those distance-to-land values that were originally negative are all near the north pole; one cruise: 06AQ20110806

Those with negative D2L were all last updated 2012-07-28. Those with null D2L were all last updated 2012-08-11. Rerunning Ferret on these cruises may fix this issue. The troublesome question is whether any coastal cruise regions are being incorrectly assigned or missed because of these mistakes.

Updated SOCAT_TRAINING database to set NULL and negative D2L and GVCO2 values to -100 so the data points can be visualized. Note: 154,061 data points with invalid D2L (set to -100) and 4,523,273 data points with invalid GVCO2 (set to -100).

karlmsmith commented 6 years ago

Comment by steven.c.hankin on 19 Nov 2012 22:55 UTC I expect that these problems will be easy for Ansley to spot and fix after she returns. In the case of the D2L field, the missing values might very well be in the gridded field, itself, and simply have escaped notice in the past.

The larger issue is that we see here that we synthesized incorrect values, and then let them into the SOCAT database without detection. In V3 should we think about sanity checking the values that we generate, along the lines of the V3 range checking that we plan to invoke for variables already in the files?

karlmsmith commented 6 years ago

Comment by @karlmsmith on 19 Nov 2012 23:01 UTC Sanity checking for negative values was in place for the Nov 15, 2012 update. Could also add an upper bound, or tighter restrictions in the case of gvco2, for v3.
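A check with both a lower and an upper bound might look like the sketch below. It only counts problem values so they can be reported before ingest. The function name `count_out_of_range` and the bounds used in the example are placeholders, not limits agreed on in this ticket.

```python
from typing import Iterable, Optional, Tuple


def count_out_of_range(
    values: Iterable[Optional[float]], lower: float, upper: float
) -> Tuple[int, int]:
    """Count (missing, out_of_range) entries for one synthesized variable."""
    missing = 0
    out_of_range = 0
    for v in values:
        if v is None:
            missing += 1
        elif not (lower <= v <= upper):
            # Also catches NaN, since any comparison with NaN is False.
            out_of_range += 1
    return missing, out_of_range


# Placeholder bounds for illustration only; real limits would need to be chosen.
report = count_out_of_range([385.2, -100000.0, None, 1.0e6], lower=0.0, upper=1000.0)
```

Reporting counts rather than silently replacing values would make a failed or skipped computation visible at ingest time.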

Definitely leaving this for later. (Dealing with latest updates from Benjamin got me side-tracked into this again.)

karlmsmith commented 6 years ago

Comment by steven.c.hankin on 19 Nov 2012 23:07 UTC If sanity checking was already in place as of Nov 15, 2010, then how did we see negative values slipping in on 2012-07-28? Puzzled ... (but not urgent)

karlmsmith commented 6 years ago

Comment by @karlmsmith on 19 Nov 2012 23:09 UTC typo - 2012, not 2010

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 20 Nov 2012 01:40 UTC The GVCO2 computation starts from the definition of a variable that's defined only through 2008. I imagine that's why applying it to more recent data leaves lots of latitude-time locations missing.

The script GlobalViewMarineCarbon_2008.jnl starts with some comments about updating it. I'll work on doing that.

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 20 Nov 2012 01:46 UTC The gridded distance-to-land dataset /home/xtra/tmap/socat/svn_working_copies/ingest_scripts/dist2land20_burke.nc

has no missing or negative data. The values run from 0 to 1000. All the locations where distance-to-land is larger than 1000 meters are set to 1000. Locations within land masses have a dist2land value of zero.

The distance-to-land field is then sampled at the locations of lon and lat from the SOCAT triples file. And, in the resulting file, /home/xtra/tmap/socat/svn_working_copies/ingest_scripts/SOCAT_dist2land_samples.nc

there are no missing data and values are in the range 0-1000. So I don't see that this has to do with the ingest scripts or the underlying data.

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 10 Jan 2013 21:36 UTC The distance-to-land netCDF data has no missing values. The northernmost grid cell is centered at 89.833, with the top of the last grid box at 90.

In the SOCAT data, ask for the distance-to-land variable, with latitude > 89.833. The result is all missing values. It seems that the ingestion scripts are missing the detail of the size of the grid box in the netCDF file. Karl can work around this.

There may be other missing dist-2-land values in the database, but this is part of it.
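To illustrate the grid-box detail, here is a sketch of a latitude lookup that uses cell edges rather than cell centers, so that points between 89.833 and 90 still fall in the last cell. It is not how Ferret or the ingest scripts work. It assumes a regular grid of 540 rows, each 1/3 degree tall, running from -90 to 90, which is consistent with a top cell centered near 89.833. The function name `lat_to_row` is hypothetical.

```python
import math


def lat_to_row(lat: float, cell_deg: float = 1.0 / 3.0, n_rows: int = 540) -> int:
    """Map a latitude to the row of the grid cell that contains it.

    Assumes a regular grid whose cell edges run from -90 to 90, so the
    last cell spans 90 - cell_deg to 90 and is centered near 89.833.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    row = math.floor((lat + 90.0) / cell_deg)
    # A point exactly at the pole sits on the top edge; keep it in the last cell.
    return min(row, n_rows - 1)
```

With this mapping, a lookup that only accepts latitudes up to the last cell center would reject 89.9, while the edge-based version assigns it to the top row.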

karlmsmith commented 6 years ago

Comment by steven.c.hankin on 11 Jan 2013 00:19 UTC Since all of the earth north of 89.833 is water, we could set the value of distance to land for all values north of 89.833 to

(smallest value found in the north-most row of the grid) + width in km of 1/2 grid cell

... or something like that ....
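As a rough worked version of that suggestion, the sketch below reads "1/2 grid cell" as the north-south half-height of the top cell (90 minus 89.833 degrees) and converts it to kilometers using a mean Earth radius of 6371 km. Both choices are assumptions, as are the function name `polar_fill_value` and the sample row values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def polar_fill_value(north_row_d2l, cell_center_lat=89.833, cell_top_lat=90.0):
    """Smallest distance in the northernmost row plus half a grid cell in km."""
    half_cell_deg = cell_top_lat - cell_center_lat
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    return min(north_row_d2l) + half_cell_deg * km_per_degree


# Made-up values standing in for the northernmost row of the gridded field.
example = polar_fill_value([705.0, 712.3, 698.4])
```

Half a cell works out to roughly 18.6 km under these assumptions, so the made-up row above gives a fill value of about 717.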

karlmsmith commented 6 years ago

Comment by @karlmsmith on 11 Jan 2013 00:31 UTC All the missing d2l values are in 24 cruises. For all cruises except one, all the data have missing d2l values. The last_update_datetime on these cruises is 2012-08-11 for all except one, which is 2012-07-28; that one is the cruise that goes to the north pole and has only some of its d2l values missing (1675 of the 58490). So it appears that the script to compute these values may have had a problem (or did not get run), and this went unnoticed. Hopefully all that needs to be done is to rerun the Ferret scripts that compute the d2l and gvco2 values (with the updated gvco2 dataset), set those d2l values north of 89.833 to about 710, and ingest the results to fix these issues.

Note that I have previously verified that all data in the *.mat.txt files given to us by Benjamin - the data-point data - matches what is in the database in the 'data' table. So the only issue is the values we add to these data points.

karlmsmith commented 6 years ago

Comment by @karlmsmith on 17 Jan 2013 19:39 UTC Created a Java app to update the gvco2 and d2l values, as well as the region_ID, since the updated d2l may change this value. It does this by creating the triples.txt files for a cruise, running Ferret using this input, and retrieving the data from the results file that Ferret produces. The missing d2l values at latitudes greater than 89.75 were hard-coded to 725.0 m. The GlobalView fCO2 data was updated by Ansley to the 2011 version.

Running this on the SOCAT_TRAINING database (recreated from the latest SOCAT2 database files), for all data points of all cruises, resolved all the missing d2l values. The only missing gvco2 values are for exactly those cruises that took place before 1979.

Region_ID changes that would occur: (forgot to update the region_ID's for the data points in the database in this test run)

region_ID changed from A to C (59266 data points)
region_ID changed from A to Z (171 data points)
region_ID changed from C to A (1 data point)
region_ID changed from N to T (47 data points)
region_ID changed from Z to C (10966 data points)

where:

A = North Atlantic
Z = Tropical Atlantic
C = Coastal
N = North Pacific
T = Tropical Pacific

Presumably the handful of changes that were not "to C" are borderline points that would change due to precision.

This also changed 14,621,995 gvco2 values, of which only 4,498,361 were previously missing values. Random inspection showed that the changes in previously given gvco2 values were quite small (typically 0.1%, sometimes as much as 1%). This is probably from the updated GlobalView data, but may be from using double-precision Ferret.

All d2l changes (144,740 values) were from missing values.

karlmsmith commented 6 years ago

Comment by @karlmsmith on 18 Jan 2013 20:52 UTC For the record, cruises that had null d2l. The first cruise - 06AQ20110806 - had only some null d2l - near the north pole. The rest had all null d2l values and so were missing coastal regions.

+--------------+
| cruise_ID    |
+--------------+
| 06AQ20110806 |
| 06M220060524 |
| 29HE20051019 |
| 29HE20060319 |
| 29HE20060925 |
| 29HE20070321 |
| 29HE20071008 |
| 29HE20080413 |
| 29HE20081008 |
| 35MF20080105 |
| 35TH20080612 |
| 35TH20080824 |
| 58GS20090118 |
| 58GS20090205 |
| 58GS20090312 |
| 58GS20090402 |
| 58GS20090414 |
| 58GS20090528 |
| 58GS20090618 |
| 58GS20110429 |
| 58GS20110516 |
| 58GS20110610 |
| 58GS20110624 |
| 58GS20110721 |
| 58GS20111003 |
+--------------+

The "A to Z" and "N to T" region_ID changes were indeed all points with latitude = 30.0, and the "C to A" had a d2l of 400.0, so these changes are just a result of floating-point precision and of no real consequence.
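One way such a boundary flip can arise is sketched below. It shows a latitude that lies just south of 30 in double precision but rounds to exactly 30.0 in single precision, which changes the outcome of a boundary comparison. The region rule, the function names, and the sample latitude are all hypothetical; the ticket does not say which comparison or precision the real assignment used.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def pacific_region(lat: float, boundary: float = 30.0) -> str:
    """Hypothetical rule: 'N' at or north of the boundary, 'T' south of it."""
    return "N" if lat >= boundary else "T"


lat_double = 29.9999999              # just south of the boundary in double precision
lat_single = to_float32(lat_double)  # rounds to exactly 30.0 in single precision

region_double = pacific_region(lat_double)
region_single = pacific_region(lat_single)
```

Under this hypothetical rule the same point is classified as T in double precision and N in single precision, which is the kind of N to T change listed above.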

Plots of distance-to-land are looking reasonable.

Proceeding with updating SOCAT2.

karlmsmith commented 6 years ago

Comment by @karlmsmith on 31 Oct 2013 17:12 UTC Closing out old tickets