aodn / content

Tracks AODN Portal content and configuration issues

GHRSST - old layers of L3S 1M products still used #476

Closed lbesnard closed 2 years ago

lbesnard commented 3 years ago

From @ocehugo's email:

See msg below from a user accessing the IMOS SST stream. Looks like we got some misconfigured layer (or harvester) for the L3S SST products. May you have a look!?

I just did a quick check, and for example, L3S day and night monthly IMOS SST got misconfigured information/behaviour:

a. In the portal (step 1), the range of the data is 1992-2021.
b. At step 2, the selectable range is 1992-2018 (consistent with the dbprod content at srs_sst.srs_sst_l3s_1m_dn_gridded_url).
c. Finally, we got 2019 files in thredds (here).

Looks like a combination of errors. I would say the harvester/pipeline is not updating the database/layers and/or we got some typo in the layer somewhere.

Cheers,

The issue:


For the record: SRS GHRSST data went through a major reprocessing 2+ years ago. All products were pushed back to the generic timestep harvester, using a new schema and new geoserver layers.

However, four layers didn't follow the same path; @ocehugo found an issue with one of them. These four layers, used in production, were created by the non-generic SRS_SST_GRIDDED harvester (now removed from the GitHub harvester repo).

We also have equivalent layers created by the GENERIC_TIMESTEP harvester. The respective names are:

| old layer                              | new layer (generic timestep) |
|----------------------------------------|------------------------------|
| srs_sst_l3s_1m_day_gridded_url         | srs_ghrsst_l3s_1m_day        |
| srs_sst_l3s_1m_dn_gridded_url          | srs_ghrsst_l3s_1m_dn         |
| srs_sst_l3s_1m_ngt_gridded_url         | srs_ghrsst_l3s_1m_ngt        |
| srs_sst_l3s_1m_southern_dn_gridded_url | srs_ghrsst_l3s_1mS_dn        |

The deprecated SRS SST harvester was removed from 10-aws a long time ago. See, for example, the first and latest available data in this layer:

select min(time) from srs_sst.srs_sst_l3s_1m_dn_gridded_url limit 1;                                          
+---------------------+
| min                 |
|---------------------|
| 1992-03-16 09:20:00 |
+---------------------+

select max(time) from srs_sst.srs_sst_l3s_1m_dn_gridded_url limit 1;                                          
+---------------------+
| max                 |
|---------------------|
| 2018-03-16 09:20:00 |
+---------------------+

On the other hand, the generic timestep harvester is more up to date, matching the latest data available on THREDDS, but it only covers 2015 onwards:

select min("TIME") from generic_timestep.timestep_url where collection_name = 'srs_ghrsst_l3s_1mS_dn' limit 1;
+---------------------+
| min                 |
|---------------------|
| 2015-01-16 11:10:00 |
+---------------------+

select max("TIME") from generic_timestep.timestep_url where collection_name = 'srs_ghrsst_l3s_1mS_dn' limit 1;
+---------------------+
| max                 |
|---------------------|
| 2019-09-15 23:10:00 |
+---------------------+
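To make the mismatch explicit, here is a minimal Python sketch that compares the two time ranges, taking the timestamps from the query results above at face value. The helper `coverage_gaps` and the constant names are hypothetical, made up for illustration. Note also that the second pair of queries targets the `srs_ghrsst_l3s_1mS_dn` collection, so this is a rough comparison rather than a like-for-like one.

```python
from datetime import datetime

# Timestamps copied from the query results quoted above (not re-queried).
OLD_LAYER = (datetime(1992, 3, 16, 9, 20), datetime(2018, 3, 16, 9, 20))
NEW_LAYER = (datetime(2015, 1, 16, 11, 10), datetime(2019, 9, 15, 23, 10))


def coverage_gaps(old, new):
    """Return the periods covered by only one of the two layers.

    Hypothetical helper: each argument is a (first, last) pair of datetimes.
    """
    old_start, old_end = old
    new_start, new_end = new
    gaps = {}
    if old_start < new_start:
        # Data the old layer has that the new layer lacks.
        gaps["only_in_old"] = (old_start, min(old_end, new_start))
    if new_end > old_end:
        # Data the new layer has that the old layer lacks.
        gaps["only_in_new"] = (max(old_end, new_start), new_end)
    return gaps


gaps = coverage_gaps(OLD_LAYER, NEW_LAYER)
for name, (start, end) in gaps.items():
    print(f"{name}: {start:%Y-%m-%d} to {end:%Y-%m-%d}")
```

With these inputs, the old layer alone covers 1992-03-16 to 2015-01-16 and the new one alone covers 2018-03-16 to 2019-09-15, so neither layer on its own spans the full record.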

I tried pushing the files that are not in the new generic schema from S3 back into the SRS SST INCOMING_DIR, unsuccessfully, because the old files don't pass the CF/GHRSST checker. So I guess this is why we never finished the transition of these layers.

What to do next?


@atkinsn FYI

ggalibert commented 3 years ago

@lbesnard why are the old files failing the CF/GHRSST checks? Can we fix the files ourselves?

lbesnard commented 3 years ago

It's better to let Edward do it, since he is the source of the data. When he's back from leave I'll get him to tackle this task.

lbesnard commented 2 years ago

All NetCDF files were pushed back to the incoming folder, fixing this issue.