NOAA-GSL / VxIngest

find out why max CTC is in December #297

Closed randytpierce closed 7 months ago

randytpierce commented 7 months ago

This query implies that the maximum CTC valid time is in December. This needs to be understood.

cbq> select min(fcstValidISO) as min, max(fcstValidISO) as max FROM vxdata._default.METAR WHERE type="DD" AND docType="CTC" AND subset="METAR" AND version="V01";
{
    "requestID": "df835d5a-0aa8-4b84-a932-894b757cc098",
    "signature": {
        "max": "json",
        "min": "json"
    },
    "results": [
    {
        "max": "2023-12-19T16:00:00",
        "min": "2022-01-01T00:00:00"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "2m20.66144335s",
        "executionTime": "2m20.661375854s",
        "resultCount": 1,
        "resultSize": 82,
        "serviceLoad": 1
    }
}

randytpierce commented 7 months ago

This still appears to be a problem after the errors with models and obs have been fixed...

cbq> select max (meta().id) FROM vxdata._default.METAR WHERE meta().id like "DD:V01:METAR:HRRR_OPS:ALL_HRRR:CTC:CEILING:%";
{
    "requestID": "84b8abc5-1060-41ce-aacb-a8a316a79e60",
    "signature": {
        "$1": "json"
    },
    "results": [
    {
        "$1": "DD:V01:METAR:HRRR_OPS:ALL_HRRR:CTC:CEILING:1703001600:9"
    }
    ],
    "status": "success",
    "metrics": {
        "elapsedTime": "17.832637818s",
        "executionTime": "17.832565894s",
        "resultCount": 1,
        "resultSize": 79,
        "serviceLoad": 1
    }
}
cbq> 

That document ID corresponds to a fcstValidISO of "2023-12-19T16:00:00".
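For reference, the epoch embedded in that document ID converts to the same valid time. A quick illustrative check (this snippet is not part of the ingest code):

# Illustrative only: the trailing 1703001600 in the CTC document ID is a
# fcstValidEpoch (Unix seconds); converting it in UTC reproduces the fcstValidISO.
from datetime import datetime, timezone

doc_id = "DD:V01:METAR:HRRR_OPS:ALL_HRRR:CTC:CEILING:1703001600:9"
epoch = int(doc_id.split(":")[-2])
print(datetime.fromtimestamp(epoch, tz=timezone.utc).isoformat())
# -> 2023-12-19T16:00:00+00:00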

randytpierce commented 7 months ago

Looking at a recent ingest archive " " I see this document being created:

  DD:V01:METAR:HRRR_OPS:ALL_HRRR:CTC:CEILING:1703001600:5

and this document is dated "fcstValidISO": "2023-12-19T16:00:00", which means there is a problem with the ingest where it is always getting the same date.

randytpierce commented 7 months ago

This looks suspicious....

2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-30> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:HRRR_OPS:1703008800:1
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-30> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1703008800
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-29> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:HRRR_OPS:1703008800:1
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-28> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:HRRR_OPS:1703008800:1
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-29> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1703008800
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-25> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:HRRR_OPS:1703008800:1
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-25> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1703008800
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-28> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1703008800
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-27> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:HRRR_OPS:1703008800:1
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-27> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1703008800
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-26> (vxingest.ctc_to_cb.ctc_builder): Looking up model document: DD:V01:METAR:RAP_OPS_130:1693742400:51
2024-02-04T15:21:57+0000 [INFO] <VxIngestManager-26> (vxingest.ctc_to_cb.ctc_builder): Looking up observation document: DD:V01:METAR:obs:1693742400
2024-02-04T15:21:58+0000 [INFO] <VxIngestManager-27> (vxingest.ctc_to_cb.ctc_builder): CTCBuilder.handle_document - adding document DD:V01:METAR:HRRR_OPS:GtLk:CTC:VISIBILITY:1703008800:1

I think it might always be processing the same model/obs pairs.
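One way to confirm that would be to tally the looked-up document IDs in the log. A minimal sketch (the log path is hypothetical, and this is not vxingest code):

# Count how often each model/obs document ID appears in the CTC builder log.
# Heavy repetition of a single epoch would confirm the same pairs are being reprocessed.
import re
from collections import Counter

lookups = Counter()
with open("/data-ingest/data/logs/ctc_ingest.log") as log:  # hypothetical path
    for line in log:
        match = re.search(r"Looking up (?:model|observation) document: (\S+)", line)
        if match:
            lookups[match.group(1)] += 1

for doc_id, count in lookups.most_common(10):
    print(f"{count:6d}  {doc_id}")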

randytpierce commented 7 months ago

Now that I have the scraper working again I am seeing an alert...

alertname = record_count_difference_not_0
ingest_id = ingest_record_count_difference
instance = adb-cb1.gsd.esrl.noaa.gov:9095
job = node_exporter
log_file = /data-ingest/data/temp_tar/96233/job_v01_metar_ctc_ceiling_model_ops-2024-02-04T16:10:06.log
severity = critical
Annotations
description = Ingest job record count difference is not 0, intended and actual record updates do not match VALUE = -365 LABELS = map[__name__:job_v01_metar_ctc_ceiling_model_ops_55abfbc3fec5 ingest_id:ingest_record_count_difference instance:adb-cb1.gsd.esrl.noaa.gov:9095 job:node_exporter log_file:/data-ingest/data/temp_tar/96233/job_v01_metar_ctc_ceiling_model_ops-2024-02-04T16:10:06.log]
summary = ingest job record count difference is not 0

which is probably this issue. Good to know we would have been getting alerts all along, if the scraper had been working.

randytpierce commented 7 months ago

Using this query from the ctc_builder

SELECT fve.fcstValidEpoch, fve.fcstLen, meta().id
FROM `vxdata`._default.METAR fve
WHERE fve.type='DD'
    AND fve.docType='model'
    AND fve.model='HRRR_OPS'
    AND fve.version='V01'
    AND fve.subset='METAR'
    AND fve.fcstValidEpoch >= 1703001600
    AND fve.fcstValidEpoch <= 9223372036854775807
ORDER BY fve.fcstValidEpoch, fve.fcstLen

it shows that the max model fcstValidEpoch for HRRR_OPS is 1703160000, which is Thursday, December 21, 2023 12:00:00 PM UTC. This indicates a problem in the model ingest.
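The same check can be scripted without pulling every row back; a rough sketch using the Couchbase Python SDK with a MAX() aggregate (the aggregate form of the query and the host/credentials are my own placeholders, not taken from ctc_builder):

# Report the latest HRRR_OPS model fcstValidEpoch and its UTC time.
from datetime import datetime, timezone

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://adb-cb1.gsd.esrl.noaa.gov",  # placeholder host/credentials
    ClusterOptions(PasswordAuthenticator("user", "password")),
)
rows = cluster.query(
    """
    SELECT MAX(fve.fcstValidEpoch) AS max_epoch
    FROM `vxdata`._default.METAR fve
    WHERE fve.type='DD' AND fve.docType='model' AND fve.model='HRRR_OPS'
      AND fve.version='V01' AND fve.subset='METAR'
    """
)
for row in rows:
    max_epoch = row["max_epoch"]
    print(max_epoch, datetime.fromtimestamp(max_epoch, tz=timezone.utc).isoformat())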

What I am finding is that when an archive is opened for importing, the import seems to start but does not finish properly, without reporting an error. For example, looking at the archive /data-ingest/data/xfer/archive/success-job_v01_metar_ctc_ceiling_model_ops_fb6bff2a9327_1707315008.tar.gz and the data file inside it, 20240207141008/MD:V01:METAR:HRRR_OPS:E_US:CTC:CEILING:ingest.json, I can see that the document "DD:V01:METAR:HRRR_OPS:E_HRRR:CTC:CEILING:1707148800:5", which is added on line 178, does make it into the database, but the document "DD:V01:METAR:HRRR_OPS:GtLk:CTC:CEILING:1707152400:0", which is added on line 188, does not. Did the cbimport fail? Not according to the exit code.
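To verify which documents from an archive actually landed, the IDs can be spot-checked directly; a minimal sketch assuming the Couchbase Python SDK and placeholder credentials:

# Spot-check whether specific document IDs from an ingest archive exist in the collection.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

cluster = Cluster(
    "couchbase://adb-cb1.gsd.esrl.noaa.gov",  # placeholder host/credentials
    ClusterOptions(PasswordAuthenticator("user", "password")),
)
collection = cluster.bucket("vxdata").scope("_default").collection("METAR")

doc_ids = [
    "DD:V01:METAR:HRRR_OPS:E_HRRR:CTC:CEILING:1707148800:5",  # made it into the database
    "DD:V01:METAR:HRRR_OPS:GtLk:CTC:CEILING:1707152400:0",    # did not, despite a clean exit code
]
for doc_id in doc_ids:
    status = "exists" if collection.exists(doc_id).exists else "MISSING"
    print(f"{status:8s} {doc_id}")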

randytpierce commented 7 months ago

If I copy that archive into the xfer directory BY ITSELF and run the same import command that is in the crontab, I find no errors and all of the documents from the archive end up in the database. So something about processing multiple archives is munging our import.

randytpierce commented 7 months ago

Looking at this, I think the problem must be in the way the import processing fires off concurrent cbimports. I'm going to try 1) downloading the new Couchbase tools CLI, 2) using cbimport with the -t (threads) option, and 3) not running cbimports concurrently. This should simplify the code.
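A rough sketch of what (3) could look like, serializing the imports and letting cbimport's -t flag provide the parallelism (the paths, credentials, key-generation, and collection settings below are placeholders, not the actual crontab command):

import glob
import subprocess

# Run one cbimport at a time instead of launching them concurrently.
for json_file in sorted(glob.glob("/data-ingest/data/xfer/*/*.json")):  # hypothetical layout
    result = subprocess.run(
        [
            "cbimport", "json",
            "-c", "couchbase://adb-cb1.gsd.esrl.noaa.gov",
            "-u", "user", "-p", "password",
            "-b", "vxdata",
            "--scope-collection-exp", "_default.METAR",
            "-d", f"file://{json_file}",
            "-f", "list",
            "-g", "%id%",  # placeholder key generator
            "-t", "4",     # parallelism inside a single cbimport
        ],
        capture_output=True,
        text=True,
    )
    # Do not trust a silent exit code alone; surface stderr as well.
    if result.returncode != 0:
        raise RuntimeError(f"cbimport failed for {json_file}: {result.stderr}")
    if result.stderr.strip():
        print(f"cbimport warnings for {json_file}: {result.stderr}")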