COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, ivv bugs, epics ...)

[BUG] [OPS] S3 PUG NTC Execution failed on too short product #1050

Open suberti-ads opened 1 year ago

suberti-ads commented 1 year ago

Environment:

Traceability:

Current Behavior: Some executions fail with the following message:

[code 290] [exitCode 255] [msg Task /usr/local/components/PUG-3.48/bin/PUGCoreProcessor failed]

It appears that the sensing duration requested for these products was very short.

Expected Behavior:

Steps To Reproduce: Start 3% production or 24h00 test

Test execution artefacts (i.e. logs, screenshots…)

Execution logs: PUG-NTC-joborder-120263.txt
JobOrder: Job120263.txt
Preparation log: s3-pug-ntc-part1-preparation-worker-v2-84f98487d-9hph5.log

Whenever possible, first analysis of the root cause

Sample for job 120263. Product which triggered the production ==> S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3

First error seen in logs:

2023-07-26T12:02:32+00:00   2023-07-26T12:02:32.005784 s3-pug-ntc-part1-execution-worker-v2-698877f8f7-vscsv PUG_SL_1_RBT 03.48 [0000000132]: [I] PUGCoreProcessor: Exiting with EXIT CODE: 255
2023-07-26T12:02:32+00:00       FATAL: All the product data unit generations exited in error!
2023-07-26T12:02:32+00:00   2023-07-26T12:02:32.005710 s3-pug-ntc-part1-execution-worker-v2-698877f8f7-vscsv PUG_SL_1_RBT 03.48 [0000000132]: [E] PUGCoreProcessor: [PUGCoreProcessor.C: execute:(359)] Unable to generate the required PDUs! --- acs::exCriticalException in PDUGenerator.C(270) from void acs::PDUGenerator::createPDUs() thread "" [140201793349824]
2023-07-26T12:02:32+00:00           Duration from JO: 0.002618 [s]  Minimum duration: 2 lines, i.e. 0.599972 [s]
2023-07-26T12:02:32+00:00       COULD NOT TRIGGER THE GENERATION OF THIS PDU [2023-04-27T23:40:45.143984, 2023-04-27T23:40:45.146602]:
2023-07-26T12:02:32+00:00       acs::StripeGeneratorThread::exStripeGeneratorThreadException in StripeGeneratorThread.C(410) from void acs::StripeGeneratorThread::createPDU() thread "unnamedThread" [140201343260416]
2023-07-26T12:02:32+00:00       caused by:
2023-07-26T12:02:32+00:00       Problem found during product data unit generation -> skipping to the next, if any ...
2023-07-26T12:02:32+00:00   acs::PDUGenerator::exPDUGeneratorException in (0) from  thread "" [140201793349824]

So it tries to generate a very short product:

2023-07-26T12:02:32+00:00       COULD NOT TRIGGER THE GENERATION OF THIS PDU [2023-04-27T23:40:45.143984, 2023-04-27T23:40:45.146602]:
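The length of this interval can be checked against the minimum reported by PUG. The following is a minimal sketch in Python: the two timestamps and the minimum duration are copied from the log lines above, while the constant and function names are only illustrative.

```python
from datetime import datetime

# Values copied from the PUG execution log above.
PDU_START = "2023-04-27T23:40:45.143984"
PDU_STOP = "2023-04-27T23:40:45.146602"
MIN_DURATION_S = 0.599972  # "Minimum duration: 2 lines" reported by PUG


def duration_s(start: str, stop: str) -> float:
    """Return the length of the [start, stop] interval in seconds."""
    delta = datetime.fromisoformat(stop) - datetime.fromisoformat(start)
    return delta.total_seconds()


pdu_duration = duration_s(PDU_START, PDU_STOP)
print(f"requested PDU duration: {pdu_duration:.6f} s")
print("shorter than PUG minimum:", pdu_duration < MIN_DURATION_S)
```

The computed value agrees with the "Duration from JO: 0.002618 [s]" line in the log, which is far below the 0.599972 s minimum.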

This matches the values found in the preparation job:

    "taskTableName" : "TaskTable.PUG_SL_1_RBT.03.xml",
    "startTime" : "2023-04-27T23:40:45.143984Z",
    "stopTime" : "2023-04-27T23:40:45.146602Z",

So it seems to be a configuration issue. Our preparation configuration is as follows:

app.preparation-worker.pdu.config.SL_1_RBT___.type=FRAME
app.preparation-worker.pdu.config.SL_1_RBT___.length-in-s=180
app.preparation-worker.pdu.config.SL_1_RBT___.gap-threshhold-in-s=3
app.preparation-worker.pdu.config.SL_1_RBT___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SL_1_RBT___.dyn-proc-params.hardwareName=O
[...]
app.housekeep.pdu.config.SL_1_RBT___.type=FRAME
app.housekeep.pdu.config.SL_1_RBT___.length-in-s=180
app.housekeep.pdu.config.SL_1_RBT___.gap-threshhold-in-s=0.2
app.housekeep.pdu.config.SL_1_RBT___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SL_1_RBT___.dyn-proc-params.hardwareName=O

Preparation log for the received product:

2023-07-25T03:28:23+00:00   {"header":{"type":"LOG","timestamp":"2023-07-25T03:28:23.511838Z","level":"INFO","line":260,"file":"MetadataClient.java","thread":"KafkaConsumerDestination{consumerDestinationName='s3-pug-ntc-part1.message-filter', partitions=30, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"First Product of Orbit query for product type 'SL_1_RBT___' and orbit '26068' returned S3Metadata [absoluteStartOrbit=26068, anxTime=2023-04-27T21:59:45.974883Z, anx1Time=2023-04-27T23:40:45.146602Z, creationTime=2023-06-30T21:50:52.000000Z, granuleNumber=17, granulePosition=NONE, insertionTime=2023-06-30T22:28:15.970452Z, productName=S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3, productType=SL_1_RBT___, keyObjectStorage=S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3, validityStart=2023-04-27T23:40:45.143984Z, validityStop=2023-04-27T23:53:19.714455Z, missionId=null, satelliteId=B, stationCode=null]"},"custom":{"logger_string":"esa.s1pdgs.cpoc.metadata.client.MetadataClient"}}
2023-07-25T03:28:23+00:00   {"header":{"type":"LOG","timestamp":"2023-07-25T03:28:23.506518Z","level":"INFO","line":225,"file":"MetadataClient.java","thread":"KafkaConsumerDestination{consumerDestinationName='s3-pug-ntc-part1.message-filter', partitions=30, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"S3Metadata query for family 'S3_L1_NTC' and product name 'S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3' returned S3Metadata [absoluteStartOrbit=26068, anxTime=2023-04-27T21:59:45.974883Z, anx1Time=2023-04-27T23:40:45.146602Z, creationTime=2023-06-30T21:50:52.000000Z, granuleNumber=17, granulePosition=NONE, insertionTime=2023-06-30T22:28:15.970452Z, productName=S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3, productType=SL_1_RBT___, keyObjectStorage=S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3, validityStart=2023-04-27T23:40:45.143984Z, validityStop=2023-04-27T23:53:19.714455Z, missionId=null, satelliteId=B, stationCode=null]"},"custom":{"logger_string":"esa.s1pdgs.cpoc.metadata.client.MetadataClient"}}
2023-07-25T03:28:23+00:00   {"header":{"type":"LOG","timestamp":"2023-07-25T03:28:23.501050Z","level":"INFO","line":134,"file":"TaskTableMapperService.java","thread":"KafkaConsumerDestination{consumerDestinationName='s3-pug-ntc-part1.message-filter', partitions=30, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"Created IpfPreparationJobs for product S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"custom":{"logger_string":"esa.s1pdgs.cpoc.preparation.worker.service.TaskTableMapperService"}}
2023-07-25T03:28:23+00:00   {"header":{"type":"REPORT","timestamp":"2023-07-25T03:28:23.500000Z","level":"INFO","mission":"S3","workflow":"NOMINAL","rs_chain_name":"S3-PUG-NTC","rs_chain_version":"1.14.0-rc1"},"message":{"content":"End associating TaskTables to CatalogEvent S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"task":{"uid":"797eea65-0859-4d82-8914-e29f93976429","name":"TaskTableLookup","event":"END","status":"OK","output":{},"input":{"filename_string":"S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"quality":{},"error_code":0,"duration_in_seconds":0.0,"missing_output":[]}}
2023-07-25T03:28:23+00:00   {"header":{"type":"REPORT","timestamp":"2023-07-25T03:28:23.500000Z","level":"INFO","mission":"S3","workflow":"NOMINAL","rs_chain_name":"S3-PUG-NTC","rs_chain_version":"1.14.0-rc1"},"message":{"content":"Start associating TaskTables to CatalogEvent S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"task":{"uid":"797eea65-0859-4d82-8914-e29f93976429","name":"TaskTableLookup","event":"BEGIN","input":{"filename_string":"S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"child_of_task":"caa156d4-769f-44d1-b65c-0460d68d3635"}}
2023-07-25T03:28:23+00:00   {"header":{"type":"LOG","timestamp":"2023-07-25T03:28:23.500415Z","level":"INFO","line":76,"file":"RoutingBasedTasktableMapper.java","thread":"KafkaConsumerDestination{consumerDestinationName='s3-pug-ntc-part1.message-filter', partitions=30, dlqName='error-warning'}.container-0-C-1"},"message":{"content":"Got tasktable [TaskTable.PUG_SL_1_RBT.03.xml] for SL_1_RBT____B"},"custom":{"logger_string":"esa.s1pdgs.cpoc.preparation.worker.tasktable.mapper.RoutingBasedTasktableMapper"}}
2023-07-25T03:28:23+00:00   {"header":{"type":"REPORT","timestamp":"2023-07-25T03:28:23.499000Z","level":"INFO","mission":"S3","workflow":"NOMINAL","rs_chain_name":"S3-PUG-NTC","rs_chain_version":"1.14.0-rc1"},"message":{"content":"Received CatalogEvent for S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"task":{"uid":"caa156d4-769f-44d1-b65c-0460d68d3635","name":"ProductionTrigger","event":"BEGIN","input":{"filename_string":"S3B_SL_1_RBT____20230427T234045_20230427T235320_20230630T215052_0754_079_001______LN3_D_NT_002.SEN3"},"follows_from_task":"b4c8a3b3-44a1-4fe2-b68a-cd6636608c85"}}
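Comparing the S3Metadata fields in this log with the job interval shows that the job start equals validityStart and the job stop equals anx1Time. This suggests, but does not prove, that the interval was cut at the anx1Time boundary. A minimal Python sketch of that comparison follows; the metadata string is an excerpt of the log above, and the helper name is hypothetical.

```python
import re

# Excerpt of the S3Metadata fields from the preparation log above.
S3_METADATA = (
    "S3Metadata [absoluteStartOrbit=26068, "
    "anxTime=2023-04-27T21:59:45.974883Z, "
    "anx1Time=2023-04-27T23:40:45.146602Z, "
    "validityStart=2023-04-27T23:40:45.143984Z, "
    "validityStop=2023-04-27T23:53:19.714455Z]"
)
# startTime / stopTime from the preparation job.
JOB_START = "2023-04-27T23:40:45.143984Z"
JOB_STOP = "2023-04-27T23:40:45.146602Z"


def parse_fields(text: str) -> dict:
    """Extract key=value pairs from the bracketed S3Metadata string."""
    body = text[text.index("[") + 1 : text.rindex("]")]
    return dict(re.findall(r"(\w+)=([^,\]]+)", body))


fields = parse_fields(S3_METADATA)
print("job start == validityStart:", fields["validityStart"] == JOB_START)
print("job stop == anx1Time:", fields["anx1Time"] == JOB_STOP)
```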

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

w-jka commented 1 year ago

@suberti-ads We added a new parameter for this issue:

app.preparation-worker.pdu.config.<product_type>.minPDULengthThreshold=0.0
app.housekeep.pdu.config.<product_type>.minPDULengthThreshold=0.0

When set, this parameter makes the preparation try to merge time intervals that are too short (or drop them if merging is not possible) in order to prevent this issue. It will be included in the next delivery, 1.14.0-rc2.
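To illustrate the merge-or-drop idea, here is a minimal Python sketch. It is not the actual preparation-worker code: the function name, the `max_gap_s` parameter and the merge order (previous neighbour first, then next) are assumptions made for the example only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, stop) in seconds


def apply_min_length(intervals: List[Interval], min_length_s: float,
                     max_gap_s: float = 0.0) -> List[Interval]:
    """Merge too-short intervals into an adjacent neighbour, else drop them."""
    result: List[Interval] = []
    pending = sorted(intervals)
    for i, (start, stop) in enumerate(pending):
        if stop - start >= min_length_s:
            # Long enough: keep as-is.
            result.append((start, stop))
        elif result and start - result[-1][1] <= max_gap_s:
            # Extend the previous kept interval to absorb the short one.
            result[-1] = (result[-1][0], stop)
        elif i + 1 < len(pending) and pending[i + 1][0] - stop <= max_gap_s:
            # Prepend the short interval to the next one.
            pending[i + 1] = (start, pending[i + 1][1])
        # Otherwise there is no adjacent neighbour and the interval is dropped.
    return result


# A 0.002618 s sliver followed by a contiguous 180 s frame (illustrative values).
print(apply_min_length([(0.0, 0.002618), (0.002618, 180.0)], min_length_s=0.6))
# The same sliver with no adjacent neighbour.
print(apply_min_length([(0.0, 0.002618), (10.0, 190.0)], min_length_s=0.6))
```

In this sketch a threshold of 0.0 leaves every interval untouched, since no interval has a negative length.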

vgava-ads commented 1 year ago

System_CCB_2023_w31: Delivered in the Processing Sentinel-3 v1.14.0 (Refer to https://github.com/COPRS/processing-sentinel-3/releases/tag/1.14.0-rc2), in the Processing Sentinel-1 v1.14.0 (Refer to https://github.com/COPRS/processing-sentinel-1/releases/tag/1.14.0-rc2) and in the Processing Common v1.14.0 (Refer to https://github.com/COPRS/production-common/releases/tag/1.14.0-rc2)

To be validated by IVV/OPS team.

LAQU156 commented 1 year ago

System_CCB_2023_w31 : Moved into "Accepted Werum", and to validate, action done.

suberti-ads commented 1 year ago

I cannot find this configuration in the pug-ntc contents of the last delivery:
v1.14.0-rc2: https://github.com/COPRS/processing-sentinel-3/blob/1.14.0-rc2/s3-pug-ntc/content/stream-parameters.properties
develop branch: https://github.com/COPRS/processing-sentinel-3/blob/develop/s3-pug-ntc/content/stream-parameters.properties
So I am adding the workaround tag.

We will add and test this workaround with 1.14.

LAQU156 commented 1 year ago

System_CCB_2023_w31 : To be tested with 1.14.0 version

Woljtek commented 1 year ago

There has been no occurrence of this error since the deployment of 1.14.0.

System_CCB_2023_w31 : Fixed, closed

suberti-ads commented 3 months ago

I have bad news on this issue: