COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, IVV bugs, epics, ...)

[BUG] PUG failed #837

Closed Woljtek closed 1 year ago

Woljtek commented 1 year ago

Environment:

Traceability: S3 L1 PUG Deployment

Current Behavior: All PUG jobs end with the following error:

esa.s1pdgs.cpoc.common.errors.processing.IpfExecutionWorkerProcessExecutionException: Task /usr/local/components/PUG-3.45/bin/PUGCoreProcessor failed

Expected Behavior: IPF S3 L1 PUG successfully consolidates S3 L0 products

Steps To Reproduce:
1. Produce L0 ISP (from S3-L0p).
2. Deploy PUG NRT add-ons.

Test execution artefacts (i.e. logs, screenshots…) https://app.zenhub.com/files/398313496/2cb548e3-e6e8-43ab-92d2-a69afdf414ce/download

Whenever possible, first analysis of the root cause: on the container, there is the following log entry:

2023-02-22T14:56:26.279880 s3-pug-nrt-part1-execution-worker-v8-6bbc7479c6-gqjvl PUG_SR_0_SRA 03.45 [0000000075]: [E] PUGCoreProcessor: Wed Feb 22 14:56:26 2023
 PID: 75 SIGNAL 11 THREAD: 139822215415552
 core in: /tmp/core.75 - stack follow
    /usr/local/components/PUG-3.45/bin/../lib/libSignal.so.5.4 ( acs::Signal::catchBadSignal(int) )
    /lib64/libpthread.so.0 (  )
    /usr/lib64/libstdc++.so.6 ( std::string::assign(std::string const&) )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::PDUGeneratorThread::setInfoForStatistics() )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::createPDU() )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::run() )
    /usr/local/components/PUG-3.45/bin/../lib/../lib/libThread.so.5.16 ( acs::Thread::svc(void*) )
    /lib64/libpthread.so.0 (  )
    /lib64/libc.so.6 ( clone )

No clue from resource consumption (see attached screenshots).

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

w-jka commented 1 year ago

The provided logs seem to be from an old configuration version, as the error is still the one from before the updated joborder.xslt. @Woljtek could you provide a current log?

Woljtek commented 1 year ago

I deleted the topic TOPIC=s3-pug-part1.preparation-worker before restarting the chain, so I don't think there is any old job. The logs are already in the Test execution artefacts section.

w-jka commented 1 year ago

@Woljtek The logs of the Test execution artefacts still state the following PUG error:

2023-02-15T16:33:55.712163 s3-pug-nrt-part1-execution-worker-v3-554d94c545-tbh62 [0000000096]: [E] [PUGCoreProcessor.C: main:(173)] Unable to load JobOrder from file  "/data/localWD/52655/JobOrder.52655.xml --- acs::S3PUGJobOrder::exS3PUGJobOrderException in S3PUGJobOrder.C(659) from virtual void acs::S3PUGJobOrder::read(acs::XMLIstream&) thread "" [140183974770656]
    Error while reading job order
    caused by:
    acs::rsResourceSet::NotFoundException in rsResourceSet.C(860) from const acs::rsResourceSet::rsValue* acs::rsResourceSet::getValue(const std::string&) const thread "" [140183974770656]
    Resource not found: List_of_Config_Files.Config_File in namespace "Ipf_Conf"
w-jka commented 1 year ago

The dmesg of the pod (see attached screenshot).

This indicates that the IPF itself runs into an issue that seems to be unrelated to our software.

Woljtek commented 1 year ago

This log is outdated:

2023-02-15T16:33:55.712163 s3-pug-nrt-part1-execution-worker-v3-554d94c545-tbh62 [0000000096]: [E] [PUGCoreProcessor.C: main:(173)] Unable to load JobOrder from file  "/data/localWD/52655/JobOrder.52655.xml --- acs::S3PUGJobOrder::exS3PUGJobOrderException in S3PUGJobOrder.C(659) from virtual void acs::S3PUGJobOrder::read(acs::XMLIstream&) thread "" [140183974770656]
    Error while reading job order
    caused by:
    acs::rsResourceSet::NotFoundException in rsResourceSet.C(860) from const acs::rsResourceSet::rsValue* acs::rsResourceSet::getValue(const std::string&) const thread "" [140183974770656]
    Resource not found: List_of_Config_Files.Config_File in namespace "Ipf_Conf"

It is related to bug #828, which now has a workaround.

Woljtek commented 1 year ago

@w-jka Do you think this behavior is an IPF issue or a deployment issue?

FYI, I am going to increase the EW memory limits to 50Gi according to the prerequisites.

Woljtek commented 1 year ago

I reproduced the same behavior with the limit at 50Gi.

w-fsi commented 1 year ago

This is actually not easy to answer. We are observing a SIGSEGV that is raised from somewhere and kills the process. This is usually a memory access violation and thus very unlikely to be caused by our software. From a deployment perspective, we are not finding any issue at the moment that could explain it, and the expectation is that increasing the memory limits will not change anything, as this is not an out-of-memory issue.

The crash occurs in the system libc and is very likely an issue with the processor or the operating system. We might try an older version and see if the crash occurs there as well. However, without at least a document giving an idea of which operating system is required, most of this is pure guessing.
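For what it is worth, the exit code 139 reported in the execution worker logs is consistent with this analysis: by the POSIX shell convention, an exit code above 128 means the process was killed by signal (code - 128), so 139 maps to signal 11, SIGSEGV. A minimal sketch:

```shell
# Decode the execution worker's exit code into a signal number.
# POSIX shells report 128 + N when a process is killed by signal N.
exit_code=139
signal=$((exit_code - 128))
echo "killed by signal $signal"   # prints: killed by signal 11
kill -l "$signal"                 # prints the signal name (SEGV)
```

This matches the `SIGNAL 11` entries in the stack traces above.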

Woljtek commented 1 year ago

We opened a PDGSANOM ticket: https://cams.esa.int/browse/PDGSANOM-12241. I propose to put this issue on hold while waiting for SDP feedback.

LAQU156 commented 1 year ago

IVV_CCB_2023_w09 : Moved into "On hold" waiting for ESA feedback

SYTHIER-ADS commented 1 year ago

From S-3 L1 IPF Maintainers to Reference System:

Dears, the WD has been analyzed by the PUG maintenance team. It seems the error comes from a missing value for the HardwareName in the JobOrder.

Regards, S3IPF L1 Maintenance team

w-jka commented 1 year ago

In order to include this dynamic process parameter, one has to add the following lines to the configuration:

app.preparation-worker.pdu.dyn-proc-params.hardwareName=O
app.housekeep.pdu.dyn-proc-params.hardwareName=O

The allowed values for this parameter are:

O -> (OPE)
F -> (REF)
D -> (DEV)
R -> (REP)

We will add the value O to the default configuration for now.
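Once applied, the parameter should show up among the dynamic processing parameters of the generated JobOrder. As an illustration only (a sketch assuming the usual IPF JobOrder layout, not copied from a real RS JobOrder):

```xml
<!-- Hypothetical JobOrder fragment: the dynamic processing parameter
     as it would appear once the configuration above is applied -->
<Dynamic_Processing_Parameters>
  <Processing_Parameter>
    <Name>hardwareName</Name>
    <Value>O</Value>
  </Processing_Parameter>
</Dynamic_Processing_Parameters>
```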

LAQU156 commented 1 year ago

IVV_CCB_2023_w13 : Moved into "Accepted OPS", Tests now with @Woljtek and @w-jka

omiazek-ads commented 1 year ago

The configuration has been added:

app.preparation-worker.pdu.dyn-proc-params.hardwareName=O
app.housekeep.pdu.dyn-proc-params.hardwareName=O

However, the error is still present (tested on OL_0 products):

2023-04-14T14:40:50.174363 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001175]: [E] PUGCoreProcessor: Fri Apr 14 14:40:50 2023
 PID: 1175 SIGNAL 11 THREAD: 140402505103104
 core in: /tmp/core.1175 - stack follow
    /usr/local/components/PUG-3.45/bin/../lib/libSignal.so.5.4 ( acs::Signal::catchBadSignal(int) )
    /lib64/libpthread.so.0 (  )
    /usr/lib64/libstdc++.so.6 ( std::string::assign(std::string const&) )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::PDUGeneratorThread::setInfoForStatistics() )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::createPDU() )
    /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::run() )
    /usr/local/components/PUG-3.45/bin/../lib/../lib/libThread.so.5.16 ( acs::Thread::svc(void*) )
    /lib64/libpthread.so.0 (  )
    /lib64/libc.so.6 ( clone )
Woljtek commented 1 year ago

@w-jka How can I check if the hardwareName is taken into account on the EW?

w-jka commented 1 year ago

@Woljtek There are two ways. The hardwareName is included in the JobOrder. If you download the working directory from the failed-workdir bucket, you can check whether the JobOrder.xml file there contains the dynamic process parameter. The JobOrder is also printed in the logs, although it is not as nicely formatted, so I would advise the failed-workdir approach. If you could provide the file in this issue, we can have a look at it as well.
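As a sketch, the failed-workdir check could look like this (the XML fragment below is a hypothetical stand-in for a downloaded JobOrder; real files come from the s3://ops-rs-failed-workdir bucket quoted elsewhere in this issue):

```shell
# Normally the file would be fetched first, e.g.:
#   aws s3 cp s3://ops-rs-failed-workdir/<workdir>/JobOrder.<id>.xml .
# Here a hypothetical fragment stands in for the downloaded file.
cat > JobOrder.sample.xml <<'EOF'
<Dynamic_Processing_Parameters>
  <Processing_Parameter>
    <Name>facilityName</Name>
    <Value>LN3</Value>
  </Processing_Parameter>
  <Processing_Parameter>
    <Name>hardwareName</Name>
    <Value>O</Value>
  </Processing_Parameter>
</Dynamic_Processing_Parameters>
EOF

# Print the hardwareName entry and its value; an empty or missing
# <Value> here would reproduce the symptom discussed in this issue.
grep -A1 '<Name>hardwareName</Name>' JobOrder.sample.xml
```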

Woljtek commented 1 year ago

@w-jka Thanks for the quick answer.

On a failed JO, I observed that the hardwareName is not filled (see attached screenshot). Source JobOrder.3211.xml: s3://ops-rs-failed-workdir/s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c_S3B_OL_0_EFR__20230409T182835_20230409T183033_20230409T210042_0118_078_127____LN3_D_NR_002.SEN3_b582e550-fccb-4530-a9ab-15300d897ea6_0/JobOrder.3211.xml

This JO triggers the bug of this issue. Extract from the logs for JobOrder /data/localWD/3211/JobOrder.3211.xml:

2023-04-14T14:31:55.358330 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Loaded configured parameter "ProductTypeConf.OL_0_EFR___.DeltaTime" = <-0.044> 
2023-04-14T14:31:55.358423 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Loaded configured parameter "ProductTypeConf.OL_0_EFR___.CheckJOInterval" = <3>  [unit: lines] 
2023-04-14T14:31:55.358471 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Converted to seconds: "ProductTypeConf.OL_0_EFR___.CheckJOInterval" = <0.132>  [unit: s] 
2023-04-14T14:31:55.361569 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Processing orbit file [/data/localWD/3211/S3B_AX___FRO_AX_20230409T000000_20230419T000000_20230412T065540___________________EUM_O_AL_001.SEN3]
2023-04-14T14:31:55.369273 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Processing orbit file [/data/localWD/3211/S3B_AX___OSF_AX_20180425T191855_99991231T235959_20221110T110324___________________EUM_O_AL_001.SEN3]
2023-04-14T14:31:55.371772 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Going to uncompress the file [S3B_OPER_MPL_ORBSCT_20180425T191855_99999999T999999_0010.TGZ] if needed
2023-04-14T14:31:55.379445 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Orbit scenario file used for propagator init is [/data/localWD/3211/S3B_AX___OSF_AX_20180425T191855_99991231T235959_20221110T110324___________________EUM_O_AL_001.SEN3/S3B_OPER_MPL_ORBSCT_20180425T191855_99999999T999999_0010.EOF]
2023-04-14T14:31:55.436730 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [I] PUGCoreProcessor: Adding input file /data/localWD/3211/S3B_OL_0_EFR____20230409T182835_20230409T183033_20230409T210042_0118_078_127______LN3_D_NR_002.SEN3 in time interval [2023-04-09T18:28:34.816075, 2023-04-09T18:30:33.049222]
2023-04-14T14:31:55.437123 s3-pug-preint-part1-execution-worker-v3-56cfb6d578-mlt4c PUG_OL_0_EFR 03.45 [0000001103]: [E] PUGCoreProcessor: Fri Apr 14 14:31:55 2023
 PID: 1103 SIGNAL 11 THREAD: 139697621907200
 core in: /tmp/core.1103 - stack follow
        /usr/local/components/PUG-3.45/bin/../lib/libSignal.so.5.4 ( acs::Signal::catchBadSignal(int) )
        /lib64/libpthread.so.0 (  )
        /usr/lib64/libstdc++.so.6 ( std::string::assign(std::string const&) )
        /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::PDUGeneratorThread::setInfoForStatistics() )
        /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::createPDU() )
        /usr/local/components/PUG-3.45/bin/../lib/libS3PDUGenerator.so.2.1 ( acs::StripeGeneratorThread::run() )
        /usr/local/components/PUG-3.45/bin/../lib/../lib/libThread.so.5.16 ( acs::Thread::svc(void*) )
        /lib64/libpthread.so.0 (  )
        /lib64/libc.so.6 ( clone )

{"header":{"type":"LOG","timestamp":"2023-04-14T14:31:55.646145Z","level":"INFO","line":129,"file":"TaskCallable.java","thread":"pool-382-thread-1"},"message":{"content":"Ending task /usr/local/components/PUG-3.45/bin/PUGCoreProcessor with exit code 139"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
{"header":{"type":"REPORT","timestamp":"2023-04-14T14:31:55.646000Z","level":"INFO","mission":"S3","workflow":"NOMINAL","rs_chain_name":"S3-PUG-NRT-PREINT","rs_chain_version":"1.12.0-rc1"},"message":{"content":"End Task /usr/local/components/PUG-3.45/bin/PUGCoreProcessor with exit code 139"},"task":{"uid":"b7e8805d-df77-40cd-a459-9f9a308966e8","name":"ProcessingTask","event":"END","status":"OK","output":{},"input":{},"quality":{},"error_code":0,"duration_in_seconds":0.34,"missing_output":[]}}
{"header":{"type":"REPORT","timestamp":"2023-04-14T14:31:55.647000Z","level":"ERROR","mission":"S3","workflow":"NOMINAL","rs_chain_name":"S3-PUG-NRT-PREINT","rs_chain_version":"1.12.0-rc1"},"message":{"content":"[code 290] [exitCode 139] [msg Task /usr/local/components/PUG-3.45/bin/PUGCoreProcessor failed]"},"task":{"uid":"8135cc03-1bb3-4dea-819c-e3c7faae515c","name":"Processing","event":"END","status":"NOK","output":{},"input":{},"quality":{},"error_code":1,"duration_in_seconds":0.342,"missing_output":[]}}

Could you have a look at why the hardwareName value is empty, whereas the stream.parameter looks fine?

w-jka commented 1 year ago

@Woljtek Could you provide the preparation-worker log for this job? It is not available on the cluster anymore, so I could not take a look at it.

Woljtek commented 1 year ago

The PW is still running (but we changed the default name): s3-pug-preint-part1-preparation-worker-v3-647cfd89c4-6h6qk. Log file: https://app.zenhub.com/files/398313496/4872e4aa-81e0-4f19-81b5-9d83b85b2114/download

w-jka commented 1 year ago

@Woljtek Yes, I saw that as well; however, the earliest logs available via kubectl are from this morning, while the job in question ran last week.

Woljtek commented 1 year ago

@w-jka I extracted all of Friday's logs from Loki with this query: {pod="s3-pug-preint-part1-preparation-worker-v3-647cfd89c4-6h6qk"} |= 'AppDataJob 3211' (72 hits): https://app.zenhub.com/files/398313496/2d5ac115-6ee5-4bc9-a4ab-9bb5aff85fcc/download

All logs between 2023-04-14T13:55:26.636725 and 2023-04-14T13:57:26.138833 (first/last hits): https://app.zenhub.com/files/398313496/75d6b145-49bd-4dfe-9ea5-18b84b96d3b2/download

If that is not enough, we will plan to restart the PUG test.

w-jka commented 1 year ago

@Woljtek Please make another test with the following configuration (replace existing parts of the config as needed):

app.preparation-worker.pdu.config.OL_0_EFR___.type=STRIPE
app.preparation-worker.pdu.config.OL_0_EFR___.reference=DUMP
app.preparation-worker.pdu.config.OL_0_EFR___.length-in-s=6060
app.preparation-worker.pdu.config.OL_0_EFR___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.OL_0_EFR___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.OL_1_EFR___.type=FRAME
app.preparation-worker.pdu.config.OL_1_EFR___.length-in-s=180
app.preparation-worker.pdu.config.OL_1_ERR___.type=STRIPE
app.preparation-worker.pdu.config.OL_1_ERR___.reference=DUMP
app.preparation-worker.pdu.config.OL_1_ERR___.length-in-s=6060
app.preparation-worker.pdu.config.OL_1_ERR___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.OL_1_ERR___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.OL_2_LFR___.type=FRAME
app.preparation-worker.pdu.config.OL_2_LFR___.length-in-s=180
app.preparation-worker.pdu.config.OL_2_LRR___.type=STRIPE
app.preparation-worker.pdu.config.OL_2_LRR___.reference=DUMP
app.preparation-worker.pdu.config.OL_2_LRR___.length-in-s=6060
app.preparation-worker.pdu.config.OL_2_LRR___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.OL_2_LRR___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SL_0_SLT___.type=STRIPE
app.preparation-worker.pdu.config.SL_0_SLT___.reference=DUMP
app.preparation-worker.pdu.config.SL_0_SLT___.length-in-s=6187
app.preparation-worker.pdu.config.SL_0_SLT___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SL_0_SLT___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SL_1_RBT___.type=FRAME
app.preparation-worker.pdu.config.SL_1_RBT___.length-in-s=180
app.preparation-worker.pdu.config.SL_1_RBT___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SL_1_RBT___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SL_2_LST___.type=FRAME
app.preparation-worker.pdu.config.SL_2_LST___.length-in-s=180
app.preparation-worker.pdu.config.SL_2_LST___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SL_2_LST___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SR_0_SRA___.type=STRIPE
app.preparation-worker.pdu.config.SR_0_SRA___.reference=ORBIT
app.preparation-worker.pdu.config.SR_0_SRA___.length-in-s=3029.6
app.preparation-worker.pdu.config.SR_0_SRA___.offset-in-s=1512.59
app.preparation-worker.pdu.config.SR_0_SRA___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SR_0_SRA___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SR_1_SRA___.type=STRIPE
app.preparation-worker.pdu.config.SR_1_SRA___.reference=DUMP
app.preparation-worker.pdu.config.SR_1_SRA___.length-in-s=6187
app.preparation-worker.pdu.config.SR_1_SRA___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SR_1_SRA___.dyn-proc-params.hardwareName=O
app.preparation-worker.pdu.config.SR_2_LAN___.type=STRIPE
app.preparation-worker.pdu.config.SR_2_LAN___.reference=DUMP
app.preparation-worker.pdu.config.SR_2_LAN___.length-in-s=6187
app.preparation-worker.pdu.config.SR_2_LAN___.dyn-proc-params.facilityName=LN3
app.preparation-worker.pdu.config.SR_2_LAN___.dyn-proc-params.hardwareName=O

...

app.housekeep.pdu.config.OL_0_EFR___.type=STRIPE
app.housekeep.pdu.config.OL_0_EFR___.reference=DUMP
app.housekeep.pdu.config.OL_0_EFR___.length-in-s=6060
app.housekeep.pdu.config.OL_0_EFR___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.OL_0_EFR___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.OL_1_EFR___.type=FRAME
app.housekeep.pdu.config.OL_1_EFR___.length-in-s=180
app.housekeep.pdu.config.OL_1_ERR___.type=STRIPE
app.housekeep.pdu.config.OL_1_ERR___.reference=DUMP
app.housekeep.pdu.config.OL_1_ERR___.length-in-s=6060
app.housekeep.pdu.config.OL_1_ERR___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.OL_1_ERR___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.OL_2_LFR___.type=FRAME
app.housekeep.pdu.config.OL_2_LFR___.length-in-s=180
app.housekeep.pdu.config.OL_2_LRR___.type=STRIPE
app.housekeep.pdu.config.OL_2_LRR___.reference=DUMP
app.housekeep.pdu.config.OL_2_LRR___.length-in-s=6060
app.housekeep.pdu.config.OL_2_LRR___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.OL_2_LRR___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SL_0_SLT___.type=STRIPE
app.housekeep.pdu.config.SL_0_SLT___.reference=DUMP
app.housekeep.pdu.config.SL_0_SLT___.length-in-s=6187
app.housekeep.pdu.config.SL_0_SLT___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SL_0_SLT___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SL_1_RBT___.type=FRAME
app.housekeep.pdu.config.SL_1_RBT___.length-in-s=180
app.housekeep.pdu.config.SL_1_RBT___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SL_1_RBT___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SL_2_LST___.type=FRAME
app.housekeep.pdu.config.SL_2_LST___.length-in-s=180
app.housekeep.pdu.config.SL_2_LST___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SL_2_LST___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SR_0_SRA___.type=STRIPE
app.housekeep.pdu.config.SR_0_SRA___.reference=ORBIT
app.housekeep.pdu.config.SR_0_SRA___.length-in-s=3029.6
app.housekeep.pdu.config.SR_0_SRA___.offset-in-s=1512.59
app.housekeep.pdu.config.SR_0_SRA___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SR_0_SRA___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SR_1_SRA___.type=STRIPE
app.housekeep.pdu.config.SR_1_SRA___.reference=DUMP
app.housekeep.pdu.config.SR_1_SRA___.length-in-s=6187
app.housekeep.pdu.config.SR_1_SRA___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SR_1_SRA___.dyn-proc-params.hardwareName=O
app.housekeep.pdu.config.SR_2_LAN___.type=STRIPE
app.housekeep.pdu.config.SR_2_LAN___.reference=DUMP
app.housekeep.pdu.config.SR_2_LAN___.length-in-s=6187
app.housekeep.pdu.config.SR_2_LAN___.dyn-proc-params.facilityName=LN3
app.housekeep.pdu.config.SR_2_LAN___.dyn-proc-params.hardwareName=O
Woljtek commented 1 year ago

The WA has been successfully applied (see attached screenshot). Source: s3://ops-rs-failed-workdir/s3-pug-preint-part1-execution-worker-v5-5fc6cb6788-mr9zg_S3B_SR_0_SRA__20230409T231430_20230409T232430_20230410T001628_0599_078_130____LN3_D_NR_002.SEN3_0664b6d5-c6c8-4086-a6c3-4377502e49ab_0/JobOrder.4737.xml

In the PUG EW logs, the error disappeared:

kp logs s3-pug-preint-part1-execution-worker-v5-5fc6cb6788-mr9zg -c s3-pug-preint-part1-execution-worker-v5  | grep stack | wc -l
0

I added the label workaround and decreased the priority.

pcuq-ads commented 1 year ago

SYS_CCB_w17 : Release 1.13 solves the issue (refer to https://github.com/COPRS/processing-sentinel-3/releases/tag/1.13.1-rc1)