Closed pcuq-ads closed 1 year ago
2023-07-24T05:40:25Z {"header":{"type":"LOG","timestamp":"2023-07-24T05:40:25.875714Z","level":"INFO","line":129,"file":"TaskCallable.java","thread":"pool-298-thread-1"},"message":{"content":"Ending task /usr/local/components/S3IPF_OL1_06.13/bin/OL1.bin with exit code 255"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856503 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] Got error after call at IPF-OL-1/src/ol1eo_processor.c:main:106
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856480 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] Got error after call at IPF-OL-1/src/ol1eo_processing.c:ol1eo_processing:442
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856470 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] Processing step 'ol1co_adf_cal_load' ended in failure
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856459 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] Got error after call at IPF-OL-1/src/ol1co_adf_cal.c:ol1co_adf_cal_load:192
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856445 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] Got error after call at common/libs3ipf_packing/src/netcdf_util.c:ncutil_inq_varinfo:73
2023-07-24T05:40:25Z 2023-07-24T05:40:25.856389 s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf IPF-OL-1-EO 06.13 [0000003450]: [E] NetCDF: Variable not found
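The IPF entries above are listed newest first, so the originating error is the last line of the dump. A minimal sketch for pulling the earliest [E] entry out of such a dump, assuming the line layout seen in this log; the regular expression and the name `first_error` are illustrative and not part of the processing chain:

```python
import re

# Assumed layout of an IPF error line, taken from the log in this issue:
# "<wrapper ts> <ipf ts> <pod> <processor> <version> [<pid>]: [E] <message>"
IPF_ERROR = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"\S+ (?P<processor>\S+) (?P<version>\S+) \[\d+\]: \[E\] (?P<message>.*)$"
)


def first_error(lines):
    """Return (timestamp, message) of the earliest [E] entry, or None."""
    errors = []
    for line in lines:
        match = IPF_ERROR.search(line)
        if match:
            errors.append((match.group("ts"), match.group("message")))
    # Fixed-width ISO-8601 timestamps sort chronologically as plain strings.
    return min(errors) if errors else None


POD = "s3-ol1-ntc-part1-execution-worker-v6-759c85b556-ddgxf"
SAMPLE = [
    f"2023-07-24T05:40:25Z 2023-07-24T05:40:25.856503 {POD} IPF-OL-1-EO 06.13 "
    "[0000003450]: [E] Got error after call at IPF-OL-1/src/ol1eo_processor.c:main:106",
    f"2023-07-24T05:40:25Z 2023-07-24T05:40:25.856445 {POD} IPF-OL-1-EO 06.13 "
    "[0000003450]: [E] Got error after call at "
    "common/libs3ipf_packing/src/netcdf_util.c:ncutil_inq_varinfo:73",
    f"2023-07-24T05:40:25Z 2023-07-24T05:40:25.856389 {POD} IPF-OL-1-EO 06.13 "
    "[0000003450]: [E] NetCDF: Variable not found",
]

if __name__ == "__main__":
    print(first_error(SAMPLE))
```

On the sample lines this selects the "NetCDF: Variable not found" entry, which is the message raised while loading the calibration ADF.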
Nothing in the logs provided indicates that the root cause is in our application. The root cause seems to be in the IPF libraries.
According to Sylvain, the issue is linked to the AUX data baseline.
The error in the log, [E] Got error after call at IPF-OL-1/src/ol1co_adf_cal.c:ol1co_adf_cal_load:192,
shows that AUX data intended for the CentOS 7 version was used. It is not compatible with the CentOS 6 chain.
Here is the AUX data in the job order that is the root cause of the issue: S3A_OL_1_EO__AX_20160425T103700_20991231T235959_20230613T120000___________________MPC_O_AL_015.SEN3
We will handle this issue with OPS.
The following AUX data were ingested for the preint test v1.14 (CentOS 7):
-rw-r--r-- 1 root root 4842217 Jul 17 15:38 S3A_OL_1_CAL_AX_20230620T000000_20991231T235959_20230616T120000___________________MPC_O_AL_028.SEN3.tgz
-rw-r--r-- 1 root root 9142 Jul 17 15:38 S3A_OL_1_EO__AX_20160425T103700_20991231T235959_20230613T120000___________________MPC_O_AL_015.SEN3.tgz
-rw-r--r-- 1 root root 4938869 Jul 17 15:39 S3B_OL_1_CAL_AX_20230620T000000_20991231T235959_20230616T120000___________________MPC_O_AL_018.SEN3.tgz
-rw-r--r-- 1 root root 9124 Jul 17 15:39 S3B_OL_1_EO__AX_20180618T000000_20991231T235959_20230613T120000___________________MPC_O_AL_009.SEN3.tgz
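To check whether an AUX file referenced in a job order belongs to a given ingestion batch, the file names can be split into fields and compared. A rough sketch, assuming the field layout visible in the names above (mission, 11-character file type, validity start and stop, creation date, and a 3-digit version before `.SEN3`); the pattern and the helper names `parse_aux_name` and `find_ingested` are hypothetical:

```python
import re

# Assumed layout, inferred from the file names listed in this issue.
AUX_NAME = re.compile(
    r"^(?P<mission>S3[AB_])_(?P<file_type>.{11})_"
    r"(?P<start>\d{8}T\d{6})_(?P<stop>\d{8}T\d{6})_(?P<created>\d{8}T\d{6})_"
    r".*_(?P<version>\d{3})\.SEN3(?:\.tgz)?$"
)


def parse_aux_name(name):
    """Split an AUX file name into its fields, or return None if it does not match."""
    match = AUX_NAME.match(name)
    return match.groupdict() if match else None


def find_ingested(job_order_name, ingested_names):
    """Return the ingested archives that carry the same fields as the job order file."""
    wanted = parse_aux_name(job_order_name)
    if wanted is None:
        return []
    return [n for n in ingested_names if parse_aux_name(n) == wanted]


PAD = "_" * 19  # separator run between creation date and generating centre
INGESTED = [
    f"S3A_OL_1_CAL_AX_20230620T000000_20991231T235959_20230616T120000{PAD}MPC_O_AL_028.SEN3.tgz",
    f"S3A_OL_1_EO__AX_20160425T103700_20991231T235959_20230613T120000{PAD}MPC_O_AL_015.SEN3.tgz",
]
JOB_ORDER_AUX = (
    f"S3A_OL_1_EO__AX_20160425T103700_20991231T235959_20230613T120000{PAD}MPC_O_AL_015.SEN3"
)

if __name__ == "__main__":
    print(parse_aux_name(JOB_ORDER_AUX))
    print(find_ingested(JOB_ORDER_AUX, INGESTED))
```

The comparison ignores the `.tgz` suffix, so an extracted `.SEN3` directory name matches its ingested archive.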
The products have been deleted from the catalog and production has been restarted; 5 executions done ==> We can close this issue as an incident due to wrong auxiliary data.
SYS_CCB_W30: this is an incident. The issue is closed.
Environment:
Traceability:
Current Behavior: 100% of S3-OL1-NTC processing failed on the 3% production from last week (2023-07-24).
Expected Behavior: S3-OL1-NTC shall generate production without errors.
Steps To Reproduce: Check production from S3-OL1-NTC.
Test execution artefacts (i.e. logs, screenshots…): Logs are here https://app.zenhub.com/files/398313496/0ec45e07-ad0d-4ee3-a0ec-c11291c86429/download
No errors found on resources. No errors found on bucket read/write.
Whenever possible, first analysis of the root cause: Hypothesis: regression in Metadata Extraction or Metadata Search Controller version 1.14-rc1?