COPRS / rs-issues

This repository contains all the issues of the COPRS project (Scrum tickets, ivv bugs, epics ...)

[BUG] [S1] IW_OCN__2A product is not generated. #966

Closed pcuq-ads closed 1 year ago

pcuq-ads commented 1 year ago

Environment:

Traceability:

Current Behavior: During the 24h test, we observed that the IW_OCN__2A product is not generated, whereas IW_OCN__2S is generated.

Expected Behavior: Every level-2 slice shall be generated together with its annotation product.

Steps To Reproduce: Check production.

Test execution artefacts (i.e. logs, screenshots…): Query the PRIP index with

attr_productType_string:IW_OCN*


We see the issue.

Query the PRIP index with

attr_productType_string:EW_OCN*


It is OK for EW_OCN.
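For reference, the same check can be issued programmatically against the Elasticsearch index. A minimal sketch, assuming the index is named prip and is reachable on localhost:9200 (both are assumptions; only the query string above is taken from this report):

import requests

# Hypothetical endpoint and index name; only the query_string is from the report above.
query = {
    "query": {"query_string": {"query": "attr_productType_string:IW_OCN__2*"}},
    "size": 0,  # only the hit count is of interest here
}
response = requests.post("http://localhost:9200/prip/_search", json=query, timeout=30)
print(response.json()["hits"]["total"])  # only 2S hits are expected while 2A is missing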

Whenever possible, first analysis of the root cause. Hypotheses: 1) the annotation is not sent to the catalogue? 2) the annotation is not generated?

Bug Generic Definition of Ready (DoR)

Bug Generic Definition of Done (DoD)

w-fsi commented 1 year ago

Can you please provide the logs of the execution worker of a related production, so we can check whether there are any indications? As the execution workers are down, it is not possible to analyze this at the moment. At least the metadata extraction does not show any indication that these kinds of products were there.

Having the slices but not the annotations is something I have not observed in the past. My gut feeling is that there might be some hint in the IPF logs.

LAQU156 commented 1 year ago

IVV_CCB_2023_w19 : Moved into "Accepted Werum", priority blocking, to be fixed in phase 1

LAQU156 commented 1 year ago

Werum_CCB_2023_w19 : Moved into "Product Backlog" for further analysis

pcuq-ads commented 1 year ago

I checked the JobProcessing END trace and there is no annotation mentioned in the trace for IW_OCN.

{
    "header": {
      "type": "REPORT",
      "timestamp": "2023-04-29T08:17:59.760000Z",
      "level": "INFO",
      "mission": "S1",
      "workflow": "NOMINAL",
      "rs_chain_name": "S1-L2",
      "rs_chain_version": "1.12.0-rc1"
    },
    "message": {
      "content": "End job processing"
    },
    "task": {
      "uid": "400ea3fd-d715-4f7d-abb3-831b12077fff",
      "name": "JobProcessing",
      "event": "END",
      "status": "OK",
      "output": {
        "debug": false,
        "t0_pdgs_date": "2023-04-27T17:27:31.000000Z",
        "filename_string": "S1A_IW_OCN__2SDV_20230427T172255_20230427T172320_048285_05CE7B_5118.SAFE"
      },
      "input": {
        "job_order_id_string": "JobOrder.82730.xml",
        "ipf_release_string": "3.61",

For S1A_EW_OCN I checked that the output mentions both the slice and the annotation in the traces.

pcuq-ads commented 1 year ago

Hello @w-fsi, I am not sure if this will solve the issue, but I think we need to change something in the configuration. I see that:

app.preparation-worker.process.outputregexps.WV_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$                            
app.preparation-worker.process.outputregexps.WV_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$

I guess we should change it to:

app.preparation-worker.process.outputregexps.WV_OCN__2S=^S1[A-B]_WV_OCN__2S.*$                            
app.preparation-worker.process.outputregexps.WV_OCN__2A=^S1[A-B]_WV_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$
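As a quick sanity check (using a hypothetical WV L2 filename, not one from this production), the current WV pattern can never match a WV product, while the proposed one does:

import re

# Hypothetical WV L2 filename, used only to illustrate the pattern mismatch.
wv_sample = "S1A_WV_OCN__2SSV_20230427T000000_20230427T000030_048285_05CE7B_0001.SAFE"

current_pattern = r"^S1[A-B]_S[1-6]_OCN__2S.*$"   # as configured today for WV_OCN__2S
proposed_pattern = r"^S1[A-B]_WV_OCN__2S.*$"      # proposed change

print(bool(re.match(current_pattern, wv_sample)))   # False: "S[1-6]" never matches "WV"
print(bool(re.match(proposed_pattern, wv_sample)))  # True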

Or maybe I am missing something? Can you confirm, please?

Regards

w-jka commented 1 year ago

@pcuq-ads Thank you for your observations. We checked the configuration and the tasktable, and while this line of configuration is indeed misleading, it has no impact on the production of either the IW products or the WV products, as the outputs are not declared as Regexp in the tasktable. The configuration was also used in the previous project S1PRO, where no issues regarding this are known to us.

One thing to note is that the 2S outputs are declared as mandatory in the tasktable, while this is not the case for the 2A outputs. Could you check that the production service is producing 2A for these slices?

If yes, we need the execution-worker logs for the production of the 2S files, as the 2A would have been produced by the IPF with the same job order.

One additional remark, unrelated to this issue: the tasktable SM_RAW__0_OCN__2.xml changed since the version used for S1PRO, especially in the configuration of the outputs. In the newer tasktable the outputs are not declared as Regexp, which will with high certainty lead to problems with the L2 production of the S[1-6] products. As the tasktables show some other indications of copy-paste errors, this might also be the result of a manual error on the side of the IPF provider.

pcuq-ads commented 1 year ago

@w-jka We are facing a systematic issue. On the production service, I cannot find any L2 slice without its annotation. Here is the production during the 24-hour day: https://app.zenhub.com/files/398313496/ad8ae510-2db3-4e28-9d0e-8ee89997e7ea/download

On RS, I could not find any IW L2 annotation when I get an IW L2 slice. For example, for slices:

w-jka commented 1 year ago

@pcuq-ads Thank you for checking the results of the production service. Can you please provide the execution worker logs for one of those jobs (producing an OCN__2S product without its corresponding annotation product), so we can check what causes this behaviour?

pcuq-ads commented 1 year ago

Here are some logs. https://app.zenhub.com/files/398313496/31fd85c0-cea6-4f4f-808d-cf0d704e694c/download

I could not capture a big time frame. I hope I have caught the right area.

w-jka commented 1 year ago

@pcuq-ads

From the extract you provided I could identify this passage that gives some insight into what might be the root cause of this issue:

2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675492Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675030 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]  AttributeError: 'NoneType' object has no attribute 'GetLayer'"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675467Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675017 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"src/cls/osgeo/raster/get_mask.py\", line 121, in cls.osgeo.raster.get_mask.EarthMask.getMask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675452Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675004 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"src/cls/data/sar/l1/swath.py\", line 575, in cls.data.sar.l1.swath.Swath._get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675435Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674991 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"src/cls/data/sar/l1/swath.py\", line 545, in cls.data.sar.l1.swath.Swath._get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675419Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674978 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"src/cls/data/sar/l1/sar_dataset.py\", line 859, in cls.data.sar.l1.sar_dataset.SarDataset.get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675398Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674964 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]      mask = ds.get_earth_mask(0, 0, ds.xsize // nx * nx, ds.ysize // ny * ny,"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675382Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674951 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 763, in make_iw_rvl"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675362Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674938 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]      self.make_iw_rvl(subswath)"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675340Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674924 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 210, in __init__"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675319Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674908 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]      LopRvlApplication(st, subswath)"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675292Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674890 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]    File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 63, in rvl_application_worker"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675272Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674846 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E]  Traceback (most recent call last):"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z    {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675194Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.673998 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] (LOP_ProcMain) SENTINEL_LOP_STATUS=0"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}

The IPF itself runs into an error and returns error code 127 (warning, INCOMPLETE according to the documentation). The error is related to a mask that is provided. As this error occurs in the execution worker and not in the preparation-worker, the related mask is not provided by configuration; I would assume it is part of the DEM. The related error message here is AttributeError: 'NoneType' object has no attribute 'GetLayer'.

So from the currently available information, the root cause is in the DEM files provided to the IPF.
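For context, this AttributeError is the typical symptom of osgeo.ogr.Open() returning None (instead of raising) when a datasource cannot be opened. A minimal sketch, with a hypothetical mask path; this is not the IPF's actual code:

from osgeo import ogr

# Hypothetical path; the real location depends on the IPF mount configuration.
mask_path = "/usr/local/conf/IPF_CFG/osm/OSM_00360/land_polygons.shp"

datasource = ogr.Open(mask_path)
if datasource is None:
    # This is the case the traceback suggests is not guarded against in the IPF:
    # calling GetLayer() on None raises "'NoneType' object has no attribute 'GetLayer'".
    raise FileNotFoundError(f"Mask datasource could not be opened: {mask_path}")
layer = datasource.GetLayer()
print(layer.GetFeatureCount())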

w-fsi commented 1 year ago

I am not sure if it is related to the DEM files, as the error sounds more like some kind of mask being loaded. I assume that this is the counterpart of the preparation-worker mask (which tries to filter out products before they reach the EW). The IPF seems to have some kind of check of its own. There is the RPM "S1PD-IPF-OSM_3.6.0-3.6.0-00.01.x86_64.rpm" that sounds like it contains some polygons and landscape information. Maybe it was not installed?

However, this issue is clearly related to the IPF and not caused by the EW. The IPF tries to load something that cannot be found.

Woljtek commented 1 year ago

@w-fsi Please find below the logs of the starting pods, in order to investigate whether the masks are loaded correctly for S1-L1 and S1-L2 (updated by @pcuq-ads).

https://app.zenhub.com/files/398313496/2ee55f05-563d-4bba-bfbb-b83763d710bb/download

LAQU156 commented 1 year ago

Werum_CCB_2023_w22 : Action on @OPS team to check whether the RPM (S1PD-IPF-OSM_3.6.0-3.6.0-00.01.x86_64.rpm) is properly installed in the Docker image.

pcuq-ads commented 1 year ago

@SYTHIER-ADS has found discrepancies in the volume mounts between the S1-L1 and S1-L2 configurations. This pull request may fix part or all of this issue:

https://github.com/COPRS/rs-config/pull/232

LAQU156 commented 1 year ago

Werum_CCB_2023_w23 : The change provided in the pull request in the rs-config repository has to be ported to the s1 repository.

w-fsi commented 1 year ago

Default configuration was updated accordingly. This issue will be fixed in 1.13.2 and thus be contained in V2.

pcuq-ads commented 1 year ago

The issue remains with version 1.13.2 of the S1-L2 RS core add-on. A request for attr_productType_string:IW_OCN__2* on the ES still shows the IW_OCN__2A products missing.


pcuq-ads commented 1 year ago

IVV_CCB_2023_w29 : Fix failed with release 1.13.2. To be moved to the WERUM board.

Woljtek commented 1 year ago

Werum_CCB_2023_w29: Accepted, to be analyzed.

pcuq-ads commented 1 year ago

@w-fsi

I checked that the configuration deployed on OPS is the one from RS_CONFIG/OPS (checked in the SCDF configuration).

Something seems strange to me. I see two syntaxes for the volume mount property:

sometimes it is volumeMounts

deployer.housekeep.kubernetes.volumeMounts=[{name: tasktables, mountPath: '/app/tasktables'},{name: maskfile, mountPath: '/shared/sentinel1/masks'}]
deployer.preparation-worker.kubernetes.volumeMounts=[{name: tasktables, mountPath: '/app/tasktables'},{name: maskfile, mountPath: '/shared/sentinel1/masks'}] 

sometimes it is volume-mounts

deployer.execution-worker-high.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-medium.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-low.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]

Do you think this could be the root cause of this issue?

Regards

w-fsi commented 1 year ago

@pcuq-ads: No, I don't think that this is an issue. Both naming patterns should be accepted by SCDF, and I assume that they are applied as well. The S2 configurations are using a different one than ours, so I assume it got mixed up because of this.

This issue is still really strange to me. I checked the TT and the annotations are non-mandatory in there. However, the concept of enforcing non-mandatory outputs does not exist for S1, and thus either the issue has existed since the beginning or the last IPF upgrades changed the behaviour somehow. The Python library logs still make me think that this is somehow related to a configuration that is loaded.

Maybe we should schedule a debug session with OPS to verify a few things. The EW seems to be scaled to 0 at the moment, so @Woljtek, do you see a chance to reconstruct this issue somehow, so we can have a look at it together?

pcuq-ads commented 1 year ago

@w-fsi, you are right, we checked that the mounts are well defined for S1-L2.

Back to square one.

w-fsi commented 1 year ago

@pcuq-ads : Did you also check the permissions within the mounts? We changed the default user for the packages and maybe they are not in line with the current settings. A permission issue might explain the observed behaviour as well.

As pointed out in the previous comment: if the mounts are fine, we are in foggy territory again.

pcuq-ads commented 1 year ago

Here is the full log related to an execution worker processing an IW product: https://app.zenhub.com/files/398313496/d62d2723-1ae1-4c68-9739-715c264c65f1/download

suberti-ads commented 1 year ago

@w-fsi We checked the permissions on the mount points. We do not see the same permissions on each folder:

dem ==> user piccontrol / group pic_run

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 app]$ cd /usr/local/conf/IPF_CFG/dem
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 dem]$ ls -al
total 4
drwxrwxrwx 3 piccontrol pic_run    1 Feb  8 11:41 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 3 piccontrol pic_run    1 Feb  8 11:41 fine

l2ExternalAuxFiles ==> user piccontrol / group pic_run

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 dem]$ cd /usr/local/conf/IPF_CFG/l2ExternalAuxFiles/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 l2ExternalAuxFiles]$ ls -al
total 2149437
drwxrwxrwx 2 piccontrol pic_run       52 Feb  8 15:48 .
drwxrwxr-x 1 piccontrol pic_run     4096 Jul 20 12:31 ..
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is1-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is1-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is1-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is1-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is2-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is2-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is2-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is2-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is3-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is3-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is3-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is3-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is4-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is4-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is4-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is4-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is5-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is5-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is5-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is5-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is6-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is6-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is6-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is6-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is7-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is7-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133365 Feb  8 14:44 asa-aux-scs-is7-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 asa-aux-scs-is7-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is1-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is1-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is1-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is1-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is2-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is2-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is2-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is2-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is3-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is3-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is3-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is3-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is4-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is4-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is4-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is4-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is5-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is5-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is5-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is5-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is6-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is6-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run   133366 Feb  8 14:44 s1--aux-scs-is6-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb  8 14:44 s1--aux-scs-is6-vv.nlin

osm ==> user root / group root, with files owned by UID/GID 1000/1000, read-only

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ ls -al
total 4
drwxrwxrwx 3 root       root       1 Apr 12 07:36 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 2 root       root      11 Apr 12 07:35 OSM_00360
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ cd OSM_00360/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 OSM_00360]$ ls -al
total 1159535
drwxr-xr-x 2 root root         11 Apr 12 07:35 .
drwxrwxrwx 3 root root          1 Apr 12 07:36 ..
-rw-r--r-- 1 1000 1000        824 Apr 12 07:35 README.txt
-rw-r--r-- 1 1000 1000          5 Apr 12 07:35 lakes.cpg
-rw-r--r-- 1 1000 1000        880 Apr 12 07:35 lakes.dbf
-rw-r--r-- 1 1000 1000        145 Apr 12 07:35 lakes.prj
-rw-r--r-- 1 1000 1000    3885808 Apr 12 07:35 lakes.shp
-rw-r--r-- 1 1000 1000        220 Apr 12 07:35 lakes.shx
-rw-r--r-- 1 1000 1000          6 Apr 12 07:35 land_polygons.cpg
-rw-r--r-- 1 1000 1000   14426437 Apr 12 07:35 land_polygons.dbf
-rw-r--r-- 1 1000 1000        145 Apr 12 07:35 land_polygons.prj
-rw-r--r-- 1 1000 1000 1162971504 Apr 12 07:35 land_polygons.shp
-rw-r--r-- 1 1000 1000    6074348 Apr 12 07:35 land_polygons.shx

gshhs ==> user piccontrol / group pic_run, with files owned by UID/GID 1000/1000

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ cd /usr/local/conf/IPF_CFG/gshhs/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gshhs]$ ls -al
total 222088
drwxrwxrwx 2 piccontrol pic_run        16 Apr 12 07:36 .
drwxrwxr-x 1 piccontrol pic_run      4096 Jul 20 12:31 ..
-rw-r--r-- 1       1000    1000  44209815 Apr 12 07:35 gshhs_f_l1_polytiled.dbf
-rw-r--r-- 1       1000    1000       143 Apr 12 07:35 gshhs_f_l1_polytiled.prj
-rw-r--r-- 1       1000    1000 164899800 Apr 12 07:35 gshhs_f_l1_polytiled.shp
-rw-r--r-- 1       1000    1000   1645108 Apr 12 07:35 gshhs_f_l1_polytiled.shx
-rw-r--r-- 1       1000    1000   1533390 Apr 12 07:35 gshhs_f_l2_polytiled.dbf
-rw-r--r-- 1       1000    1000       143 Apr 12 07:35 gshhs_f_l2_polytiled.prj
-rw-r--r-- 1       1000    1000  13770940 Apr 12 07:35 gshhs_f_l2_polytiled.shp
-rw-r--r-- 1       1000    1000     57148 Apr 12 07:35 gshhs_f_l2_polytiled.shx
-rw-r--r-- 1       1000    1000    306170 Apr 12 07:35 gshhs_f_l3_polytiled.dbf
-rw-r--r-- 1       1000    1000       143 Apr 12 07:35 gshhs_f_l3_polytiled.prj
-rw-r--r-- 1       1000    1000    961580 Apr 12 07:35 gshhs_f_l3_polytiled.shp
-rw-r--r-- 1       1000    1000     11484 Apr 12 07:35 gshhs_f_l3_polytiled.shx
-rw-r--r-- 1       1000    1000      2590 Apr 12 07:35 gshhs_f_l4_polytiled.dbf
-rw-r--r-- 1       1000    1000       143 Apr 12 07:35 gshhs_f_l4_polytiled.prj
-rw-r--r-- 1       1000    1000      9820 Apr 12 07:35 gshhs_f_l4_polytiled.shp
-rw-r--r-- 1       1000    1000       188 Apr 12 07:35 gshhs_f_l4_polytiled.shx

gebco ==> user piccontrol / group pic_run

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gshhs]$ cd /usr/local/conf/IPF_CFG/gebco/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gebco]$ ls -al
total 1848814
drwxrwxrwx 2 piccontrol pic_run          6 Jun  5 22:53 .
drwxrwxr-x 1 piccontrol pic_run       4096 Jul 20 12:31 ..
-rw-rw-r-- 1 piccontrol pic_run 1866240612 Feb  8 14:31 gebco_08.nc
-rw-r--r-- 1 piccontrol pic_run        530 Jul 16 11:22 gebco_08.nc.aux.xml
-rw-rw-r-- 1 piccontrol pic_run      98097 Feb  8 14:31 gebco_08.pdf
-rw-rw-r-- 1 piccontrol pic_run   25647549 Feb  8 14:31 gebco_SID.zip
-rw-rw-r-- 1 piccontrol pic_run    1101242 Feb  8 14:31 gebco_docs.zip
-rw-rw-r-- 1 piccontrol pic_run      91440 Feb  8 14:31 gebco_terms_of_use.pdf

LUT ==> user piccontrol / group pic_run

[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 LUT]$ ls -al
total 4
drwxrwxrwx 4 piccontrol pic_run    2 Feb  8 15:49 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 2 piccontrol pic_run    9 Feb  8 14:38 GMF_cmod5n
drwxr-xr-x 2 piccontrol pic_run    9 Feb  8 14:39 GMF_cmod_ifr2nn
pcuq-ads commented 1 year ago

@w-fsi, to complete the information provided by @suberti-ads, I add the SCDF properties for the execution worker:

deployer.execution-worker-high.kubernetes.podSecurityContext={runAsUser: 1046, fsGroup: 2000}
deployer.execution-worker-high.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-high.kubernetes.volumes=[{ name: 's3-upload-cache', emptyDir: { medium: 'Memory', sizeLimit: '1500Mi' }}, { name: dem, persistentVolumeClaim: { claimName: 's1-dem', storageClassName: 'ceph-fs' } }, { name: gebco, persistentVolumeClaim: { claimName: 's1-gebco', storageClassName: 'ceph-fs' } }, { name: gshhs, persistentVolumeClaim: { claimName: 's1-gshhs', storageClassName: 'ceph-fs' } }, { name: osm, persistentVolumeClaim: { claimName: 's1-osm', storageClassName: 'ceph-fs' } }, { name: extaux, persistentVolumeClaim: { claimName: 's1-l2external', storageClassName: 'ceph-fs' } }, { name: lut, persistentVolumeClaim: { claimName: 's1-lut', storageClassName: 'ceph-fs' } } ]

For S3 chains, runAsUser and fsGroup are set to 0.

w-fsi commented 1 year ago

@pcuq-ads : The user id 1000 does not seem to be right, as this user does not exist in the container. This looks more like the data was copied with a default user id from another system. However, the read permissions are set, and this is likely not causing an issue as the files can be accessed. I think we can remove a possible permission issue from the list.
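As a side note, the world-read bits in the listings above are what matters here; a minimal check from inside the pod (hypothetical file path) would be:

import os

# Hypothetical example file taken from the osm listing above; os.access() evaluates
# the effective permissions of the current user (here runAsUser 1046 / fsGroup 2000).
path = "/usr/local/conf/IPF_CFG/osm/OSM_00360/land_polygons.shp"
print(os.access(path, os.R_OK))  # True as long as the world-read bit (-rw-r--r--) is set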

w-fsi commented 1 year ago

@pcuq-ads : Do we have a JO available for a failed request? In the OPS bucket there seem to be no failed requests. We would like to investigate what the JO actually looks like. The new processor version might handle mandatory products in a different way.

pcuq-ads commented 1 year ago

@w-fsi : There is no failed JO. @suberti-ads will add a JO from an IW L2 processing. All of them have the issue (processing OK but annotation missing).

suberti-ads commented 1 year ago

@w-fsi, here is a sample generation. Input: S1A_IW_RAW__0SDV_20230714T211656_20230714T211728_049424_05F17F_9980.SAFE. Hereafter the job in Mongo linked to this execution: appJobdb108328.log. Hereafter the job order for this product, extracted from the Kafka message: PrettyJobOrder.txt

w-fsi commented 1 year ago

@pcuq-ads : Yeah, sorry, I meant one of the affected job orders. The idea we had was that maybe the annotations are not added to the JO and the behaviour of the IPF changed such that it ignores them if they are not listed anymore. However, as the job order shows, the annotation is added correctly there as well, and thus the IPF should process it.

We don't see any issues with the services there and thus tend to blame the IPF. However, we have two theories that might be investigated further: 1) The IPF is giving these Python error messages, and we are quite sure that they are somehow related to the static auxiliary data of the IPF, so either the OSM or the DEM. The issue here is that we are not aware of the structure required on the mounts. If there is some additional folder, the IPF might not be able to find the data as expected, and if the error message is not precise, this can be very difficult to identify.

Check for example the OSM (screenshot Screenshot_20230721_140540). I find it very suspicious that this data seems to contain a version number; this has already caused various issues in the past. Maybe the data on the shared disk was not updated along with the IPF itself. The IPF identifies as version 3.61 and thus might be looking in that path instead of 3.60, and thus not finding the data it is expecting. Maybe OPS should give it a try and create a symbolic link from OSM_00361 to OSM_00360. Other paths might be affected by something similar as well.

2) The tasktables are in my opinion very strange, and I am not sure if they are supposed to be that way, e.g. when checking the IW tasktable that @suberti-ads gives as an example:

<?xml version='1.0' encoding='UTF-8'?>
<Ipf_Task_Table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="TaskTable_S1.xsd">
  <Processor_Name>IW_RAW__0_OCN__2</Processor_Name>
  <Version>3.61</Version>
  [...]
            <Input>
              <Mode>ALWAYS</Mode>
              <Mandatory>Yes</Mandatory>
              <List_of_Alternatives count="1">
                <Alternative>
                  <Order>1</Order>
                  <Origin>PROC</Origin>
                  <Retrieval_Mode>LatestValCover</Retrieval_Mode>
                  <T0>0.0</T0>
                  <T1>0.0</T1>
                  <File_Type>EW_OCN__2A</File_Type>
                  <File_Name_Type>Regexp</File_Name_Type>
                </Alternative>
              </List_of_Alternatives>
            </Input>
[...]

Notice that this is an IW tasktable that contains an EW input for the ProcEND task. I am not too deep into the product model, but that is not what I expected. The JO contains the EW as well, and this might cause some confusion within the IPF. Notice that the previous task, L0P_ProcMain, does have the IW_OCN as outputs, as I would expect for this tasktable. I find it strange that the follow-up task is not using these outputs as inputs but asks for EW_OCN instead. Maybe the ProcEND task does some extraction from the outputs to generate the annotation files. This might explain why the actual products are generated fine but the annotations are missing, and maybe also why it works e.g. for EW_OCN, since the types are consistent there.

I might be completely wrong here and there may be some higher logic behind it. But I would take the working directory of a failed processing step and try to modify the JO to use the right product types and execute it manually, to see whether the annotations are then generated correctly, or modify the TT and give it another full run.

pcuq-ads commented 1 year ago

@w-fsi: News on this issue. @nleconte-csgroup has found the product mentioned as generated in the log. Product: S1A_IW_OCN__2ADV_20230714T211725_20230714T211759_049424_05F17F_C3BC.SAFE

{"header":{"type":"LOG","timestamp":"2023-07-14T22:46:58.950159Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-24-thread-1"},"message":{"content":"2023-07-14T22:46:58.949918 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000000808]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108331/S1A_IW_OCN__2SDV_20230714T211725_20230714T211759_049424_05F17F_9694.SAFE"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-07-14T22:46:58Z    {"header":{"type":"LOG","timestamp":"2023-07-14T22:46:58.214954Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-24-thread-1"},"message":{"content":"2023-07-14T22:46:58.214623 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000000808]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108331/S1A_IW_OCN__2ADV_20230714T211725_20230714T211759_049424_05F17F_C3BC.SAFE"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}

A test with the CFI generates the IW L2 annotation product.

The annotation is not present in the S3 bucket.

w-fsi commented 1 year ago

If you have the working directory, can you please provide the list file and the associated EW log, so we can see what exactly happened during the extraction? Most likely the entry was not in the list file.

pcuq-ads commented 1 year ago

Nicolas is about to push the working directory to a bucket.

I looked at the logs and I found two messages:

2023-07-14T23:02:54Z

{
  "header": {
    "type": "LOG",
    "timestamp": "2023-07-14T23:02:54.597471Z",
    "level": "INFO",
    "line": 133,
    "file": "Iterator.java",
    "thread": "pool-49-thread-1"
  },
  "message": {
    "content": "2023-07-14T23:02:54.597187 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000001468]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108333/S1A_IW_OCN__2ADV_20230714T201626_20230714T201656_049424_05F179_A45D.SAFE"
  },
  "custom": {
    "logger_string": "esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"
  }
}

2023-07-14T23:02:55Z

{
  "header": {
    "type": "LOG",
    "timestamp": "2023-07-14T23:02:55.879077Z",
    "level": "INFO",
    "line": 320,
    "file": "OutputProcessor.java",
    "thread": "KafkaConsumerDestination{consumerDestinationName='s1-l2-part3.priority-filter-low', partitions=30, dlqName='error-warning'}.container-0-C-1"
  },
  "message": {
    "content": "Output S1A_IW_OCN__2ADV_20230714T201626_20230714T201656_049424_05F179_A45D.SAFE will be ignored"
  },
  "custom": {
    "logger_string": "esa.s1pdgs.cpoc.ipf.execution.worker.job.file.OutputProcessor"
  }
}
w-jka commented 1 year ago

@pcuq-ads The log messages helped to finally narrow down what happens here. To fix this issue, update the s1_l2 chain with the following parameters:

app.preparation-worker.worker.outputfamiliesstr=report:L2_REPORT||WV_OCN__2S:L2_SLICE||WV_OCN__2A:L2_ACN||IW_OCN__2S:L2_SLICE||IW_OCN__2A:L2_ACN||EW_OCN__2S:L2_SLICE||EW_OCN__2A:L2_ACN||SM_OCN__2S:L2_SLICE||SM_OCN__2A:L2_ACN
app.housekeep.worker.outputfamiliesstr=report:L2_REPORT||WV_OCN__2S:L2_SLICE||WV_OCN__2A:L2_ACN||IW_OCN__2S:L2_SLICE||IW_OCN__2A:L2_ACN||EW_OCN__2S:L2_SLICE||EW_OCN__2A:L2_ACN||SM_OCN__2S:L2_SLICE||SM_OCN__2A:L2_ACN

A space before the entry for the IW annotation product resulted in a wrong detection of the product family.
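To illustrate the failure mode (a simplified sketch, not the actual Werum parser): when the "||"-separated mapping is split into type:family pairs, a stray leading space turns the key into " IW_OCN__2A", so the lookup for "IW_OCN__2A" finds no family and the output is ignored:

def parse_output_families(familiesstr: str) -> dict:
    # Simplified stand-in for the preparation-worker's outputfamiliesstr parsing.
    families = {}
    for entry in familiesstr.split("||"):
        product_type, family = entry.split(":")
        families[product_type] = family  # " IW_OCN__2A" is stored, not "IW_OCN__2A"
    return families

broken = parse_output_families("IW_OCN__2S:L2_SLICE|| IW_OCN__2A:L2_ACN")  # note the stray space
fixed = parse_output_families("IW_OCN__2S:L2_SLICE||IW_OCN__2A:L2_ACN")

print(broken.get("IW_OCN__2A"))  # None -> product family unknown, output ignored
print(fixed.get("IW_OCN__2A"))   # L2_ACN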

pcuq-ads commented 1 year ago

Thank you @w-jka. I created the pull request for the OPS configuration: https://github.com/COPRS/rs-config/pull/304

pcuq-ads commented 1 year ago

Werum_CCB_2023_w30 : The configuration will be taken into account for the next delivery.

w-fsi commented 1 year ago

The fix has been committed on the develop branch.

Woljtek commented 1 year ago

Production of July 28th: we observed IW_OCN__2A products on the PRIP. The WA (develop image) worked.

Woljtek commented 1 year ago

SYS_CCB_w31: This issue will be delivered with 1.4.0-rc2

vgava-ads commented 1 year ago

System_CCB_2023_w31: Delivered in Processing Sentinel-1 v1.14.0 (refer to https://github.com/COPRS/processing-sentinel-1/releases/tag/1.14.0-rc2). To be validated by the IVV/OPS team.

LAQU156 commented 1 year ago

System_CCB_2023_w31 : Validated, closed