Closed by pcuq-ads, 1 year ago
Can you please provide the logs of the execution worker of a related production, so we can check whether there are any indications? As the execution workers are down, it is not possible to analyze this at the moment. At least the metadata extraction does not show any indication that these kinds of products were there.
Having the slices, but not the annotations, is something I did not observe in the past. My gut feeling is that there might be some hint in the IPF logs.
IVV_CCB_2023_w19 : Moved into "Accepted Werum", Priority blocking, to be fixed phase 1
Werum_CCB_2023_w19 : Moved into "Product Backlog" for further analysis
I checked the JobProcessing END trace and there is no annotation mentioned in the trace for IW_OCN.
{
"header": {
"type": "REPORT",
"timestamp": "2023-04-29T08:17:59.760000Z",
"level": "INFO",
"mission": "S1",
"workflow": "NOMINAL",
"rs_chain_name": "S1-L2",
"rs_chain_version": "1.12.0-rc1"
},
"message": {
"content": "End job processing"
},
"task": {
"uid": "400ea3fd-d715-4f7d-abb3-831b12077fff",
"name": "JobProcessing",
"event": "END",
"status": "OK",
"output": {
"debug": false,
"t0_pdgs_date": "2023-04-27T17:27:31.000000Z",
"filename_string": "S1A_IW_OCN__2SDV_20230427T172255_20230427T172320_048285_05CE7B_5118.SAFE"
},
"input": {
"job_order_id_string": "JobOrder.82730.xml",
"ipf_release_string": "3.61",
For S1A_EW_OCN I checked that the output traces mention both the slice and the annotation.
Hello @w-fsi, I am not sure if this will solve the issue, but I think we need to change something in the configuration. I see that:
app.preparation-worker.process.outputregexps.WV_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$
app.preparation-worker.process.outputregexps.WV_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$
I guess we should change it to:
app.preparation-worker.process.outputregexps.WV_OCN__2S=^S1[A-B]_WV_OCN__2S.*$
app.preparation-worker.process.outputregexps.WV_OCN__2A=^S1[A-B]_WV_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2A=^S1[A-B]_S[1-6]_OCN__2A.*$
app.preparation-worker.process.outputregexps.SM_OCN__2S=^S1[A-B]_S[1-6]_OCN__2S.*$
Or maybe I am missing something? Can you confirm, please?
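As a quick illustration of the suspected mismatch (a sketch; the WV product name below is illustrative, following the Sentinel-1 naming convention): the current WV pattern only matches stripmap-style `S[1-6]` names, so a `WV` product name can never match it, while the proposed pattern does.

```python
import re

# Current configuration value for WV_OCN__2S (expects stripmap modes S1..S6)
current = re.compile(r"^S1[A-B]_S[1-6]_OCN__2S.*$")
# Proposed fix (expects the WV mode token)
proposed = re.compile(r"^S1[A-B]_WV_OCN__2S.*$")

# Illustrative WV L2 slice name; 'WV' sits where the regex expects 'S[1-6]'
wv_product = "S1A_WV_OCN__2SSV_20230427T172255_20230427T172320_048285_05CE7B_5118.SAFE"

print(bool(current.match(wv_product)))   # False: 'WV' never matches 'S[1-6]'
print(bool(proposed.match(wv_product)))  # True
```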
Regards
@pcuq-ads
Thank you for your observations. We checked the configuration and the tasktable, and while this line of configuration is indeed misleading, it has no impact on the production of either the IW products or the WV products, as the outputs are not declared as Regexp in the tasktable. The configuration was already used in the previous project S1PRO, where no issues regarding this are known to us.
One thing to note is that the 2S outputs are declared as mandatory in the tasktable, while this is not the case for the 2A outputs. Could you check that the production service is producing 2A for these slices?
If yes, we need the execution-worker logs for the production of the 2S files, as the 2A would have been produced by the IPF with the same joborder.
One additional remark, unrelated to this issue: the tasktable SM_RAW__0_OCN__2.xml changed since the version used for S1PRO, especially in the configuration of the outputs. In the newer tasktable the outputs are not declared as Regexp, which will with high certainty lead to problems with the L2 production of the S[1-6] products. As the tasktables show some other indications of copy-paste errors, this might also be the result of a manual error on the side of the IPF provider.
@w-jka We are facing a systematic issue. On the production service, I cannot find any L2 slice without its annotation. Here is the production during the 24-hour test. https://app.zenhub.com/files/398313496/ad8ae510-2db3-4e28-9d0e-8ee89997e7ea/download
On RS, I could not find any IW L2 annotation when I got an IW L2 slice. For example, for slices:
@pcuq-ads Thank you for checking the results of the production service. Can you please provide the execution worker logs for one of those jobs (producing an OCN__2S product without its corresponding annotation product), so we can check what causes this behaviour?
Here are some logs. https://app.zenhub.com/files/398313496/31fd85c0-cea6-4f4f-808d-cf0d704e694c/download
I could not capture a big time frame. I hope I have caught the right area.
@pcuq-ads
From the extract you provided I could identify this passage that gives some insight into what might be the root cause of this issue:
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675492Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675030 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] AttributeError: 'NoneType' object has no attribute 'GetLayer'"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675467Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675017 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"src/cls/osgeo/raster/get_mask.py\", line 121, in cls.osgeo.raster.get_mask.EarthMask.getMask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675452Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.675004 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"src/cls/data/sar/l1/swath.py\", line 575, in cls.data.sar.l1.swath.Swath._get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675435Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674991 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"src/cls/data/sar/l1/swath.py\", line 545, in cls.data.sar.l1.swath.Swath._get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675419Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674978 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"src/cls/data/sar/l1/sar_dataset.py\", line 859, in cls.data.sar.l1.sar_dataset.SarDataset.get_earth_mask"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675398Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674964 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] mask = ds.get_earth_mask(0, 0, ds.xsize // nx * nx, ds.ysize // ny * ny,"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675382Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674951 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 763, in make_iw_rvl"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675362Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674938 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] self.make_iw_rvl(subswath)"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675340Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674924 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 210, in __init__"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675319Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674908 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] LopRvlApplication(st, subswath)"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675292Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674890 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] File \"/usr/local/components/S1IPF/lib/python3.9/site-packages/cls/sentinel1/lop_rvl_application.py\", line 63, in rvl_application_worker"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675272Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.674846 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] Traceback (most recent call last):"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-05-20T02:06:20Z {"header":{"type":"LOG","timestamp":"2023-05-20T02:06:20.675194Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-276-thread-1"},"message":{"content":"2023-05-20T02:06:20.673998 s1-l2-part3-execution-worker-low-v11-6f9f4cfc84-5x72p IW_RAW__0_OCN__2 3.61 [0000007998]: [E] (LOP_ProcMain) SENTINEL_LOP_STATUS=0"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
The IPF itself runs into an error and returns error code 127 (a warning, INCOMPLETE, according to the documentation). The error is related to a mask that is provided. As this error occurs in the execution worker and not in the preparation worker, the related mask is not provided by configuration; I would assume it is part of the DEM. The related error message here is AttributeError: 'NoneType' object has no attribute 'GetLayer'.
So from the currently available information, the root cause lies in the DEM files provided to the IPF.
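For context, this error pattern is typical of GDAL/OGR-style APIs, where an open call returns None instead of raising when a file cannot be opened, and the failure only surfaces at the first method call. A minimal stand-in sketch (not the IPF's actual code; the path is hypothetical):

```python
# Stand-in sketch: GDAL/OGR-style open calls return None on failure instead of
# raising, so a missing shapefile only surfaces at the first attribute access.
def ogr_open(path, available):
    """Mimics ogr.Open(): returns a datasource object, or None if the file is missing."""
    return object() if path in available else None

# Hypothetical path; the actual file the IPF looks for is not known from the logs.
ds = ogr_open("/usr/local/conf/IPF_CFG/osm/OSM_00360/land_polygons.shp", available=set())
try:
    ds.GetLayer(0)
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'GetLayer' -- as seen in the IPF logs
```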
I am not sure it is related to the DEM files, as the error sounds more like some kind of mask being loaded. I assume this is the counterpart of the preparation-worker mask (which tries to filter out products before they reach the EW). The IPF seems to have some check of its own. There is the RPM "S1PD-IPF-OSM_3.6.0-3.6.0-00.01.x86_64.rpm" that sounds like it contains some polygons and landscape information. Maybe it was not installed?
However, this issue is clearly related to the IPF and not caused by the EW. The IPF tries to load something that is not found.
@w-fsi Please find below the starting pods log, in order to investigate whether the masks are correctly loaded for S1-L1 and S1-L2 (updated by @pcuq-ads)
https://app.zenhub.com/files/398313496/2ee55f05-563d-4bba-bfbb-b83763d710bb/download
Werum_CCB_2023_w22 : Action on @OPS team to check whether the RPM (S1PD-IPF-OSM_3.6.0-3.6.0-00.01.x86_64.rpm) is correctly installed in the docker image.
@SYTHIER-ADS has found discrepancies in the volume mounts between the S1-L1 and S1-L2 configurations. This pull request may fix part or all of this issue.
Werum_CCB_2023_w23 : The change provided in the pull request on the rs-config repository has to be ported to the s1 repository
Default configuration was updated accordingly. This issue will be fixed in 1.13.2 and thus be contained in V2.
The issue remains with version 1.13.2 of the S1-L2 RS core add-on. Request attr_productType_string:IW_OCN__2* on the ES:
IVV_CCB_2023_w29 : Fix failed with release 1.13.2. To be moved to the WERUM board.
Werum_CCB_2023_w29: Accepted, to be analyzed.
@w-fsi
I checked that the configuration deployed in OPS is the one from RS_CONFIG/OPS (checked against the SCDF configuration).
Something is strange to me. I see two syntaxes for the volume mount property:
sometimes it is volumeMounts
deployer.housekeep.kubernetes.volumeMounts=[{name: tasktables, mountPath: '/app/tasktables'},{name: maskfile, mountPath: '/shared/sentinel1/masks'}]
deployer.preparation-worker.kubernetes.volumeMounts=[{name: tasktables, mountPath: '/app/tasktables'},{name: maskfile, mountPath: '/shared/sentinel1/masks'}]
sometimes it is volume-mounts
deployer.execution-worker-high.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-medium.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-low.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
Do you think this can be the ROOT CAUSE of this issue?
Regards
@pcuq-ads: No, I don't think that this is an issue. Both naming patterns should be accepted by SCDF, and I assume they are applied as well. The S2 configurations use a different one than ours, so I assume it got mixed up this way somehow.
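For illustration, Spring-style relaxed binding (which SCDF deployer properties go through) treats both spellings as the same property. A rough sketch of the canonicalization (simplified; the exact rules are in Spring Boot's relaxed-binding documentation):

```python
def canonical(name: str) -> str:
    # Simplified relaxed binding: drop '-' and '_' and lower-case the result,
    # so camelCase and kebab-case spellings collapse to one canonical key.
    return name.replace("-", "").replace("_", "").lower()

# Both spellings seen in the configuration resolve to the same property.
print(canonical("volumeMounts") == canonical("volume-mounts"))  # True
```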
This issue is still really strange to me. I checked the TT and the annotations are non-mandatory in there. However, the concept of enforcing non-mandatory outputs does not exist for S1, and thus either the issue has existed since the beginning or the last IPF upgrades changed the behaviour somehow. The python library logs still make me think that this is somehow related to a configuration that is loaded.
Maybe we should schedule a debug session with OPS to verify a few things. The EW seems to be scaled to 0 at the moment, so @Woljtek do you see a chance to reconstruct this issue somehow, so we can have a look at it together?
@w-fsi, you are right, we checked that the mounts are well defined for S1-L2.
Back to square one.
@pcuq-ads : Did you also check the permissions within the mounts? We changed the default user for the packages, and maybe they are not in line with the current settings. A permission issue might explain the observed behaviour as well.
As pointed out in the previous comment: if the mounts are fine, we are in foggy land again.
Here are the full logs of an execution worker running an IW product. https://app.zenhub.com/files/398313496/d62d2723-1ae1-4c68-9739-715c264c65f1/download
@w-fsi We checked the rights on the mount points. We do not see the same rights on each folder: dem ==> user piccontrol / group pic_run
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 app]$ cd /usr/local/conf/IPF_CFG/dem
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 dem]$ ls -al
total 4
drwxrwxrwx 3 piccontrol pic_run 1 Feb 8 11:41 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 3 piccontrol pic_run 1 Feb 8 11:41 fine
l2ExternalAuxFiles ==> user piccontrol / group pic_run
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 dem]$ cd /usr/local/conf/IPF_CFG/l2ExternalAuxFiles/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 l2ExternalAuxFiles]$ ls -al
total 2149437
drwxrwxrwx 2 piccontrol pic_run 52 Feb 8 15:48 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is1-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is1-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is1-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is1-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is2-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is2-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is2-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is2-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is3-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is3-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is3-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is3-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is4-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is4-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is4-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is4-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is5-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is5-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is5-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is5-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is6-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is6-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is6-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is6-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is7-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is7-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133365 Feb 8 14:44 asa-aux-scs-is7-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 asa-aux-scs-is7-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is1-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is1-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is1-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is1-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is2-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is2-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is2-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is2-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is3-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is3-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is3-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is3-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is4-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is4-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is4-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is4-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is5-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is5-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is5-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is5-vv.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is6-hh.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is6-hh.nlin
-rw-rw-r-- 1 piccontrol pic_run 133366 Feb 8 14:44 s1--aux-scs-is6-vv.info
-rw-rw-r-- 1 piccontrol pic_run 84520800 Feb 8 14:44 s1--aux-scs-is6-vv.nlin
osm ==> user root / group root; files with UID/GID 1000/1000, read-only
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ ls -al
total 4
drwxrwxrwx 3 root root 1 Apr 12 07:36 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 2 root root 11 Apr 12 07:35 OSM_00360
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ cd OSM_00360/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 OSM_00360]$ ls -al
total 1159535
drwxr-xr-x 2 root root 11 Apr 12 07:35 .
drwxrwxrwx 3 root root 1 Apr 12 07:36 ..
-rw-r--r-- 1 1000 1000 824 Apr 12 07:35 README.txt
-rw-r--r-- 1 1000 1000 5 Apr 12 07:35 lakes.cpg
-rw-r--r-- 1 1000 1000 880 Apr 12 07:35 lakes.dbf
-rw-r--r-- 1 1000 1000 145 Apr 12 07:35 lakes.prj
-rw-r--r-- 1 1000 1000 3885808 Apr 12 07:35 lakes.shp
-rw-r--r-- 1 1000 1000 220 Apr 12 07:35 lakes.shx
-rw-r--r-- 1 1000 1000 6 Apr 12 07:35 land_polygons.cpg
-rw-r--r-- 1 1000 1000 14426437 Apr 12 07:35 land_polygons.dbf
-rw-r--r-- 1 1000 1000 145 Apr 12 07:35 land_polygons.prj
-rw-r--r-- 1 1000 1000 1162971504 Apr 12 07:35 land_polygons.shp
-rw-r--r-- 1 1000 1000 6074348 Apr 12 07:35 land_polygons.shx
gshhs ==> user piccontrol / group pic_run, with files with UID/GID 1000/1000
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 osm]$ cd /usr/local/conf/IPF_CFG/gshhs/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gshhs]$ ls -al
total 222088
drwxrwxrwx 2 piccontrol pic_run 16 Apr 12 07:36 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
-rw-r--r-- 1 1000 1000 44209815 Apr 12 07:35 gshhs_f_l1_polytiled.dbf
-rw-r--r-- 1 1000 1000 143 Apr 12 07:35 gshhs_f_l1_polytiled.prj
-rw-r--r-- 1 1000 1000 164899800 Apr 12 07:35 gshhs_f_l1_polytiled.shp
-rw-r--r-- 1 1000 1000 1645108 Apr 12 07:35 gshhs_f_l1_polytiled.shx
-rw-r--r-- 1 1000 1000 1533390 Apr 12 07:35 gshhs_f_l2_polytiled.dbf
-rw-r--r-- 1 1000 1000 143 Apr 12 07:35 gshhs_f_l2_polytiled.prj
-rw-r--r-- 1 1000 1000 13770940 Apr 12 07:35 gshhs_f_l2_polytiled.shp
-rw-r--r-- 1 1000 1000 57148 Apr 12 07:35 gshhs_f_l2_polytiled.shx
-rw-r--r-- 1 1000 1000 306170 Apr 12 07:35 gshhs_f_l3_polytiled.dbf
-rw-r--r-- 1 1000 1000 143 Apr 12 07:35 gshhs_f_l3_polytiled.prj
-rw-r--r-- 1 1000 1000 961580 Apr 12 07:35 gshhs_f_l3_polytiled.shp
-rw-r--r-- 1 1000 1000 11484 Apr 12 07:35 gshhs_f_l3_polytiled.shx
-rw-r--r-- 1 1000 1000 2590 Apr 12 07:35 gshhs_f_l4_polytiled.dbf
-rw-r--r-- 1 1000 1000 143 Apr 12 07:35 gshhs_f_l4_polytiled.prj
-rw-r--r-- 1 1000 1000 9820 Apr 12 07:35 gshhs_f_l4_polytiled.shp
-rw-r--r-- 1 1000 1000 188 Apr 12 07:35 gshhs_f_l4_polytiled.shx
gebco ==> user piccontrol / group pic_run
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gshhs]$ cd /usr/local/conf/IPF_CFG/gebco/
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 gebco]$ ls -al
total 1848814
drwxrwxrwx 2 piccontrol pic_run 6 Jun 5 22:53 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
-rw-rw-r-- 1 piccontrol pic_run 1866240612 Feb 8 14:31 gebco_08.nc
-rw-r--r-- 1 piccontrol pic_run 530 Jul 16 11:22 gebco_08.nc.aux.xml
-rw-rw-r-- 1 piccontrol pic_run 98097 Feb 8 14:31 gebco_08.pdf
-rw-rw-r-- 1 piccontrol pic_run 25647549 Feb 8 14:31 gebco_SID.zip
-rw-rw-r-- 1 piccontrol pic_run 1101242 Feb 8 14:31 gebco_docs.zip
-rw-rw-r-- 1 piccontrol pic_run 91440 Feb 8 14:31 gebco_terms_of_use.pdf
LUT ==> user piccontrol / group pic_run
[piccontrol@s1-l2-part2-execution-worker-medium-v18-774f76c66d-kbhw6 LUT]$ ls -al
total 4
drwxrwxrwx 4 piccontrol pic_run 2 Feb 8 15:49 .
drwxrwxr-x 1 piccontrol pic_run 4096 Jul 20 12:31 ..
drwxr-xr-x 2 piccontrol pic_run 9 Feb 8 14:38 GMF_cmod5n
drwxr-xr-x 2 piccontrol pic_run 9 Feb 8 14:39 GMF_cmod_ifr2nn
@w-fsi, to complete the information provided by @suberti-ads, I add the SCDF properties for the execution worker:
deployer.execution-worker-high.kubernetes.podSecurityContext={runAsUser: 1046, fsGroup: 2000}
deployer.execution-worker-high.kubernetes.volume-mounts=[{ name: 's3-upload-cache', mountPath: '/opt/s3/uploadCache'}, { name: dem, mountPath: '/usr/local/conf/IPF_CFG/dem' }, { name: gebco, mountPath: '/usr/local/conf/IPF_CFG/gebco' }, { name: gshhs, mountPath: '/usr/local/conf/IPF_CFG/gshhs' }, { name: osm, mountPath: '/usr/local/conf/IPF_CFG/osm' }, { name: extaux, mountPath: '/usr/local/conf/IPF_CFG/l2ExternalAuxFiles' }, { name: lut, mountPath: '/usr/local/conf/IPF_CFG/LUT' } ]
deployer.execution-worker-high.kubernetes.volumes=[{ name: 's3-upload-cache', emptyDir: { medium: 'Memory', sizeLimit: '1500Mi' }}, { name: dem, persistentVolumeClaim: { claimName: 's1-dem', storageClassName: 'ceph-fs' } }, { name: gebco, persistentVolumeClaim: { claimName: 's1-gebco', storageClassName: 'ceph-fs' } }, { name: gshhs, persistentVolumeClaim: { claimName: 's1-gshhs', storageClassName: 'ceph-fs' } }, { name: osm, persistentVolumeClaim: { claimName: 's1-osm', storageClassName: 'ceph-fs' } }, { name: extaux, persistentVolumeClaim: { claimName: 's1-l2external', storageClassName: 'ceph-fs' } }, { name: lut, persistentVolumeClaim: { claimName: 's1-lut', storageClassName: 'ceph-fs' } } ]
For S3 chains, runAsUser and fsGroup are set to 0.
@pcuq-ads : The user id 1000 doesn't seem to be right, as it does not exist in the container. This looks more like the files were copied with a default user id from another system. However, the read permissions are set, so this is likely not causing an issue, as the files can be accessed. I think we can remove a possible permission issue from the list.
@pcuq-ads : Do we have a JO available for a failed request? In the OPS bucket there seem to be no failed requests. We would like to investigate how the JO actually looks. The new processor version might handle mandatory products in a different way.
@w-fsi : There is no failed JO. @suberti-ads will add a JO of an IW L2 processing. All of them have the issue (processing OK but annotation missing).
@w-fsi , Sample for generation. Input: S1A_IW_RAW__0SDV_20230714T211656_20230714T211728_049424_05F17F_9980.SAFE. Hereafter the job in mongo linked to this execution: appJobdb108328.log. Hereafter the jobOrder for this product, extracted from the kafka message: PrettyJobOrder.txt
@pcuq-ads : Yeah, sorry, I meant one of the affected job orders. The idea we had was that maybe the annotations are not added to the JO, and the behaviour of the IPF changed to ignore them if they are not listed anymore. However, as the job order shows, they are added correctly there as well, and thus the IPF should process them.
We don't see any issues with the services there and thus tend to blame the IPF. However, we have two theories that might be further investigated: 1) The IPF is giving these python error messages, and we are quite sure that these are somehow related to the static auxiliary data of the IPF, so either the OSM or the DEM. The issue here is that we are not aware of the directory structure required on the mounts. If there is some additional folder, the IPF might not be able to find the data as expected, and if the error message is not precise, this can be very difficult to identify.
Take the OSM for example. I find it very suspicious that this data seems to contain a version number; this caused various issues in the past already. Maybe the data on the shared disk was not updated together with the IPF itself. The IPF identifies itself as version 3.61 and thus might be looking in that path instead of the 3.60 one, and thus not finding the data it is expecting. Maybe OPS shall give it a try and generate a symbolic link from an OSM_00361 to the OSM_00360. Other paths might be affected by something similar as well.
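The suggested workaround could look like the following sketch (simulated here in a scratch directory so it is safe to run anywhere; on the worker pod the base path would be /usr/local/conf/IPF_CFG/osm, and OSM_00361 is the hypothetical directory a 3.61 IPF might look for):

```shell
# Simulate the suggested symlink in a scratch directory; on the pod one would
# run the ln command inside /usr/local/conf/IPF_CFG/osm instead.
base=$(mktemp -d)
mkdir "$base/OSM_00360"
touch "$base/OSM_00360/land_polygons.shp"

# Make the version path the IPF might expect resolve to the existing data.
ln -s OSM_00360 "$base/OSM_00361"

# The data is now reachable through both version paths.
ls "$base/OSM_00361/land_polygons.shp"
```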
2) The tasktables are in my opinion very strange, and I am not sure if they are supposed to be that way. E.g. when checking the IW tasktable that @suberti-ads gives as an example:
<?xml version='1.0' encoding='UTF-8'?>
<Ipf_Task_Table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="TaskTable_S1.xsd">
<Processor_Name>IW_RAW__0_OCN__2</Processor_Name>
<Version>3.61</Version>
[...]
<Input>
<Mode>ALWAYS</Mode>
<Mandatory>Yes</Mandatory>
<List_of_Alternatives count="1">
<Alternative>
<Order>1</Order>
<Origin>PROC</Origin>
<Retrieval_Mode>LatestValCover</Retrieval_Mode>
<T0>0.0</T0>
<T1>0.0</T1>
<File_Type>EW_OCN__2A</File_Type>
<File_Name_Type>Regexp</File_Name_Type>
</Alternative>
</List_of_Alternatives>
</Input>
[...]
Notice that this is an IW tasktable that contains an EW input for the ProcEND task. I am not too deep into the product model, but that's not what I expected. The JO contains the EW as well, and this might cause some confusion within the IPF. Notice that the previous task, L0P_ProcMain, does have the IW_OCN as outputs, as I would expect for this tasktable. I find it strange that the follow-up task is not using these outputs as inputs, but asking for EW_OCN. Maybe the ProcEND task does some extraction from the outputs to generate the annotation files. This might explain why the actual products are generated fine but the annotations are missing, and maybe also why it works e.g. for EW_OCN, as everything is consistent there.
I might be completely wrong here and there is some higher logic behind it. But I would take a working directory of a failed processing step and try to modify the JO to use the right product types and execute it manually, to see if the annotations are then generated correctly; or modify the TT and give it another full run.
@w-fsi: News on this issue. @nleconte-csgroup has found the product generated in the log. Product : S1A_IW_OCN__2ADV_20230714T211725_20230714T211759_049424_05F17F_C3BC.SAFE
{"header":{"type":"LOG","timestamp":"2023-07-14T22:46:58.950159Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-24-thread-1"},"message":{"content":"2023-07-14T22:46:58.949918 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000000808]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108331/S1A_IW_OCN__2SDV_20230714T211725_20230714T211759_049424_05F17F_9694.SAFE"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
2023-07-14T22:46:58Z {"header":{"type":"LOG","timestamp":"2023-07-14T22:46:58.214954Z","level":"INFO","line":133,"file":"Iterator.java","thread":"pool-24-thread-1"},"message":{"content":"2023-07-14T22:46:58.214623 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000000808]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108331/S1A_IW_OCN__2ADV_20230714T211725_20230714T211759_049424_05F17F_C3BC.SAFE"},"custom":{"logger_string":"esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"}}
A test with the CFI generates the IW L2 annotation product.
The annotation is not present in the S3 bucket.
If you have the working directory, can you please provide the list file and the associated EW log, so we can see what exactly happened during the extraction? Most likely the entry was not in the list file.
Nicolas is about to push the working directory to a bucket.
I watched the logs and see two messages:
2023-07-14T23:02:54Z
{
"header": {
"type": "LOG",
"timestamp": "2023-07-14T23:02:54.597471Z",
"level": "INFO",
"line": 133,
"file": "Iterator.java",
"thread": "pool-49-thread-1"
},
"message": {
"content": "2023-07-14T23:02:54.597187 s1-l2-part3-execution-worker-low-v18-6f6684b748-zgjfp IW_RAW__0_OCN__2 3.61 [0000001468]: [I] (LOP_ProcMain) Generated product filename: /data/localWD/108333/S1A_IW_OCN__2ADV_20230714T201626_20230714T201656_049424_05F179_A45D.SAFE"
},
"custom": {
"logger_string": "esa.s1pdgs.cpoc.ipf.execution.worker.job.process.TaskCallable"
}
}
2023-07-14T23:02:55Z
{
"header": {
"type": "LOG",
"timestamp": "2023-07-14T23:02:55.879077Z",
"level": "INFO",
"line": 320,
"file": "OutputProcessor.java",
"thread": "KafkaConsumerDestination{consumerDestinationName='s1-l2-part3.priority-filter-low', partitions=30, dlqName='error-warning'}.container-0-C-1"
},
"message": {
"content": "Output S1A_IW_OCN__2ADV_20230714T201626_20230714T201656_049424_05F179_A45D.SAFE will be ignored"
},
"custom": {
"logger_string": "esa.s1pdgs.cpoc.ipf.execution.worker.job.file.OutputProcessor"
}
}
@pcuq-ads The log messages helped to finally narrow down what happens here. To fix this issue, update the s1_l2 chain with the following parameters:
app.preparation-worker.worker.outputfamiliesstr=report:L2_REPORT||WV_OCN__2S:L2_SLICE||WV_OCN__2A:L2_ACN||IW_OCN__2S:L2_SLICE||IW_OCN__2A:L2_ACN||EW_OCN__2S:L2_SLICE||EW_OCN__2A:L2_ACN||SM_OCN__2S:L2_SLICE||SM_OCN__2A:L2_ACN
app.housekeep.worker.outputfamiliesstr=report:L2_REPORT||WV_OCN__2S:L2_SLICE||WV_OCN__2A:L2_ACN||IW_OCN__2S:L2_SLICE||IW_OCN__2A:L2_ACN||EW_OCN__2S:L2_SLICE||EW_OCN__2A:L2_ACN||SM_OCN__2S:L2_SLICE||SM_OCN__2A:L2_ACN
A space before the entry for the IW annotation product resulted in a wrong detection of the product family.
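The failure mode can be illustrated with a minimal sketch (a hypothetical parser; the real logic lives in the preparation worker, and the names below are illustrative): splitting the configuration string on `||` and `:` without trimming whitespace stores the key `' IW_OCN__2A'`, so the lookup for `IW_OCN__2A` misses and the output is treated as having no known family.

```python
# Hypothetical sketch of how a stray space breaks the product-family lookup.
def parse_families(cfg: str) -> dict:
    families = {}
    for entry in cfg.split("||"):
        product_type, family = entry.split(":")
        families[product_type] = family  # no .strip(): ' IW_OCN__2A' != 'IW_OCN__2A'
    return families

broken = parse_families("IW_OCN__2S:L2_SLICE|| IW_OCN__2A:L2_ACN")
print(broken.get("IW_OCN__2A"))  # None -> family not found, output ignored

# Trimming each token makes the lookup robust against stray whitespace:
def parse_families_safe(cfg: str) -> dict:
    return {p.strip(): f.strip()
            for p, f in (e.split(":") for e in cfg.split("||"))}

print(parse_families_safe("IW_OCN__2S:L2_SLICE|| IW_OCN__2A:L2_ACN")["IW_OCN__2A"])  # L2_ACN
```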
Thank you @w-jka. I created the pull request for the OPS configuration: https://github.com/COPRS/rs-config/pull/304
Werum_CCB_2023_w30 : configuration will be taken into account for next delivery.
The fix has been committed on the develop branch.
Production of July 28th: We observed IW_OCN__2A products on the PRIP. The WA (develop image) worked.
SYS_CCB_w31: This issue will be delivered with 1.14.0-rc2
System_CCB_2023_w31: Delivered in Processing Sentinel-1 v1.14.0 (refer to https://github.com/COPRS/processing-sentinel-1/releases/tag/1.14.0-rc2). To be validated by the IVV/OPS team.
System_CCB_2023_w31 : Validated, closed
Environment:
Traceability:
Current Behavior: During the 24H test, we observe that product IW_OCN__2A is not generated, whereas IW_OCN__2S is generated.
Expected Behavior: Any slice level-2 shall be generated with its annotation.
Steps To Reproduce: Check production.
Test execution artefacts (i.e. logs, screenshots…) Check PRIP index, with
We see the issue.
Check PRIP, with
It is OK for EW_OCN.
Whenever possible, first analysis of the root cause. Hypothesis: 1) Annotation not sent to the catalog? 2) Annotation not generated?
Bug Generic Definition of Ready (DoR)
Bug Generic Definition of Done (DoD)