IBM / zopeneditor-about

IBM Z Open Editor: File issues here!
https://ibm.github.io/zopeneditor-about
Apache License 2.0

Latch Contention Issues and Dead TSO Address Spaces #445

Closed by savaresejt 1 month ago

savaresejt commented 1 month ago

Development environment used

Problem Description

Observed behavior

A user reported problems saving datasets with Zowe Explorer. The log showed thousands of error messages from attempts to locate datasets for ++INCLUDE members and copybooks. We observed ~13,000 dead address spaces belonging to the TSO user. This has happened 3 times in the last month.

OUTPUT FROM D GRS,C,L

RESPONSE=SYSELMD                                                       
 ISG343I 08.08.35 GRS STATUS 871                                       
 LATCH SET NAME:  SYS.BPX.AP00.PRTB1.PPRA.LSN                          
 CREATOR JOBNAME: OMVS      CREATOR ASID: 0010                         
   LATCH NUMBER:  1                                                    
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      0027  EXCLUSIVE  OWN       00AE41B0   Y   07:04:13.452 
     BPXOINIT   0041  EXCLUSIVE  WAIT      00AFAAE8   Y   07:04:13.450 
     USER1      0027  EXCLUSIVE  WAIT      00AE4800   Y   07:04:13.363 
     USER1      00E9  EXCLUSIVE  WAIT      00AE4800   Y   06:57:38.356 
     USER1      00AB  EXCLUSIVE  WAIT      00AE4800   Y   06:57:36.971 
     SSHD3      00F0  EXCLUSIVE  WAIT      00AD9DC8   Y   01:42:27.413 
     USER3      00F5  EXCLUSIVE  WAIT      00AFB2F8   Y   00:30:39.263 
     PORTMAP    0098  EXCLUSIVE  WAIT      00AF9040   Y   00:03:46.903 
   LATCH NUMBER:  47                                                   
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      00AA  EXCLUSIVE  OWN       00AE41B0   Y   16:42:25.036 
     USER1      00AA  EXCLUSIVE  WAIT      00AE4800   Y   16:27:14.359 
   LATCH NUMBER:  125                                                  
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME 
     USER1      0099  EXCLUSIVE  OWN       00AE41B0   Y   15:40:55.436 
     USER1      0099  EXCLUSIVE  WAIT      00AE4800   Y   15:25:55.379 
   LATCH NUMBER:  260
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER3      00F5  EXCLUSIVE  OWN       00AFB2F8   Y   00:30:39.263
     RSED3      0051  EXCLUSIVE  WAIT      00AC80A0   Y   00:30:36.665
     RSED3      0051  EXCLUSIVE  WAIT      00AC1E88   Y   00:27:39.311
   LATCH NUMBER:  1505
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER2      00D5  EXCLUSIVE  OWN       00AE41B0   Y   -over 24 hrs
     USER2      00D5  EXCLUSIVE  WAIT      00AE4800   Y   -over 24 hrs
   LATCH NUMBER:  1654
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     SSHD3      00F0  EXCLUSIVE  OWN       00AD9DC8   Y   01:42:27.413
     SSHD4      0066  EXCLUSIVE  WAIT      00AFB2F8   Y   01:40:26.298
     SSHD5      0121  EXCLUSIVE  WAIT      00AFB2F8   Y   00:17:57.800
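
For anyone triaging a similar display, here is a minimal Python sketch that groups the requestor rows by latch and flags address spaces that own one latch while waiting on another. It assumes the column layout shown in the output above and that the EXC/SHR column only contains EXCLUSIVE or SHARED. `SAMPLE` is a shortened excerpt of that output, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# One requestor row, e.g.
# "USER1      0027  EXCLUSIVE  OWN       00AE41B0   Y   07:04:13.452"
ROW = re.compile(
    r"^\s*(?P<job>\S+)\s+(?P<asid>[0-9A-Fa-f]{4})\s+"
    r"(?P<mode>EXCLUSIVE|SHARED)\s+(?P<state>OWN|WAIT)\s+"
    r"(?P<workunit>[0-9A-Fa-f]{8})\s+(?P<tcb>[YN])\s+(?P<elapsed>.+?)\s*$"
)
LATCH = re.compile(r"^\s*LATCH NUMBER:\s*(\d+)")


def summarize_latches(text):
    """Return {latch_number: {"owners": [...], "waiters": [...]}}."""
    latches = defaultdict(lambda: {"owners": [], "waiters": []})
    current = None
    for line in text.splitlines():
        m = LATCH.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = ROW.match(line)
        if m and current is not None:
            key = "owners" if m.group("state") == "OWN" else "waiters"
            latches[current][key].append((m.group("job"), m.group("asid").upper()))
    return dict(latches)


def waiting_owners(latches):
    """List (requestor, owned_latch, awaited_latch) for requestors that
    own one latch while waiting on a different one."""
    found = set()
    for owned, info in latches.items():
        for owner in info["owners"]:
            for awaited, other in latches.items():
                if awaited != owned and owner in other["waiters"]:
                    found.add((owner, owned, awaited))
    return sorted(found)


# Shortened excerpt of the D GRS,C,L output from this issue.
SAMPLE = """\
   LATCH NUMBER:  1
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER1      0027  EXCLUSIVE  OWN       00AE41B0   Y   07:04:13.452
     BPXOINIT   0041  EXCLUSIVE  WAIT      00AFAAE8   Y   07:04:13.450
     USER3      00F5  EXCLUSIVE  WAIT      00AFB2F8   Y   00:30:39.263
   LATCH NUMBER:  260
     REQUESTOR  ASID  EXC/SHR    OWN/WAIT  WORKUNIT  TCB  ELAPSED TIME
     USER3      00F5  EXCLUSIVE  OWN       00AFB2F8   Y   00:30:39.263
     RSED3      0051  EXCLUSIVE  WAIT      00AC80A0   Y   00:30:36.665
"""

if __name__ == "__main__":
    summary = summarize_latches(SAMPLE)
    for number, info in sorted(summary.items()):
        print(number, "owners:", info["owners"], "waiters:", len(info["waiters"]))
    print(waiting_owners(summary))
```

In the excerpt, USER3 (ASID 00F5) owns latch 260 while waiting on latch 1, so the RSED3 waiters on latch 260 are indirectly queued behind the owner of latch 1.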

These commands let us recover:

FORCE U=USER1,A=00AA,TCB=AE4800 
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER1,A=0099,TCB=AE4800
FORCE U=USER2,A=00d5,TCB=AE4800

Expected behavior

We would still like to find the root cause. System latch contention like this could cause problems if we move Z Open Editor up to production.

The zapp file users are using is:

name: Generated ZAPP document
description: >-
  Configuration file that controls where z-open editor will search for 
  COBOL copybooks so that they can be traced through the code by hovering
  Documentation:
  https://ibm.github.io/zopeneditor-about/Docs/setting_propertygroup.html
author: IBM Z Open Editor
propertyGroups:
  - name: search-all
    libraries:
      - name: syslib
        type: mvs
        locations:
          - XXX.DEV.COPYLIB
          - XXX.D.COPYLIB
          - XXX.S.COPYLIB
          - XXX.P.COPYLIB
          - XXX.D.DCLGEN          
          - XXX.S.DCLGEN           
          - XXX.P.DCLGEN     
          - XXX.P.DCLGEN
          - XXX.D.MAPLIB
          - XXX.S.MAPLIB
          - XXX.P.MAPLIB   
          - CICS.V6R1.CPSM.SEYUCOB     
          - CICS.V6R1.SDFHCOB
          - DB2.CADB2.V20R0M0.CDBACOBI
          - YYY.RAI.CRAICOBI
          - XXX.DEV.SOURCE
          - XXX.D.SOURCE 
          - XXX.S.SOURCE
          - XXX.P.SOURCE 
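
One small thing visible in the list above: XXX.P.DCLGEN appears twice. Whether the editor de-duplicates locations is not stated in this thread. Assuming it does not, a repeated entry could mean one more lookup for every copybook that is not found earlier in the list. The sketch below is a deliberately naive line scan (not a YAML parser, and the function names are made up) that reports repeated `locations` entries. `ZAPP` is a shortened excerpt of the file above.

```python
from collections import Counter


def zapp_locations(zapp_text):
    """Collect the '- NAME' entries that follow each 'locations:' key.

    Naive line scan: an item containing ':' (such as '- name: other')
    is treated as the end of the locations list.
    """
    found = []
    in_locations = False
    for raw in zapp_text.splitlines():
        stripped = raw.strip()
        if stripped == "locations:":
            in_locations = True
            continue
        if in_locations:
            if stripped.startswith("- ") and ":" not in stripped:
                found.append(stripped[2:].strip())
            elif stripped:
                in_locations = False
    return found


def duplicate_locations(zapp_text):
    """Return the location names that are listed more than once."""
    counts = Counter(zapp_locations(zapp_text))
    return sorted(name for name, n in counts.items() if n > 1)


# Shortened excerpt of the ZAPP file from this issue.
ZAPP = """\
propertyGroups:
  - name: search-all
    libraries:
      - name: syslib
        type: mvs
        locations:
          - XXX.D.DCLGEN
          - XXX.S.DCLGEN
          - XXX.P.DCLGEN
          - XXX.P.DCLGEN
          - XXX.D.MAPLIB
"""

if __name__ == "__main__":
    print(duplicate_locations(ZAPP))
```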


phaumer commented 1 month ago

This is a problem with z/OSMF. We created various user settings to control this behavior, such as listBeforeDownload, which avoids z/OSMF logging errors when files cannot be found, and maximumParallelFileDownloads, which reduces the number of parallel download attempts, since each request might allocate a new address space. We recommend using IBM Remote System Explorer (RSE) instead of z/OSMF.

See more details in https://ibm.github.io/zopeneditor-about/Docs/interact_zos_zopeneditor.html

savaresejt commented 1 month ago

Without switching to RSE, could we set listBeforeDownload to true and maximumParallelFileDownloads to a low number to avoid this issue?

phaumer commented 1 month ago

I hope so. z/OSMF should also release those address spaces automatically. If that does not happen, it might be worth opening a ticket with the z/OSMF team.

phaumer commented 1 month ago

Btw, I should have mentioned this doc page, which also links to a related blog post: https://ibm.github.io/zopeneditor-about/Docs/knownissues.html#using-z-osmf-with-z-open-editor-and-zowe-explorer. Let me know if this helps.

savaresejt commented 1 month ago

Thanks, this helps a lot! We will take a look at these settings and adjust them if needed.

Do you think the latch contention may have resulted from too many address spaces being spun up and contention in that area?