IBM / dbb-zappbuild

zAppBuild is a generic build solution for building z/OS applications using Apache Groovy build scripts and IBM Dependency Based Build (DBB) APIs.
Apache License 2.0

[DBB V2] DB2 SQL Error: SQLCODE=-913, SQLSTATE=57033, SQLERRMC=00C9008E;00000210;DBBZ001 .DBBRSEQR.00000001, DRIVER=4.31.25 #405

Closed FALLAI-Denis closed 10 months ago

FALLAI-Denis commented 10 months ago

Hi,

Since switching to DBB V2 and hosting the metadata store in Db2 for z/OS, we have been hitting SQLCODE -913 / SQLERRMC 00C9008E on the DBBRSEQR tablespace, which corresponds to the DBB_SEQ_TABLE table.

This happens when several builds are launched in parallel, and it affects the builds launched after the first one.

The DBB_SEQ_TABLE table consists of a single row which appears to be a build counter.

We are at fix level UI91246 / PH53686.

==> Could this problem be related to our adaptation of the build.groovy script?

The relevant statement in our adapted build.groovy script is:

def buildResult = metadataStore.createBuildResult(props.applicationBuildGroup, props.applicationBuildLabel)

Should a particular action be taken after createBuildResult to free the resource linked to table DBB_SEQ_TABLE?

Detail of the problem:

11:55:41  com.ibm.dbb.build.BuildException: com.ibm.db2.jcc.am.SqlException: DB2 SQL Error: SQLCODE=-913, SQLSTATE=57033, SQLERRMC=00C9008E;00000210;DBBZ001 .DBBRSEQR.00000001, DRIVER=4.31.25
11:55:41       at com.ibm.dbb.metadata.jdbc.JDBCMetadataStore.getNextId(JDBCMetadataStore.java:791)
11:55:41       at com.ibm.dbb.metadata.jdbc.JDBCMetadataStore.getNextId(JDBCMetadataStore.java:751)
11:55:41       at com.ibm.dbb.metadata.jdbc.JDBCMetadataStore.createBuildResult(JDBCMetadataStore.java:838)
11:55:41       at com.ibm.dbb.metadata.jdbc.JDBCMetadataStore.createBuildResult(JDBCMetadataStore.java:819)
11:55:41       at java.lang.invoke.InterfaceHandle.invokeExact_thunkArchetype_L(InterfaceHandle.java:134)
11:55:41       at java.lang.invoke.CatchHandle.invokeExact_thunkArchetype_X(CatchHandle.java:76)
11:55:41       at java.lang.invoke.GuardWithTestHandle.invokeExact_thunkArchetype_X(GuardWithTestHandle.java:80)
11:55:41       at java.lang.invoke.GuardWithTestHandle.invokeExact_thunkArchetype_X(GuardWithTestHandle.java:80)
11:55:41       at java.lang.invoke.SpreadHandle.invokeExact_thunkArchetype_X(SpreadHandle.java:100)
11:55:41       at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:318)
11:55:41       at java.lang.invoke.DirectHandle.invokeExact_thunkArchetype_L(DirectHandle.java:302)
11:55:41       at java.lang.invoke.AsTypeHandle.invokeExact_thunkArchetype_X(AsTypeHandle.java:49)
11:55:41       at java.lang.invoke.BruteArgumentMoverHandle.invokeExact_thunkArchetype_X(BruteArgumentMoverHandle.java:404)
11:55:41       at java.lang.invoke.CollectHandle.invokeExact_thunkArchetype_X(CollectHandle.java:130)
11:55:41       at java.lang.invoke.MutableCallSiteDynamicInvokerHandle.invokeExact_thunkArchetype_X(MutableCallSiteDynamicInvokerHandle.java:64)
11:55:41       at buildBPCE.initializeBuildProcess(buildBPCE.groovy:314)

And in the log of the Db2 master started task (STC):

11.55.38 STC05917 ---- MONDAY,    28 AUG 2023 ----                                                     
11.55.38 STC05917  DSNT376I  -DB2$ PLAN=DISTSERV WITH  701                                             
   701                     CORRELATION-ID=db2jcc_appli                                                 
   701                     CONNECTION-ID=SERVER                                                        
   701                     LUW-ID=NE05AD0B.G62B.DDD08F73108D=289                                       
   701                                                                                                 
   701             THREAD-INFO=JENKINS:126.5.173.11:JENKINS:db2jcc_application:DYNAMIC:1               
   701             :*:<::126.5.173.11.1579.DDD08F73108D>                                               
   701                     IS TIMED OUT. ONE HOLDER OF THE RESOURCE IS PLAN=DISTSERV                   
   701             WITH                                                                                
   701                     CORRELATION-ID=db2jcc_appli                                                 
   701                     CONNECTION-ID=SERVER                                                        
   701                     LUW-ID=NE05AD0B.G629.DDD08EEC81E5=287                                       
   701                                                                                                 
   701             THREAD-INFO=JENKINS:126.5.173.11:JENKINS:db2jcc_application:DYNAMIC:1               
   701             :*:*                                                                                
   701                     ON MEMBER DB2$                                                              
11.55.38 STC05917  DSNT501I  -DB2$ DSNILMCL RESOURCE UNAVAILABLE  702                                  
   702                        CORRELATION-ID=db2jcc_appli                                              
   702                        CONNECTION-ID=SERVER                                                     
   702                        LUW-ID=NE05AD0B.G62B.DDD08F73108D=289                                    
   702                        REASON 00C9008E                                                          
   702                        TYPE 00000210                       
   702                        NAME DBBZ001 .DBBRSEQR.00000001

dennis-behm commented 10 months ago

@FALLAI-Denis can you please route this through an official IBM support case? Thank you very much.

FALLAI-Denis commented 10 months ago

Hi @dennis-behm

A severity 2 case was opened in May 2023 on the advice of Gil P.:

TS012878359

Personally, I do not have access to this case on the IBM support site because I am not part of our system teams. I just have an email that was relayed to me.

I don't know who at IBM answered it, but we did not get a constructive response. Only people familiar with DBB's data model and with what has been coded in the Java classes could answer.

For me it's a DBB problem, but on the off chance, I was wondering whether it could be the consequence of a bad implementation in our modified zAppBuild.

Provided IBM response:

There may be instances where large collections are created and we lock the sequence table. The table remains locked until we finish creating the collection and its dependencies.

Do you know what other builds are doing around this same time? You can look at your other build logs to determine what they were doing. Did other builds fail around that time?

This link, https://www.ibm.com/docs/en/db2/11.5?topic=management-lock-waits-timeouts, has information about lock wait timeouts which appears to be off by default. You can increase your lock wait timeout in order to avoid all but the most extreme cases.
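
As a client-side complement to a longer lock wait timeout, a build script could retry the failing call when it sees SQLCODE -913. The following Java sketch is only an illustration: `LockTimeoutRetry`, `isLockTimeout` and `retryOnLockTimeout` are hypothetical helper names, not part of DBB or zAppBuild. It assumes that a `java.sql.SQLException` sits somewhere in the cause chain (as the stack trace above suggests) and that its `getErrorCode()` reports the SQLCODE. Whether it is safe to retry `createBuildResult` is also an assumption that would need to be confirmed with DBB support.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

class LockTimeoutRetry {
    // Db2 SQLCODE for "unsuccessful execution caused by deadlock or timeout".
    static final int SQLCODE_DEADLOCK_OR_TIMEOUT = -913;

    /** Walks the cause chain looking for an SQLException carrying SQLCODE -913. */
    static boolean isLockTimeout(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SQLException
                    && ((SQLException) t).getErrorCode() == SQLCODE_DEADLOCK_OR_TIMEOUT) {
                return true;
            }
        }
        return false;
    }

    /** Runs the action, retrying with a linear back-off only on SQLCODE -913. */
    static <T> T retryOnLockTimeout(Callable<T> action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on any other error, or once the attempts are used up.
                if (!isLockTimeout(e) || attempt >= maxAttempts) {
                    if (e instanceof RuntimeException) {
                        throw (RuntimeException) e;
                    }
                    throw new IllegalStateException("action failed", e);
                }
                pause(pauseMillis * attempt);
            }
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated action: fails twice with SQLCODE -913, then succeeds.
        String result = retryOnLockTimeout(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException(new SQLException("timeout", "57033", -913));
            }
            return "build result created";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

From a Groovy build script, the call could be wrapped as `LockTimeoutRetry.retryOnLockTimeout({ metadataStore.createBuildResult(group, label) }, 5, 2000)`. A retry only hides the contention; it does not shorten the time the other build holds the lock.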

FALLAI-Denis commented 10 months ago

Hi,

What is the value of autocommit implemented in the DBB API? Is it forced to false?

See:

In the build.groovy script, is it the metadatastore field that keeps the connection with Db2 active for all accesses to the metadatastore during the build as a single transaction?

What would happen if we set this metadataStore field to null after the createBuildResult call and reallocated it immediately afterwards?

FALLAI-Denis commented 10 months ago

Hi,

According to our own analysis, the SQLCODE -913 issue appears when new elements, or new dependencies between elements, are introduced during a build. In this situation, the DBB_SEQ_TABLE table is locked and stays locked until the end of that build. Any other build started before the first build ends has to wait for the lock, and if the wait lasts too long, that build fails with SQLCODE -913.
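
This contention can be illustrated without Db2. The following Java sketch is a simplified model, not DBB code: a `ReentrantLock` stands in for the lock on the single row of DBB_SEQ_TABLE, the first thread plays the build that keeps the lock until its unit of work ends, and the `tryLock` timeout plays the role of the Db2 lock wait timeout. `SeqLockDemo` and `secondBuildGetsLock` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class SeqLockDemo {
    /**
     * Returns true if a second "build" obtains the sequence lock within
     * timeoutMillis while a first "build" holds it for holdMillis.
     */
    static boolean secondBuildGetsLock(long holdMillis, long timeoutMillis) {
        ReentrantLock seqRowLock = new ReentrantLock();
        CountDownLatch firstHoldsLock = new CountDownLatch(1);

        Thread firstBuild = new Thread(() -> {
            seqRowLock.lock();
            try {
                firstHoldsLock.countDown();
                Thread.sleep(holdMillis); // the rest of the unit of work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                seqRowLock.unlock(); // the "commit" that releases the lock
            }
        });
        firstBuild.start();

        try {
            firstHoldsLock.await(); // make sure the first build owns the lock
            boolean acquired = seqRowLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                seqRowLock.unlock();
            }
            firstBuild.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        // Long unit of work, short timeout: the second build times out.
        System.out.println("long hold:  " + secondBuildGetsLock(500, 50));
        // Short unit of work: the second build only waits briefly.
        System.out.println("short hold: " + secondBuildGetsLock(10, 2000));
    }
}
```

In this model the outcome depends only on how the holding time compares with the timeout, so either shortening the unit of work or lengthening the timeout removes the failure.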

The problem is mainly in DBB, and indirectly in zAppBuild, which controls the duration of the build "transaction" (unit of work). We will not get a DBB correction within a timeframe compatible with our schedule for going into production (in one month). We have to find a workaround in zAppBuild, which is the only component we can change.

We need to find a way to shorten the build "transaction" (unit of work) by fragmenting the build: stop the "transaction" every x elements built, commit so that the DBB_SEQ_TABLE table is released, and resume the build on the same buildResult. Each fragment must not last longer than the timeout that triggers SQLCODE -913. Alternatively, we need a way to force intermediate commits during the build by issuing COMMIT statements directly through the JDBC interface.

We do not know the impacts of implementing such modifications on the consistency of the results.
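
The fragmentation idea can be sketched as a generic loop. This Java example is a minimal illustration under stated assumptions: `ChunkedBuild`, `buildInChunks`, the `buildOne` callback and the `commit` hook are all hypothetical, and nothing in this thread confirms that DBB lets a build script commit its unit of work or reach the underlying JDBC connection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ChunkedBuild {
    /**
     * Builds the items in chunks of chunkSize and runs the commit hook after
     * each chunk. Returns the number of commits that were issued.
     */
    static <T> int buildInChunks(List<T> items, int chunkSize,
                                 Consumer<T> buildOne, Runnable commit) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int commits = 0;
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            for (T item : items.subList(start, end)) {
                buildOne.accept(item);
            }
            commit.run(); // hypothetical hook: end the unit of work here
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("A.cbl", "B.cbl", "C.cbl", "D.cbl", "E.cbl");
        List<String> built = new ArrayList<>();
        int commits = buildInChunks(files, 2, built::add,
                () -> System.out.println("commit after " + built.size() + " files"));
        System.out.println(commits + " commits for " + built.size() + " files");
    }
}
```

The chunk size would have to be chosen so that one chunk always completes well within the lock wait timeout, and the consistency impact of intermediate commits remains the open question raised above.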

FALLAI-Denis commented 10 months ago

Hi,

FYI, a new case has been created for this problem: TS014072726. The previous one was closed.

M-DLB commented 10 months ago

Thank you, @FALLAI-Denis. As this problem is being handled by the DBB Development team, I believe it makes sense to close this issue (issues opened here should only relate to the zAppBuild framework). I hope you have no concerns about closing it.

Thank you!