galasa-dev / projectmanagement

Project Management repo for Issues and ZenHub
7 stars 4 forks source link

ZOS: Duplicate JobIDs being assigned - [possibly] causing test failures #789

Open matthewchivers opened 3 years ago

matthewchivers commented 3 years ago

Recent test run C5630 failed due to not being able to find a Job:

19/08/2021 19:29:08.442 DEBUG main dev.galasa.zosrseapi.internal.RseapiRestApiProcessor.getCurrentRseapiServer - Using RSE API server on MV2D https://winmvs2d.hursley.ibm.com:6800
19/08/2021 19:29:08.442 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.addCommonHeaders - Adding HTTP header: accept: */*
19/08/2021 19:29:08.442 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.get - Request: GET https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/GALKTFUQ/JOB33466
19/08/2021 19:29:08.493 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.get - Response: HTTP/1.1 404  - GET https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/GALKTFUQ/JOB33466
19/08/2021 19:29:08.494 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.updateJobStatus - {"message":"No job with name 'GALKTFUQ' and id 'JOB33466' was found","status":"NOT_FOUND"}
19/08/2021 19:29:08.494 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.updateJobStatus - JOBID=JOB33466 JOBNAME=GALKTFUQ NOT FOUND
19/08/2021 19:29:08.494 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.updateJobStatus - JOBID=JOB33466 JOBNAME=GALKTFUQ OWNER=null TYPE=null STATUS=NOTFOUND RETCODE=????

It appears that this same job may have been deleted by an earlier test: C5628. Where the same JobID was used: JOB33466.

19/08/2021 19:29:08.684 DEBUG main dev.galasa.zosrseapi.internal.RseapiRestApiProcessor.getCurrentRseapiServer - Using RSE API server on MV2D https://winmvs2d.hursley.ibm.com:6800
19/08/2021 19:29:08.684 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.addCommonHeaders - Adding HTTP header: accept: */*
19/08/2021 19:29:08.684 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.delete - Request: DELETE https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/GALRR7CO/JOB33466
19/08/2021 19:29:11.329 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.delete - Response: HTTP/1.1 401  - DELETE https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/GALRR7CO/JOB33466
19/08/2021 19:29:11.330 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.buildErrorString - {"message":"Error code: 2005. Error message: Command failed.","status":"UNAUTHORIZED"}
19/08/2021 19:29:11.330 ERROR main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.cancel - Error Purge job, HTTP Status Code 401 : HTTP/1.1 401 

Though this seemingly also comes back as unauthorised...?

This pattern was also noticed in tests C4955 and C4954 from 17th August. C4955 failed due to the 401 unauthorized (like C5628 above), but C4954 is the test that failed to retrieve the job (like C5630 above).

The immediate observable symptom/cause is the overlap in JobIDs. This strikes me as similar to the recent RunID overlap issues we were encountering.

DavidXRoberts commented 3 years ago

Looks like this is a RSE API bug. C5628 submits a job with jobname GALRR7CO and RES API returns that job with jobid JOB33466

19/08/2021 19:29:07.992 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - Request: POST https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/string/
19/08/2021 19:29:07.993 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - body: 
{"jcl":"//GALRR7CO JOB ,\n//         CLASS=A,\n//         MSGCLASS=A,\n//         MSGLEVEL=(1,1)\n/*JOBPARM SYSAFF=MV2D\n//HELLO  EXEC PGM=IEBGENER                 \n/*                                         \n//SYSUT1 DD *                              \nHELLO WORLD FROM GALASA\n//*                                        \n//SYSUT2 DD SYSOUT=A                       \n//SYSPRINT DD SYSOUT=A                     \n//SYSIN    DD DUMMY                        \n//              \n"}
19/08/2021 19:29:08.338 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - Response: HTTP/1.1 201  - POST https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/string/
19/08/2021 19:29:08.339 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.submitJob - {"jobName":"GALRR7CO","owner":"JATP","jobId":"JOB33466","returnCode":"CC 0000","subsystem":"JES2","type":"JOB","executionClass":"A","phaseName":"Job is on the hard copy queue","status":"COMPLETION"}

C3560 submits a job with jobname GALKTFUQ but RSE API returns job GALRR7CO/JOB33466 which is not the job it submitted

19/08/2021 19:29:07.768 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - Request: POST https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/string/
19/08/2021 19:29:07.769 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - body: 
{"jcl":"//GALKTFUQ JOB ,\n//         CLASS=A,\n//         MSGCLASS=A,\n//         MSGLEVEL=(1,1)\n/*JOBPARM SYSAFF=MV2D\n//COBOL  EXEC PGM=IEFBR14                           \n/*                                                  \n"}
19/08/2021 19:29:08.441 TRACE main dev.galasa.zosrseapi.internal.RseapiImpl.postJson - Response: HTTP/1.1 201  - POST https://winmvs2d.hursley.ibm.com:6800/rseapi/api/v1/jobs/string/
19/08/2021 19:29:08.442 TRACE main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.submitJob - {"jobName":"GALRR7CO","owner":"JATP","jobId":"JOB33466","returnCode":"CC 0000","subsystem":"JES2","type":"JOB","executionClass":"A","phaseName":"Job is on the hard copy queue","status":"COMPLETION"}
19/08/2021 19:29:08.442 INFO  main dev.galasa.zosbatch.rseapi.manager.internal.RseapiZosBatchJobImpl.submitJob - JOB GALKTFUQ(JOB33466) Submitted