ansible-collections / ibm_zos_core

Red Hat Ansible Certified Content for IBM Z
[Module] A module to check if a job is currently running #1451

Open ledina opened 2 months ago

ledina commented 2 months ago

Is there an existing issue for this?

New module description

In many cases in the CICS collection and operator we need to know the state of a job. You can use zos_job_query for this, but it requires parsing the output when all we need is a yes-or-no answer (or perhaps a third option for a job that is waiting to execute).

We now have multiple use cases for is_job_running in CICS. One is when starting a region, to check that there are no duplicates with that job name. Another is the stop scenario: we query the ID of the currently running job given its name, shut it down, and then query is_job_running using the ID to wait until the region is down. You don't want to query by name here, as another job might have taken its place. So we use the job name in one case and the job ID in the other. Being able to specify one or the other would therefore be ideal, as it lets the user do as they please.
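The stop scenario described here could be sketched roughly as below. This is only an illustration: `wait_until_stopped` and the `is_running` callable are hypothetical names, standing in for whatever check (for example, a wrapper around zos_job_query) ends up answering "is the job with this ID still running?".

```python
import time
from typing import Callable


def wait_until_stopped(
    is_running: Callable[[str], bool],
    job_id: str,
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_running(job_id) until it reports False or the timeout elapses.

    Returns True if the job was seen to stop, False if the timeout was hit.
    The job is tracked by ID, not by name, so a new job that reuses the
    name is not mistaken for the one being shut down.
    """
    deadline = time.monotonic() + timeout
    while True:
        if not is_running(job_id):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The check is injected as a callable so that the polling logic stays independent of how the running state is actually determined, which is the open question in this issue.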

User story

As an Ansible user, I can check easily if my job (given its ID or name) is running without having to look into the zos_job_query return values and work out what it means.

fernandofloresg commented 2 months ago

I don't think a new module is needed. Maybe we can offer an option to just check whether a job is running, or return the status of the job (running, queue ...) in the return result.

richp405 commented 2 months ago

I think this is a straightforward extension to zos_job_query. Actually, two things would change: 1 - allowing query by name, 2 - interpreting the return code and status from the result. This could be done a few ways. The simplest would be to add two input values: 1 - JobName, 2 - ReturnStatus

We could get super-fancy and add a separate routine that does that under the covers, but I think that's overkill.

ddimatos commented 2 months ago

Internal discussion: https://.slack.com/archives/C01CF9VMEG3/p1709652966582879

richp405 commented 1 month ago

zos_job_query could probably be modified to do this, based on either a job id or owner id, or a job name. My thinking is to add a flag to the input for 'simplified response', which would provide a 1-word answer such as 'running', 'not found', 'ended', etc based on response codes.

We're a bit heavily loaded now, but I'll see about getting this into the next quarter's development plan.

richp405 commented 1 month ago

I read through the Slack discussion, and think we could provide a simpler interface for these 2 specific needs:
1 - Given a job name or ID number, indicate if the job is found and return the field not provided
2 - If the job is found, return the running status of the job (running/ran/canceled, etc) or NOTFOUND

This is small (2-3 pts), since it leverages zos_job_query and just processes the results.
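As a rough sketch of that simplified interface (not the planned implementation), the post-processing could look like the following. The function name `summarize_job`, the `found` key, and the choice to pass `ret_code.msg` through as the status are all assumptions made for illustration; only the NOTFOUND value and the "return the field not provided" behaviour come from the proposal.

```python
def summarize_job(jobs, job_name=None, job_id=None):
    """Reduce a list of job records to a small found/status summary.

    Exactly one of job_name or job_id must be given. The first matching
    record supplies the field that was not provided.
    """
    if (job_name is None) == (job_id is None):
        raise ValueError("provide exactly one of job_name or job_id")

    key, wanted = ("job_name", job_name) if job_name is not None else ("job_id", job_id)
    for job in jobs:
        if job.get(key) == wanted:
            ret_code = job.get("ret_code") or {}
            return {
                "found": True,
                "job_name": job.get("job_name"),
                "job_id": job.get("job_id"),
                # Assumption: the status is taken verbatim from ret_code.msg.
                "status": ret_code.get("msg"),
            }
    return {"found": False, "job_name": job_name, "job_id": job_id, "status": "NOTFOUND"}


# Hypothetical record, shaped like the job dictionaries discussed in this issue.
jobs = [{"job_id": "JOB16577", "job_name": "LINKCBL", "ret_code": {"code": None, "msg": "CANCELED"}}]
print(summarize_job(jobs, job_id="JOB16577"))
print(summarize_job(jobs, job_name="NOSUCH"))
```

Note that a lookup by name returns only the first match, which is one reason a name-based query is weaker than an ID-based one when names can repeat.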

input:

output

roded commented 4 weeks ago

I'd like to add our use case to this issue.

We're using zos_job_query to query running jobs by JOBNAME and have noticed that the run time of the module seems to scale with the number of jobs currently on the spool. In our environments, it can take a considerable number of seconds to call zos_job_query with the job_name argument. I assume this is due to the job detail parsing done by the module (not sure if parsing is done before or after filtering).

In our case, we're interested in knowing whether the job is running and its ASID so that we could stop it specifically in the case of multiple STCs running with the same name.
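To make that use case concrete, here is a minimal sketch of picking out the ASID of each job that shares a name, assuming records with `job_id`, `job_name` and `asid` keys. The helper name `asids_for_job_name` and the sample values are hypothetical, and the sketch does not itself decide whether a job is running.

```python
def asids_for_job_name(jobs, job_name):
    """Map each job ID that has the given name to its ASID."""
    return {
        job["job_id"]: job["asid"]
        for job in jobs
        if job.get("job_name") == job_name
    }


# Hypothetical records: two started tasks with the same name, plus an unrelated job.
jobs = [
    {"job_id": "STC00011", "job_name": "MYSTC", "asid": 41},
    {"job_id": "STC00012", "job_name": "MYSTC", "asid": 57},
    {"job_id": "JOB00020", "job_name": "OTHER", "asid": 0},
]
print(asids_for_job_name(jobs, "MYSTC"))
```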

I think that ideally, we'd like to be able to call zos_job_query with a list of fields which will be parsed and returned. For your consideration. Thanks

ddimatos commented 3 weeks ago

There are a number of data points in this conversation, mostly focusing on altering zos_job_query. While this is still a possibility, my reservation about altering the module response based on options is that it stretches the module design pattern, in that a module should do one thing and do it well. Some of those design principles are:

  1. Be consistent about returns (some modules are too random), unless it is detrimental to the state/action.
  2. Make returns reusable–most of the time you don’t want to read it, but you do want to process it and re-purpose it.
  3. Provide consistent return values within the standard Ansible return structure, even if NA/None are used for keys normally returned under other options.
  4. Avoid creating a module that does the work of other modules; this leads to code duplication and divergence, and makes things less uniform, unpredictable and harder to maintain. Modules should be the building blocks. If you are asking ‘how can I have a module execute other modules’ … you want to write a role.

My points are aimed at the original ask, which is to allow zos_job_query to report whether a started task is running or not. I will get back to the started task and how it's not technically supported by this module, but allow me to focus on module design patterns first.

To be effective at returning whether a job is running or not, it would not benefit the user if we buried the answer in a complex response. While this response is not complex in comparison to others, it was originally designed for consistency with other job-type modules, where it returns a list of jobs in which each index is a job; hence, if you want to access the first job you would do so like job_sub_result.jobs[0].job_name. While some of the other job modules are not currently taking full advantage of the job array, it is a future direction.

So I don't see a whole lot of benefit in returning the status of a running job if one has to dig down into the JSON structure to extract the result they are looking for. What would be the point? This information is there now. The original ask, I believe, was to go from a complex response to a simple one. Given a job ID of JOB16577, we have today:

[
    {
        "asid": 4,
        "creation_date": "2023-05-03",
        "creation_time": "12:14:00",
        "job_class": "A",
        "job_id": "JOB16577",
        "job_name": "LINKCBL",
        "owner": "ADMIN",
        "priority": 0,
        "queue_position": 0,
        "ret_code": {
            "code": "null",
            "msg": "CANCELED"
        },
        "svc_class": "E"
    }
]

to a simpler response

{ status: "running" }

Point being that a DevOps engineer can reduce the work needed to find out if a job is running in one of the supported states.

If we were to adjust the response to the simpler response, then we would be stretching and challenging the module design pattern, in my opinion (see bullets 1 - 4).

The second concern I have is using the zos_job_query complex type ret_code to determine if a started task is still running. The code was not written to monitor started tasks; while they are similar in nature, this risks a false positive. For example, some users are using this structure's code value of null to determine that a started task is still running.

"ret_code": {
    "code": "null",
    "msg": "CANCELED"
},

This is not ideal, because null can occur under certain conditions; for example, a null value can currently come with a job submitted with TYPRUN=HOLD, until we fully support all TYPRUN values. Ideally, started tasks would need to be specifically coded for in this module, or support would be added to the started task module we are planning for the future.
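A small sketch of why that heuristic is fragile: the check below treats a null code as "still running", which is the pattern described above, not something the module provides. The first record mirrors the CANCELED sample quoted earlier (assuming its "null" stands for a JSON null); the held-job record is hypothetical, with only its null code taken from this discussion.

```python
def naive_is_running(job):
    """Heuristic some users rely on: a null return code means 'still running'."""
    return job["ret_code"]["code"] is None


# Mirrors the CANCELED sample in this thread, assuming "null" denotes JSON null.
canceled_job = {"job_id": "JOB16577", "ret_code": {"code": None, "msg": "CANCELED"}}

# Hypothetical record for a job held with TYPRUN=HOLD; field values are illustrative.
held_job = {"job_id": "JOB00002", "ret_code": {"code": None, "msg": None}}

# Neither job is executing, yet the heuristic reports both as running.
print(naive_is_running(canceled_job), naive_is_running(held_job))
```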

I am thinking that maybe what the collection ought to offer is a role, which aligns somewhat with bullet 4. In this role we would do the matching and parsing for a job (notice I am not saying started task; how you use it would be up to you, though). The role would take a job ID (not a job name, since those can be duplicated) and return a simple dictionary with one key:value mapping, where the status is one of the values we support for jobs today, or we could just reduce it down to true or false; it just depends on what is usable. Today we have these supported statuses:

"ABEND",      # ZOAU job ended abnormally
"SEC ERROR",  # Security error (legacy Ansible code)
"SEC",        # ZOAU security error
"JCL ERROR",  # Job had a JCL error (legacy Ansible code)
"JCLERR",     # ZOAU job had a JCL error
"CANCELED",   # ZOAU job was cancelled
"CAB",        # ZOAU converter abend
"CNV",        # ZOAU converter error
"SYS",        # ZOAU system failure
"FLU"         # ZOAU job was flushed
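
The reduction such a role would perform might be sketched like this. It assumes `ret_code.msg` matches one of the listed strings exactly; the names `job_status` and `has_supported_status` and the "UNKNOWN" fallback are hypothetical and not part of the collection.

```python
# The statuses listed above, copied verbatim.
SUPPORTED_STATUSES = frozenset({
    "ABEND", "SEC ERROR", "SEC", "JCL ERROR", "JCLERR",
    "CANCELED", "CAB", "CNV", "SYS", "FLU",
})


def job_status(job):
    """Return a one-key dictionary describing the job's status.

    "UNKNOWN" is a placeholder for any msg outside the supported set.
    """
    msg = (job.get("ret_code") or {}).get("msg")
    return {"status": msg if msg in SUPPORTED_STATUSES else "UNKNOWN"}


def has_supported_status(job):
    """True/false variant: does the job report one of the supported statuses?"""
    return job_status(job)["status"] != "UNKNOWN"


job = {"job_id": "JOB16577", "ret_code": {"code": None, "msg": "CANCELED"}}
print(job_status(job), has_supported_status(job))
```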

ddimatos commented 3 weeks ago

Regarding the comment from @roded: it would probably be best to bring this up in a new issue, but since it does relate to this topic, I can comment.

"I think that ideally, we'd like to be able to call zos_job_query with a list of fields which will be parsed and returned. For your consideration. Thanks"

I do believe all the parsing is done up front. I think we could consider a performance boost through a module option that would allow only some code paths to run. Such an option might be a raw type or a list, possibly with choices, that runs only the corresponding code paths, e.g. response: ['asid', 'job_class']. To avoid breaking existing automation and the module design pattern, the response would still be the original contracted response, with many nulls, rather than a minimized one, so something like this could be returned:

[
    {
        "asid": 4,
        "creation_date": null,
        "creation_time": null,
        "job_class": "A",
        "job_id": null,
        "job_name": null,
        "owner": null,
        "priority": null,
        "queue_position": null,
        "ret_code": {
            "code": null,
            "msg": null
        },
        "svc_class": null
    }
]
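The contract-preserving shape of that response can be illustrated with a short sketch. The function name `project_job` and the `CONTRACT_KEYS` constant are hypothetical; the key list is taken from the sample response in this thread. Note that this version filters an already-parsed record, so it shows only the output shape and not the performance gain, which would require skipping the unneeded code paths during parsing.

```python
# Keys of the existing response, taken from the sample in this thread.
CONTRACT_KEYS = (
    "asid", "creation_date", "creation_time", "job_class", "job_id",
    "job_name", "owner", "priority", "queue_position", "ret_code", "svc_class",
)


def project_job(job, fields):
    """Keep only the requested fields, nulling the rest but retaining every key."""
    unknown = set(fields) - set(CONTRACT_KEYS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    result = {}
    for key in CONTRACT_KEYS:
        if key in fields:
            result[key] = job.get(key)
        elif key == "ret_code":
            # The nested structure is kept so consumers can still index into it.
            result[key] = {"code": None, "msg": None}
        else:
            result[key] = None
    return result


full = {
    "asid": 4, "creation_date": "2023-05-03", "creation_time": "12:14:00",
    "job_class": "A", "job_id": "JOB16577", "job_name": "LINKCBL",
    "owner": "ADMIN", "priority": 0, "queue_position": 0,
    "ret_code": {"code": None, "msg": "CANCELED"}, "svc_class": "E",
}
print(project_job(full, ["asid", "job_class"]))
```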

Such a change would significantly alter the common utility used by 3 modules. Without further investigation and scoping, this appears to be a very large item, which is fine so long as it is possible and provides benefit.

@roded if this is something you want us to pursue please open a git issue of type enhancement and explain your request and we can reference this issue later.