IBM / cpdctl

Apache License 2.0

cpdctl job run get does not return rows read & written #15

Status: Open. mihaimyh opened this issue 1 year ago.

mihaimyh commented 1 year ago

When I use cpdctl to retrieve a job run, the output does not include total_rows_read, total_rows_written, last_state_change_timestamp, or other values that the REST API does return.

jacek-midura commented 1 year ago

@mihaimyh cpdctl processes API responses according to the API definitions. None of the properties you mention are part of the Jobs API specification for "Get a specific run of a job".
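A minimal sketch of why spec-driven processing can hide these values, assuming a client that decodes each response into a model declaring only the properties listed in the spec. JobRun, its three fields, and from_response are hypothetical illustrations (not cpdctl's actual code), and the input values are made up.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class JobRun:
    # Hypothetical subset of properties a client might declare from the
    # Jobs API spec; the real spec defines more fields than these.
    job_ref: Optional[str] = None
    state: Optional[str] = None
    duration: Optional[int] = None


def from_response(payload: dict[str, Any]) -> JobRun:
    """Build a JobRun from a raw job_run object, keeping only declared fields.

    Any property the model does not declare (for example a service-specific
    total_rows_read) is silently discarded, mirroring how a spec-driven
    client can hide extra data that the REST API actually returned.
    """
    declared = {f.name for f in fields(JobRun)}
    return JobRun(**{k: v for k, v in payload.items() if k in declared})


# Made-up example values: two properties are not declared by the model.
raw = {"job_ref": "994bd773", "state": "Completed", "duration": 211,
       "total_rows_read": 1000, "total_rows_written": 1000}
run = from_response(raw)
print(run)                              # the extra properties are gone
print(hasattr(run, "total_rows_read"))  # False
```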

I queried this endpoint for one job run and got the following response:

{
    "metadata": {
        "name": "job run",
        "description": "Initial run",
        "asset_id": "c8c484f1-22a9-4dc9-b517-f08b5cc25a5e",
        "owner_id": "1000330999",
        "created": 1683645202489,
        "created_at": "2023-05-09T15:13:22Z",
        "usage": {
            "last_update_time": 1683645417970,
            "last_updated_at": "2023-05-09T15:16:57Z"
        }
    },
    "entity": {
        "job_run": {
            "job_ref": "994bd773-9234-468b-8646-1ae653b58458",
            "job_name": "onboarding-train-autoai-sample-bb77176f_job",
            "job_type": "orchestration_flow",
            "state": "Completed",
            "isScheduledRun": false,
            "configuration": {
                "version": "0995a7f1-3eb1-4deb-80f9-e6ae818d161c-1",
                "env_type": "",
                "env_variables": [],
                "runtime": {
                    "namespace": "zen",
                    "pipeline_run_name": "c8c484f1-22a9-4dc9-b517-f08b5cc25a5e-ffe5816c"
                }
            },
            "project_name": "ailc-automation-project",
            "job_parameters": [{
                "name": "deployment_space",
                "value": "cpd:///spaces/8ebf7015-a74a-433e-aba8-984b39b6b703"
            }],
            "last_state_change_timestamp": "2023-05-09T15:16:57Z",
            "duration": 211
        }
    }
}
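When working with the raw response body instead of a typed model, the job_run properties sit under entity.job_run. A short sketch that reads a few of them from an abbreviated copy of the response above; using dict.get() means service-specific properties such as total_rows_read simply come back as None when the service did not return them. The summarize helper is a hypothetical name.

```python
import json
from typing import Any, Optional

# Abbreviated copy of the response above (only the fields used here).
RESPONSE = """
{
  "entity": {
    "job_run": {
      "job_name": "onboarding-train-autoai-sample-bb77176f_job",
      "job_type": "orchestration_flow",
      "state": "Completed",
      "last_state_change_timestamp": "2023-05-09T15:16:57Z",
      "duration": 211
    }
  }
}
"""


def summarize(body: str) -> dict[str, Optional[Any]]:
    """Pull selected job_run properties out of a raw Jobs API response body.

    Missing properties (e.g. DataStage-specific row counts for an
    orchestration_flow run) are reported as None instead of raising KeyError.
    """
    job_run = json.loads(body)["entity"]["job_run"]
    keys = ("state", "last_state_change_timestamp", "duration",
            "total_rows_read", "total_rows_written")
    return {k: job_run.get(k) for k in keys}


print(summarize(RESPONSE))
```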

I can see entity.job_run.last_state_change_timestamp here, but not the other two. That's because the various services behind the Jobs API facade tend to return their own sets of properties in addition to those listed in the API spec. For example, total_rows_read and total_rows_written are typical of the DataStage service.
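Since the extra properties are returned by the service and only dropped by a spec-driven client, one way to see them is to call the REST endpoint directly and keep the raw JSON. A sketch that builds (but does not send) such a request; the path /v2/jobs/{job_id}/runs/{run_id} with a project_id query parameter is an assumption based on the public Jobs API docs, and the host, project ID, and token are placeholders.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def job_run_request(host: str, job_id: str, run_id: str,
                    project_id: str, token: str) -> Request:
    """Build a GET request for a single job run without sending it.

    The endpoint layout is an assumption; verify it against the Jobs API
    spec for your Cloud Pak for Data release before relying on it.
    """
    path = f"/v2/jobs/{quote(job_id)}/runs/{quote(run_id)}"
    query = urlencode({"project_id": project_id})
    return Request(f"{host.rstrip('/')}{path}?{query}",
                   headers={"Authorization": f"Bearer {token}",
                            "Accept": "application/json"})


# Placeholder host, project ID and token; the job and run IDs come from
# job_ref and metadata.asset_id in the response above.
req = job_run_request("https://cpd.example.com",
                      "994bd773-9234-468b-8646-1ae653b58458",
                      "c8c484f1-22a9-4dc9-b517-f08b5cc25a5e",
                      "my-project-id", "TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen() would return the full body, including any service-specific properties, which can then be parsed with json.load().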

mihaimyh commented 1 year ago

Thanks for the explanation. Is there an OpenAPI spec for the DataStage API?

jacek-midura commented 1 year ago

This is the public documentation for the DataStage API: https://cloud.ibm.com/apidocs/datastage. Let me know if that's enough or if you need the full API spec.

mihaimyh commented 1 year ago

@jacek-midura I am looking for an OpenAPI spec for the DataStage API so we can easily build clients for it, similar to the Jobs API definitions you linked in your first post (https://api.dataplatform.cloud.ibm.com/v2/jobs/docs/swagger/#/Job%20Runs/job_runs_get).

jacek-midura commented 1 year ago

I just discovered that the API spec is available from the documentation page: clicking the three-dots menu next to the "IBM APIs for DataStage" caption in the top left corner reveals a Download OpenAPI definition command. In any case, this is a direct link to the most recent spec version: https://cloud.ibm.com/apidocs/datastage.json. Let me know if you need the API spec for a specific Cloud Pak for Data release.
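An OpenAPI definition is a JSON document whose paths object maps each URL path to its operations, so a quick inventory of what the downloaded file covers takes only a few lines. A sketch under that assumption; list_operations and load_spec are hypothetical helper names, and the demo spec is hand-written, not taken from the DataStage definition.

```python
import json
from urllib.request import urlopen

# Keys of an OpenAPI path item that denote operations (others, such as
# "parameters" or "summary", are skipped).
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec: dict) -> list[tuple[str, str, str]]:
    """Return sorted (METHOD, path, operationId) tuples for every operation."""
    ops = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method in HTTP_METHODS:
                ops.append((method.upper(), path, op.get("operationId", "")))
    return sorted(ops)


def load_spec(url: str = "https://cloud.ibm.com/apidocs/datastage.json") -> dict:
    """Download and parse an OpenAPI definition (requires network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


# Tiny hand-written spec (hypothetical path and operationId) showing the output shape.
demo = {"paths": {"/example/flows": {"parameters": [],
                                     "get": {"operationId": "list_flows"}}}}
print(list_operations(demo))
```

Calling list_operations(load_spec()) lists the operations in the published definition; the same file can also be fed to a generator such as OpenAPI Generator to produce client code.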

mihaimyh commented 1 year ago

@jacek-midura Thanks for the tips; they are useful. Our organization has Cloud Pak for Data v4.6 installed. I am also interested in the OpenAPI specs behind the cpdctl dsjob CLI tools, described here, particularly those related to pipelines and job runs.

jacek-midura commented 1 year ago

The dsjob jobrunstat command does not have a dedicated API underneath; it simply queries the Jobs API mentioned earlier and prints the result. As for the commands under the pipelines namespace, their API spec has not been published yet.
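To illustrate the "query and print" pattern described above, here is a sketch of the printing half: it renders a job_run object from the Jobs API as labeled lines and omits properties the service did not return. The function name and output layout are hypothetical and do not reproduce dsjob's actual output.

```python
def format_run_stat(job_run: dict) -> str:
    """Render selected job_run properties as 'Label: value' lines.

    Hypothetical layout, not dsjob's real format. Properties absent from
    the response (e.g. row counts for a non-DataStage run) are skipped.
    """
    labels = [
        ("job_name", "Job name"),
        ("state", "State"),
        ("duration", "Duration"),
        ("last_state_change_timestamp", "Last state change"),
        ("total_rows_read", "Rows read"),
        ("total_rows_written", "Rows written"),
    ]
    lines = [f"{label}: {job_run[key]}" for key, label in labels if key in job_run]
    return "\n".join(lines)


# Subset of entity.job_run from the response earlier in the thread.
run = {
    "job_name": "onboarding-train-autoai-sample-bb77176f_job",
    "state": "Completed",
    "last_state_change_timestamp": "2023-05-09T15:16:57Z",
    "duration": 211,
}
print(format_run_stat(run))
```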