Consider using externalized common logging base model for WES and TES

Problem

Using the same schemas or base schemas across APIs to support identical or similar use cases facilitates implementation and therefore has the potential to increase adoption.

However, even though the WES Log schema and the TES tesExecutorLog schema are very similar to one another and likely originate from the same ancestor, they diverged over time. While this may well be for good reasons, it is also plausible that the divergence is simply a result of largely different communities working separately on the further development of the different APIs.

It might thus be worthwhile to explore whether the WES Log and TES tesExecutorLog schemas could be harmonized and how that could simplify or otherwise benefit the specifications.

Possible solution

A possible solution is to replace WES Log and TES tesExecutorLog with a new schema defined in an independent, external OpenAPI document that could be maintained by the Cloud API and/or DaMaSC communities. Alternatively, instead of using the very same schema for both, a base schema could be defined that WES Log and TES tesExecutorLog both inherit from and extend differently.

The differences between the specifications are highlighted in the "Additional context" section below.

Possible alternatives

If the two schemas or their use cases differ too much, harmonizing them would not make sense. In that case, nothing should be done.

Additional context

Schemas

WES `Log` schema

Log:
  title: Log
  type: object
  properties:
    name:
      type: string
      description: The task or workflow name
    cmd:
      type: array
      items:
        type: string
      description: The command line that was executed
    start_time:
      type: string
      description: When the command started executing, in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
    end_time:
      type: string
      description: When the command stopped executing (completed, failed, or cancelled), in ISO 8601 format "%Y-%m-%dT%H:%M:%SZ"
    stdout:
      type: string
      description: A URL to retrieve standard output logs of the workflow run or task.  This URL may change between status requests, or may not be available until the task or workflow has finished execution.  Should be available using the same credentials used to access the WES endpoint.
    stderr:
      type: string
      description: A URL to retrieve standard error logs of the workflow run or task.  This URL may change between status requests, or may not be available until the task or workflow has finished execution.  Should be available using the same credentials used to access the WES endpoint.
    exit_code:
      type: integer
      description: Exit code of the program
      format: int32
    system_logs:
      type: array
      items:
        type: string
      description: |-
        System logs are any logs the system decides are relevant,
        which are not tied directly to a workflow.
        Content is implementation specific: format, size, etc.

        System logs may be collected here to provide convenient access.

        For example, the system may include an error message that caused
        a SYSTEM_ERROR state (e.g. disk is full), etc.
  description: Log and other info

TES `tesExecutorLog` schema

tesExecutorLog:
  required:
  - exit_code
  type: object
  properties:
    start_time:
      type: string
      description: Time the executor started, in RFC 3339 format.
      example: 2020-10-02T10:00:00-05:00
    end_time:
      type: string
      description: Time the executor ended, in RFC 3339 format.
      example: 2020-10-02T11:00:00-05:00
    stdout:
      type: string
      description: |-
        Stdout content.

        This is meant for convenience. No guarantees are made about the content.
        Implementations may chose different approaches: only the head, only the tail,
        a URL reference only, etc.

        In order to capture the full stdout client should set Executor.stdout
        to a container file path, and use Task.outputs to upload that file
        to permanent storage.
    stderr:
      type: string
      description: |-
        Stderr content.

        This is meant for convenience. No guarantees are made about the content.
        Implementations may chose different approaches: only the head, only the tail,
        a URL reference only, etc.

        In order to capture the full stderr client should set Executor.stderr
        to a container file path, and use Task.outputs to upload that file
        to permanent storage.
    exit_code:
      type: integer
      description: Exit code.
      format: int32
  description: ExecutorLog describes logging information related to an Executor.

Comparison

Properties

Property	WES `Log`	TES `tesExecutorLog`	Identical type/format	Differences
`name`	:heavy_check_mark:	:x:	( :heavy_check_mark: )	`name` is defined for TES `tesTask`, two levels upstream of `tesExecutorLog`, with a slightly different description (generic in WES, not generic in TES)
`cmd`	:heavy_check_mark:	:x:	( :heavy_check_mark: )	An equivalent `command` property is defined in TES `tesExecutor`, which is available through `tesTask.executors[]`; however, it is defined more explicitly as an array of string components (but representing a single command)
`start_time`	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	(1) the WES `Log` description is generic, the TES `tesExecutorLog` description is not (refers to "executor"); (2) the WES `Log` time format is more narrowly defined, expecting ISO 8601 with "%Y-%m-%dT%H:%M:%SZ" (which is also RFC 3339 compliant), compared to TES `tesExecutorLog` which just references RFC 3339 (see here for a comparison of ISO 8601 and RFC 3339); (3) an example is provided for TES `tesExecutorLog`, but not for WES `Log`
`end_time`	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	same as for `start_time`
`stdout`	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	WES `Log` is a lot more narrowly defined than TES `tesExecutorLog`, with all valid WES `Log` `stdout` responses being valid TES `tesExecutorLog` `stdout` responses, but the same is not true vice versa
`stderr`	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	same as for `stdout`
`exit_code`	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	(1) slight difference in description text; (2) `exit_code` is required in TES `tesExecutorLog`, but not in WES `Log`
`system_logs`	:heavy_check_mark:	:x:	( :heavy_check_mark: )	`system_logs` are defined for TES `tesTaskLog`, one level upstream of `tesExecutorLog`, with a slightly different description (not generic for both WES and TES)

Other fields

- WES Log (but not TES tesExecutorLog) has a title field
- The WES Log description is generic, the TES tesExecutorLog description is not (refers to "executor")

A common base schema for WES Log and TES tesExecutorLog seems possible, without introducing (considerable) breaking changes.

Proposed base schema

We could, for example, define a schema BaseCommandLog like so:

BaseCommandLog:
  title: BaseCommandLog
  type: object
  properties:
    name:
      type: string
      description: Descriptive name for the command execution.
      examples:
        - sleep
        - "md5 sum"
    cmd:
      type: array
      items:
        type: string
      description: >
        The executed command expressed as a sequence of program arguments, where
        the first argument is the program to execute.
      examples:
        - ["sleep", "5"]
        - ["/bin/sh", "-c", "/bin/md5sum /data/file_in > /data/file_out"]
    start_time:
      type: string
      description: >
        Time at which the execution started, in ISO 8601- and RFC 3339-compliant
        "YYYY-MM-DDThh:mm:ssZ" format
      examples:
        - "2020-10-02T11:00:00Z"
        - "2020-01-31T23:59:01Z"
    end_time:
      type: string
      description: >
        Time at which the execution permanently concluded, in ISO 8601- and RFC
        3339-compliant "YYYY-MM-DDThh:mm:ssZ" format
      examples:
        - "2020-10-02T23:00:00Z"
        - "2020-02-01T00:01:59Z"
    stdout:
      type: string
      description: >
        A URL to retrieve complete standard output logs of the command
        execution. The URL may change between requests, and it may not be
        available until the command execution has permanently concluded. Should be
        accessible through credentials that a client who is acting on behalf of
        the owner of the corresponding resource will typically have, such as
        those used to access the service or other outputs resulting from the
        associated resource.
      examples:
        - s3://my-object-store/sleep.stdout
        - https://my.service.org/api/v1/resources/Q2D36M7/md5_sum.stdout
    stderr:
      type: string
      description: >
        A URL to retrieve complete standard error logs of the command execution.
        The URL may change between requests, and it may not be available until
        the command execution has permanently concluded. Should be accessible
        through credentials that a client who is acting on behalf of the owner
        of the corresponding resource will typically have, such as those used to
        access the service or other outputs resulting from the associated
        resource.
      examples:
        - s3://my-object-store/sleep.stderr
        - https://my.service.org/api/v1/resources/Q2D36M7/md5_sum.stderr
    exit_code:
      type: integer
      description: Exit code of the command execution.
      format: int32
      examples:
        - 0
        - 1
    system_logs:
      type: array
      items:
        type: string
      description: >
        Logs that the implementation decides are relevant, but which are not
        tied directly to the command execution. Content (format, size, etc.)
        is implementation-specific. For example, the system may include an
        error message that caused a `SYSTEM_ERROR` state.
      examples:
        - "total output size: N/A"
        - "system error: no space left on volume" 
  description: Command execution log.

Changes with respect to current schemas

The following table lists the changes to the WES Log and TES tesExecutorLog schemas with respect to their properties (changes that only affect the wording but not the meaning of a description are ignored):

Property	Changes to WES `Log`	Changes to TES `tesExecutorLog`	Breaking change
`name`	:x:	not previously defined; no changes with respect to `tesTask.name`	:x:
`cmd`	description now spells out what was previously only implied: that commands should be represented as sequences of program arguments	not previously defined; no changes with respect to `tesTask.executors[].command`	:x:
`start_time`	:x:	More narrowly defined: not all RFC 3339 formats are supported anymore (including that of the previous example, which included a time zone offset)	:x:
`end_time`	:x:	More narrowly defined: not all RFC 3339 formats are supported anymore (including that of the previous example, which included a time zone offset)	:x:
`stdout`	More broadly defined: URLs may be accessible only with credentials for the location where the logs are stored (which may differ from those used to access the service, which were previously recommended to be sufficient)	More narrowly defined: only URLs to standard output logs supported	( :x: ) WES clients may potentially break, if (1) standard output logs are stored on a different service, (2) the client does not provide credentials to access the service where the logs are stored, and (3) the client requires access to the logs; however, access to the logs via the service credentials is not guaranteed in the current WES specs ("should", not "must")
`stderr`	More broadly defined: URLs may be accessible only with credentials for the location where the logs are stored (which may differ from those used to access the service, which were previously recommended to be sufficient)	More narrowly defined: only URLs to standard error logs supported	( :x: ) WES clients may potentially break, if (1) standard error logs are stored on a different service, (2) the client does not provide credentials to access the service where the logs are stored, and (3) the client requires access to the logs; however, access to the logs via the service credentials is not guaranteed in the current WES specs ("should", not "must")
`exit_code`	:x:	:x:	:x:
`system_logs`	:x:	not previously defined; no changes with respect to `tesTask.logs[].system_logs` (one level up)	:x:
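
The narrowing of `start_time` and `end_time` for TES can be illustrated with a short Python sketch. The helper names `is_strict_utc` and `to_strict_utc` are hypothetical and not part of either specification; the sketch only assumes the "YYYY-MM-DDThh:mm:ssZ" format from the proposed schema. It suggests that the previous TES example (with a time zone offset) would no longer match, but could be converted to UTC without losing information.

```python
import re
from datetime import datetime, timezone

# Narrow format proposed for BaseCommandLog.start_time / end_time.
STRICT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
STRICT_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z")


def is_strict_utc(timestamp):
    """Return True if the timestamp matches "YYYY-MM-DDThh:mm:ssZ"."""
    # The regex enforces zero padding; strptime rejects impossible dates/times.
    if not STRICT_PATTERN.fullmatch(timestamp):
        return False
    try:
        datetime.strptime(timestamp, STRICT_FORMAT)
    except ValueError:
        return False
    return True


def to_strict_utc(timestamp):
    """Convert a timestamp with a UTC offset to the strict UTC form."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if timestamp.endswith("Z"):
        timestamp = timestamp[:-1] + "+00:00"
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError("timestamp carries no UTC offset")
    return parsed.astimezone(timezone.utc).strftime(STRICT_FORMAT)
```

A server that currently emits offset timestamps could apply a conversion like `to_strict_utc` before returning logs, while clients could use a check like `is_strict_utc` to detect servers that have not yet migrated.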

Usage

The schema BaseCommandLog could be consumed in the WES and TES API specifications as described below.

WES `Log`

Log:
  title: Log
  allOf:
    - $ref: 'https://raw.githubusercontent.com/ga4gh/ga4gh-cloud-api-common-schemas/v1.0.0/wes_tes.yaml#/components/schemas/BaseCommandLog'
    - type: object
  description: Workflow engine execution log.

TES `tesExecutorLog`

tesExecutorLog:
  title: tesExecutorLog
  allOf:
    - $ref: 'https://raw.githubusercontent.com/ga4gh/ga4gh-cloud-api-common-schemas/v1.0.0/wes_tes.yaml#/components/schemas/BaseCommandLog'
    - type: object
  required:
    - exit_code
  description: Executor command execution log.
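
To make the effect of the `allOf` composition concrete, here is a minimal Python sketch of how the extra `required` constraint in TES would behave next to the unconstrained WES variant. It is not a full JSON Schema validator: the dictionaries are trimmed-down stand-ins for the schemas above, the `$ref` is inlined, and the helper names `required_properties` and `missing_properties` are hypothetical.

```python
# Trimmed-down stand-in for the proposed BaseCommandLog schema (assumption:
# only two of its properties are reproduced here).
BASE_COMMAND_LOG = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "exit_code": {"type": "integer"},
    },
}

# The $ref is inlined so that the sketch has no network dependency.
WES_LOG = {"allOf": [BASE_COMMAND_LOG, {"type": "object"}]}
TES_EXECUTOR_LOG = {
    "allOf": [BASE_COMMAND_LOG, {"type": "object"}],
    "required": ["exit_code"],
}


def required_properties(schema):
    """Collect `required` names from a schema and all of its allOf branches."""
    names = set(schema.get("required", []))
    for branch in schema.get("allOf", []):
        names |= required_properties(branch)
    return names


def missing_properties(schema, instance):
    """Return the required property names that the instance lacks, sorted."""
    return sorted(required_properties(schema) - set(instance))
```

Under these assumptions, a log object without `exit_code` would pass for WES but be flagged for TES, which is the behavior the two existing specifications already have.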

Summary

Importantly, the proposed changes are not breaking for existing WES and TES API consumers ("old clients will be compatible with new servers"). Moreover, by using a base schema approach, different APIs are still able to add constraints such as required properties or even override properties inherited from the base schema.

The proposed changes further have the following benefits:

- Defining properties start_time, end_time, stdout and stderr more narrowly in TES will allow clients to process information more easily/effectively:
  - Currently, a time without a date is TES-compliant, even though it is ambiguous for executions that span multiple days; with the proposed changes, times MUST include a date and MUST be expressed in UTC
  - Currently, clients are warned that no guarantees are made about the contents of the standard output and error logs; with the proposed changes, clients can expect URLs that point to complete logs
- Explicitly describing property cmd as a sequence of program arguments in WES is less likely to lead to diverging behavior across implementations than the current implicit annotation through the type (array of strings). For example, some WES implementers could be tempted to use only the first array item to hold the entire shell command.
- Defining properties stdout and stderr more broadly in WES enables implementers to more easily design systems where API services are decoupled from storage services. Such a design allows for easier scalability and improved data privacy/security, as access to logs can be protected by requirements that are different from those used to protect the API endpoints. While credentials used to access the API are not currently required to also give access to standard output and error logs, the strong wording ("Should be available using the same credentials used to access the WES endpoint") may discourage both the design of such modular systems and their support by clients.
- The addition of (optional) name, cmd and system_logs may be useful for TES clients to access common logging information from the same location, rather than having to parse four different parts of the tesTask schema. It also increases the expressiveness of the tesTask schema:
  - Implementations can choose to name executors via the proposed BaseCommandLog.name property.
  - If implementations modify the executor commands they receive via a task request in tesTask.executors[].command, they can log the modified (actually executed) command via the proposed BaseCommandLog.cmd property.
  - If implementations decide that a log-worthy system event is relevant to a particular executor rather than the task as a whole, they can capture this increased resolution by logging such events via the proposed BaseCommandLog.system_logs property.
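
The ambiguity around the command property can be sketched in Python. The example lists below are illustrative assumptions, and `render` is a hypothetical helper; the sketch only shows how the same `array of strings` type reads differently depending on whether an implementation follows the "sequence of program arguments" interpretation.

```python
import shlex


def render(cmd):
    """Render an argument list as the equivalent, safely quoted shell command."""
    return shlex.join(cmd)


# Interpretation spelled out by the proposed description: one item per argument.
argv_style = ["sleep", "5"]

# Interpretation that the current type alone does not rule out: the whole
# shell command packed into the first item.
shell_style = ["sleep 5"]

# Shell operators are plain arguments under the argv interpretation.
with_operator = ["/bin/md5sum", "/data/file_in", ">", "/data/file_out"]

print(render(argv_style))     # sleep 5
print(render(shell_style))    # 'sleep 5'
print(render(with_operator))  # /bin/md5sum /data/file_in '>' /data/file_out
```

Read as program arguments, `shell_style` names a single program called "sleep 5", and the `>` in `with_operator` is passed to the program literally rather than redirecting output. A client that displays or re-runs logged commands would therefore need to know which interpretation a server uses.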

ga4gh / workflow-execution-service-schemas