Open uniqueg opened 2 months ago
A common base schema for WES `Log` and TES `tesExecutorLog` seems possible, without introducing (considerable) breaking changes. We could, for example, define a schema `BaseCommandLog` like so:

    BaseCommandLog:
      title: BaseCommandLog
      type: object
      properties:
        name:
          type: string
          description: Descriptive name for the command execution.
          examples:
            - sleep
            - "md5 sum"
        cmd:
          type: array
          items:
            type: string
          description: >
            The executed command expressed as a sequence of program arguments, where
            the first argument is the program to execute.
          examples:
            - ["sleep", "5"]
            - ["/bin/md5sum", "/data/file_in", ">", "/data/file_out"]
        start_time:
          type: string
          description: >
            Time at which the execution started, in ISO 8601- and RFC 3339-compliant
            "YYYY-MM-DDThh:mm:ssZ" format.
          examples:
            - "2020-10-02T11:00:00Z"
            - "2020-01-31T23:59:01Z"
        end_time:
          type: string
          description: >
            Time at which the execution permanently concluded, in ISO 8601- and RFC
            3339-compliant "YYYY-MM-DDThh:mm:ssZ" format.
          examples:
            - "2020-10-02T23:00:00Z"
            - "2020-02-01T00:01:59Z"
        stdout:
          type: string
          description: >
            A URL to retrieve complete standard output logs of the command
            execution. The URL may change between requests, and it may not be
            available until the command execution has permanently concluded. Should be
            accessible through credentials that a client who is acting on behalf of
            the owner of the corresponding resource will typically have, such as
            those used to access the service or other outputs resulting from the
            associated resource.
          examples:
            - s3://my-object-store/sleep.stdout
            - https://my.service.org/api/v1/resources/Q2D36M7/md5_sum.stdout
        stderr:
          type: string
          description: >
            A URL to retrieve complete standard error logs of the command execution.
            The URL may change between requests, and it may not be available until
            the command execution has permanently concluded. Should be accessible
            through credentials that a client who is acting on behalf of the owner
            of the corresponding resource will typically have, such as those used to
            access the service or other outputs resulting from the associated
            resource.
          examples:
            - s3://my-object-store/sleep.stderr
            - https://my.service.org/api/v1/resources/Q2D36M7/md5_sum.stderr
        exit_code:
          type: integer
          description: Exit code of the command execution.
          format: int32
          examples:
            - 0
            - 1
        system_logs:
          type: array
          items:
            type: string
          description: >
            Logs that the implementation decides are relevant, but which are not
            tied directly to the command execution, e.g., format, size, error
            message that caused a `SYSTEM_ERROR`.
          examples:
            - "total output size: N/A"
            - "system error: no space left on volume"
      description: Command execution log.

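
The timestamp format required for `start_time` and `end_time` can be checked mechanically. The following is a minimal Python sketch and not part of either specification: the helper name `is_strict_utc_timestamp` is hypothetical, and it assumes that the pattern and the `strptime` format below are an adequate rendering of "YYYY-MM-DDThh:mm:ssZ".

```python
import re
from datetime import datetime

# Assumed rendering of the "YYYY-MM-DDThh:mm:ssZ" format from the descriptions.
STRICT_UTC_PATTERN = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"
)
STRICT_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def is_strict_utc_timestamp(value: str) -> bool:
    """Return True if `value` has the strict UTC shape and is a real date/time."""
    # The pattern enforces zero-padded fields and the literal "Z" suffix.
    if STRICT_UTC_PATTERN.fullmatch(value) is None:
        return False
    # strptime rejects impossible calendar values such as February 30th.
    try:
        datetime.strptime(value, STRICT_UTC_FORMAT)
    except ValueError:
        return False
    return True
```

Under these assumptions, a value such as `2020-10-02T11:00:00Z` is accepted, while an RFC 3339 timestamp that carries a time zone offset (for example `2020-10-02T10:00:00-05:00`) is rejected.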
The following table lists the changes to the WES `Log` and TES `tesExecutorLog` schemas with respect to their properties (changes that only affect the wording but not the meaning of a description are ignored):

Property | Changes to WES `Log` | Changes to TES `tesExecutorLog` | Breaking change |
---|---|---|---|
`name` | :x: | not previously defined; no changes with respect to `tesTask.name` | :x: |
`cmd` | description now spells out what was previously only implied: that commands should be represented as sequences of program arguments | not previously defined; no changes with respect to `tesTask.executors[].command` | :x: |
`start_time` | :x: | More narrowly defined: not all RFC 3339 formats are supported anymore (including that of the previous example, which included a time zone offset) | :x: |
`end_time` | :x: | More narrowly defined: not all RFC 3339 formats are supported anymore (including that of the previous example, which included a time zone offset) | :x: |
`stdout` | More broadly defined: URLs may be accessible only with credentials to the location where command executions are stored (which may be different from the ones used to access the service, which were previously recommended to be sufficient) | More narrowly defined: only URLs to standard output logs supported | ( :x: ) WES clients may potentially break, if (1) standard output logs are stored on a different service, (2) the client does not provide credentials to access the service where the logs are stored, and (3) the client requires access to the logs; however, access to the logs via the service credentials is not guaranteed in the current WES specs ("should", not "must") |
`stderr` | More broadly defined: URLs may be accessible only with credentials to the location where command executions are stored (which may be different from the ones used to access the service, which were previously recommended to be sufficient) | More narrowly defined: only URLs to standard error logs supported | ( :x: ) WES clients may potentially break, if (1) standard error logs are stored on a different service, (2) the client does not provide credentials to access the service where the logs are stored, and (3) the client requires access to the logs; however, access to the logs via the service credentials is not guaranteed in the current WES specs ("should", not "must") |
`exit_code` | :x: | :x: | :x: |
`system_logs` | :x: | not previously defined; no changes with respect to `tesTask.logs.system_logs` (one level up) | :x: |

The schema `BaseCommandLog` could be consumed in the WES and TES API specifications as described below.
`Log`

    Log:
      title: Log
      allOf:
        - $ref: 'https://raw.githubusercontent.com/ga4gh/ga4gh-cloud-api-common-schemas/v1.0.0/wes_tes.yaml#/components/schemas/BaseCommandLog'
        - type: object
      description: Workflow engine execution log.

`tesExecutorLog`

    tesExecutorLog:
      title: tesExecutorLog
      allOf:
        - $ref: 'https://raw.githubusercontent.com/ga4gh/ga4gh-cloud-api-common-schemas/v1.0.0/wes_tes.yaml#/components/schemas/BaseCommandLog'
        - type: object
          required:
            - exit_code
      description: Executor command execution log.

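
To illustrate how the two consumers end up with different constraints on top of the same base, here is a rough Python sketch that flattens `allOf` members into a single object schema. It is a simplification for illustration only: real `allOf` semantics require an instance to validate against every member rather than merging them, `$ref` resolution is assumed to have happened already, and the names `BASE_COMMAND_LOG`, `TES_EXTENSION` and `flatten_all_of` are hypothetical.

```python
from typing import Any

# Trimmed-down stand-in for the proposed BaseCommandLog schema (assumption:
# only two of its properties are reproduced here to keep the sketch short).
BASE_COMMAND_LOG: dict[str, Any] = {
    "type": "object",
    "properties": {
        "cmd": {"type": "array", "items": {"type": "string"}},
        "exit_code": {"type": "integer", "format": "int32"},
    },
}

# The inline extension that tesExecutorLog adds next to the base schema.
TES_EXTENSION: dict[str, Any] = {"type": "object", "required": ["exit_code"]}


def flatten_all_of(parts: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge already-resolved `allOf` members into one object schema."""
    merged: dict[str, Any] = {"type": "object", "properties": {}, "required": []}
    for part in parts:
        # Later members add to (or replace) property definitions of earlier ones.
        merged["properties"].update(part.get("properties", {}))
        # Required property names accumulate across members, without duplicates.
        for name in part.get("required", []):
            if name not in merged["required"]:
                merged["required"].append(name)
    return merged


wes_log = flatten_all_of([BASE_COMMAND_LOG, {"type": "object"}])
tes_executor_log = flatten_all_of([BASE_COMMAND_LOG, TES_EXTENSION])
```

In this sketch both results expose the same properties, but only `tes_executor_log` lists `exit_code` as required, mirroring the difference between the two snippets.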
Importantly, the proposed changes are not breaking for existing WES and TES API consumers ("old clients will be compatible with new servers"). Moreover, by using a base schema approach, different APIs are still able to add constraints such as required properties or even override properties inherited from the base schema.
The proposed changes further have the following benefits:
- Defining `start_time`, `end_time`, `stdout` and `stderr` more narrowly in TES will allow clients to process information more easily/effectively.
- Defining `cmd` as a sequence of program arguments in WES is less likely to lead to different behavior across implementations than the current implicit annotation through the type (array of strings). For example, some WES implementers could be tempted to use only the first array item to hold the entire shell command.
- Defining `stdout` and `stderr` more broadly in WES enables implementers to more easily design systems where API services are decoupled from storage services. Such a design allows for easier scalability and improved data privacy/security, as access to logs can be protected by requirements that are different from those used to protect the API endpoints. While credentials used to access the API are not currently required to also give access to standard output and error logs, the strong wording ("Should be available using the same credentials used to access the WES endpoint") may fail to encourage both the design of such modular systems and their support by clients.
- Adding `name`, `cmd` and `system_logs` may be useful for TES clients to access common logging information from the same location, rather than having to parse four different parts of the `tesTask` schema. It also increases the expressiveness of the `tesTask` schema:
  - [...] `BaseCommandLog.name` property.
  - [...] `tesTask.executors[].command`, they can log the modified (actually executed) command via the proposed `BaseCommandLog.cmd` property.
  - [...] `BaseCommandLog.system_logs` property.
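
The ambiguity around `cmd` mentioned in the list above can be made concrete. The Python sketch below contrasts an array that holds a whole shell command in its first item with a proper sequence of program arguments. The helper `normalize_cmd` is hypothetical and purely heuristic (a single item containing spaces could also be a legitimate program path); neither specification prescribes such handling.

```python
import shlex


def normalize_cmd(cmd: list[str]) -> list[str]:
    """Return `cmd` as a sequence of program arguments (heuristic sketch)."""
    # A single item is treated as a shell-style string and split into
    # arguments; this is an assumption, not a rule from WES or TES.
    if len(cmd) == 1:
        return shlex.split(cmd[0])
    # Multiple items are taken to be program arguments already.
    return list(cmd)


as_shell_string = normalize_cmd(["/bin/md5sum /data/file_in > /data/file_out"])
as_arguments = normalize_cmd(["/bin/md5sum", "/data/file_in", ">", "/data/file_out"])
```

Both calls yield the same four-item list here, which shows why a client cannot tell the two conventions apart unless the description states which one applies.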
Problem
Using the same schemas or base schemas across APIs to support identical or similar use cases facilitates implementation and therefore has the potential to increase adoption.
However, even though the WES `Log` schema and the TES `tesExecutorLog` schema are very similar to one another and likely originate from the same ancestor, they diverged over time. While this may well be for good reasons, it is also plausible that the divergence is simply a result of largely different communities working separately on the further development of the different APIs.
It might thus be worthwhile to explore whether the WES `Log` and TES `tesExecutorLog` schemas could be harmonized and how that could simplify or otherwise benefit the specifications.
Possible solution
A possible solution is to replace WES `Log` and TES `tesExecutorLog` with a new schema that is defined in an independent, external OpenAPI document that could be maintained by the Cloud API and/or DaMaSC communities. An alternative to using the same identical schema for WES `Log` and TES `tesExecutorLog` could be to instead define a base schema that both schemas inherit from and extend differently.
The differences between the specifications are highlighted in the "Additional context" section below.
Possible alternatives
If the two schemas, or the use cases they serve, differ too much, it would not make sense to harmonize them. In that case, nothing should be done.
Additional context
Schemas
- WES `Log` schema
- TES `tesExecutorLog` schema
Comparison
Properties

Property | Differences between WES `Log` and TES `tesExecutorLog` |
---|---|
`name` | `name` is defined for TES `tesTask`, two levels upstream of `tesExecutorLog`, with a slightly different description (generic in WES, not generic in TES) |
`cmd` | a `command` property is defined in TES `tesExecutor`, which is available through `tesTask.executors[]`; however, it is defined more explicitly as an array of string components (but representing a single command) |
`start_time` | (1) the WES `Log` description is generic, the TES `tesExecutorLog` description is not (refers to "executor"); (2) the WES `Log` time format is more narrowly defined, expecting ISO 8601 with "%Y-%m-%dT%H:%M:%SZ" (which is also RFC 3339 compliant), compared to TES `tesExecutorLog` which just references RFC 3339 (see here for a comparison of ISO 8601 and RFC 3339); (3) an example is provided for TES `tesExecutorLog`, but not for WES `Log` |
`end_time` | same as for `start_time` |
`stdout` | WES `Log` is a lot more narrowly defined than TES `tesExecutorLog`, with all valid WES `Log` `stdout` responses being valid TES `tesExecutorLog` `stdout` responses, but the same is not true vice versa |
`stderr` | same as for `stdout` |
`exit_code` | `exit_code` is required in TES `tesExecutorLog`, but not in WES `Log` |
`system_logs` | `system_logs` are defined for TES `tesTaskLog`, one level upstream of `tesExecutorLog`, with a slightly different description (not generic for both WES and TES) |

Other fields
- WES `Log` (but not TES `tesExecutorLog`) has a `title` field
- the WES `Log` description is generic, the TES `tesExecutorLog` description is not (refers to "executor")