Open mwvaughn opened 3 years ago
Here's a worked example of what such a usage command might look like. We assume here that the reactors package supports multiple context and message schemas.
USAGE
*****
docker run -it IMAGE reactor.py
VARIABLES
=========
Common
------
* MES : Message interpreted by reactor.py
* TAPIS_API_URL : URL of Tapis API server
* TAPIS_ACCESS_TOKEN : Oauth2 access token for Tapis API
Parameters
----------
Parameter variables can be set in the following combinations:
Context 1
~~~~~~~~~
meep : description
merp : description
Context 2
~~~~~~~~~
beep : description
boop : description
meep : description
Messages
--------
JSON values for MES must follow one of the following JSON schemas:
* /message_schemas/message.jsonschema
* /message_schemas/other.jsonschema
Configuration
-------------
Setting values for these variables will override the corresponding values in /config.yml
_REACTOR_LOG_LEVEL
_REACTOR_LOG_KEY
_REACTOR_OTHER_KEY
...
Should this also print the variables set in secrets.json while deploying the reactor?
You are right, and those are implied by the Configuration section above.
From within the container, we don't actually know which variables are set by the secrets.json mechanism. We only know the universe of possible variable names, which are derived from the namespace (_REACTOR_) plus the uppercased, underscore-delimited values of the first- and second-level keys in config.yml.
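For illustration, the naming rule can be sketched in a few lines of Python. This is a minimal sketch and not the SDK's actual implementation: the function name override_names and the sample dictionary are hypothetical, and it assumes config.yml has already been parsed into a plain dict.

```python
def override_names(config, namespace="_REACTOR_"):
    """List candidate override variables for first- and second-level keys."""
    names = []
    for key, value in config.items():
        if isinstance(value, dict):
            # Second-level keys: NAMESPACE + FIRST + "_" + SECOND, uppercased
            for subkey in value:
                names.append((namespace + key + "_" + subkey).upper())
        else:
            # First-level scalar keys: NAMESPACE + FIRST, uppercased
            names.append((namespace + key).upper())
    return names


# Hypothetical sample standing in for a parsed config.yml
sample = {
    "logs": {"file": None, "level": "DEBUG"},
    "slack": {"webhook": None},
}
for name in override_names(sample):
    print(name)
```

Note that this sketch lists every key it finds. The real code may filter the list further, so its output could differ.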
Here's an example of the "Configuration" section of a live-generated usage command.
Configuration
-------------
This function is configured via files found at:
* /config.yml
The current union configuration is:
---
logger:
client_key: F3VRMUNrPeaq84zp
host: logger.sd2e.org
path: /logger
port: 31311
proto: http
uri: http://logger.sd2e.org:31311
logs:
file: null
level: DEBUG
token: null
slack:
channel: notifications
webhook: null
First- or second level keys in this configuration can be overridden
by setting environment variables. These variables are supported:
* _REACTOR_LOGGER_CLIENT_KEY
* _REACTOR_LOGGER_HOST
* _REACTOR_LOGGER_PATH
* _REACTOR_LOGGER_PROTO
* _REACTOR_LOGGER_URI
* _REACTOR_LOGS_FILE
* _REACTOR_LOGS_LEVEL
* _REACTOR_LOGS_TOKEN
* _REACTOR_SLACK_CHANNEL
* _REACTOR_SLACK_WEBHOOK
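To make the override behavior concrete, here is a rough sketch of how environment variables could be merged over a loaded configuration. The function apply_overrides is hypothetical and only illustrates the idea. It assumes a plain dict and does no type coercion, so every overridden value ends up as a string.

```python
import copy
import os


def apply_overrides(config, environ=None, namespace="_REACTOR_"):
    """Return a copy of config with matching environment values applied."""
    environ = os.environ if environ is None else environ
    merged = copy.deepcopy(config)
    for key, value in list(merged.items()):
        if isinstance(value, dict):
            # Second-level keys, e.g. logs.level -> _REACTOR_LOGS_LEVEL
            for subkey in list(value):
                name = (namespace + key + "_" + subkey).upper()
                if name in environ:
                    value[subkey] = environ[name]
        else:
            # First-level scalar keys, e.g. mongodb_uri -> _REACTOR_MONGODB_URI
            name = (namespace + key).upper()
            if name in environ:
                merged[key] = environ[name]
    return merged


# A fake environment is passed in so the example does not depend on the shell
config = {"logs": {"file": None, "level": "DEBUG"}}
fake_env = {"_REACTOR_LOGS_LEVEL": "INFO"}
print(apply_overrides(config, fake_env))
```

Returning a copy rather than mutating the input keeps the file-based configuration available for display, which is what the usage output needs.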
Comments are welcome
Here is the latest draft of the usage command output. This is the direct output of a working Reactor built using the current version of the code.
% python -m reactors.cli usage
USAGE
=====
This container image implements an Abaco function:
"This function prints HELLO WORLD using Reactor.logger"
It is runnable outside Abaco as follows:
docker run -it --env var=val REPO
Parameters
----------
Abaco passes parameters into the function via URL parameters:
curl -XPOST https://api.tacc.cloud/actors/v2/message?foo=bar
In this example, an environment variable 'FOO' will be set in the
container runtime with a value of 'bar'. To allow an Abaco function
to be run independently, this can be emulated by setting environment
variables when running the function container.
docker run --env FOO=bar <container> <command>
A function developer may specify one or more valid sets of
parameters for use within the function. These parameter sets
can be validated or classified using built-in functions from
the Reactors module.
This function accepts the following environment variable sets:
Context schema.$id: RequiresUUID
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/context_schemas/uuid.jsonschema
Parameters:
* UUID desc: None; type: string; required: True
Context schema.$id: Default
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/src/reactors/validation/context.jsonschema
Parameters:
* MSG desc: Message received by the Actor; type: string; required: True
* x-nonce desc: An Abaco nonce (API key); type: string; required: False
Please note that variable sets beyond 'Default' must also contain the
variables specified in 'Default', such as 'MSG'.
JSON Messages
-------------
Abaco accepts JSON-formatted messages that are transmitted to the
container runtime via the 'MSG' environment variable. They can,
in turn, be validated or classified using built-in methods from
the Reactors module.
curl -XPOST -H "Content-Type: application/json" \
-d '{"message": {"foo": "bar"}}' \
https://api.tacc.cloud/actors/v2/messages
This function accepts and can validate JSON-formatted values
for 'MSG' that validate to the following JSON schemas:
* Message schema.$id: AWS_SQS
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/sqs.jsonschema
* Message schema.$id: file:///Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/email-noid.jsonschema
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/email-noid.jsonschema
* Message schema.$id: DefaultJSON
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/src/reactors/validation/message.jsonschema
Configuration
-------------
The Reactor object provided by this SDK and usable within the function
is configured via files found at:
* /Users/mwvaughn/src/TACC-Cloud/python-reactors/src/reactors/config.yml
If this current function utilizes this feature of the SDK, its
current configuration is:
---
logger:
client_key: F3VRMUNrPeaq84zp
host: logger.sd2e.org
path: /logger
port: 31311
proto: http
uri: http://logger.sd2e.org:31311
logs:
file: null
level: DEBUG
token: null
slack:
channel: notifications
webhook: null
First- or second level keys in the configuration can be overridden
by setting environment variables at run time. The following
variables are supported:
* _REACTOR_LOGGER_CLIENT_KEY
* _REACTOR_LOGGER_HOST
* _REACTOR_LOGGER_PATH
* _REACTOR_LOGGER_PROTO
* _REACTOR_LOGGER_URI
* _REACTOR_LOGS_FILE
* _REACTOR_LOGS_LEVEL
* _REACTOR_LOGS_TOKEN
* _REACTOR_SLACK_CHANNEL
* _REACTOR_SLACK_WEBHOOK
Tapis Client
------------
This function may require an active Tapis client. One is automatically
provided by Abaco but can be injected at run time by providing either
a credentials file or setting environment variables.
Credentials File
~~~~~~~~~~~~~~~~
A Tapis client may be configured by volume mounting a credentials file:
docker run -it -v ${HOME}/.agave:/root/.agave REPO
Environment Variables
~~~~~~~~~~~~~~~~~~~~~
A Tapis client may be configured by passing these variables:
* TAPIS_BASE_URL - API server URL
* TAPIS_TOKEN - Oauth2 access token
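As an aside on the "environment variable sets" in the Parameters section above, the rule that sets beyond 'Default' must also contain the 'Default' variables can be sketched as follows. This is only an illustration: CONTEXTS is a hand-written summary of the two context schemas listed in the output, matching_contexts is a hypothetical helper, and it checks only that variables are present. The Reactors module validates against the JSON schemas themselves.

```python
# Hypothetical summary of the listed context schemas:
# schema $id -> names of required variables
CONTEXTS = {
    "Default": {"MSG"},
    "RequiresUUID": {"UUID"},
}


def matching_contexts(environ, contexts=CONTEXTS, base="Default"):
    """Return the $id of every context whose required variables are all set."""
    present = set(environ)
    matches = []
    for schema_id, required in contexts.items():
        # Sets beyond the default must also satisfy the default set
        needed = required | contexts[base]
        if needed <= present:
            matches.append(schema_id)
    return matches


print(matching_contexts({"MSG": "hello"}))
print(matching_contexts({"MSG": "hello", "UUID": "abc"}))
print(matching_contexts({"UUID": "abc"}))
```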
Thanks, Matt. What do you think about moving some (most) of this to online documentation? The idea is to remove all the content that does not change between different reactors, and refer users to the docs for details on how the SDK functions in general (and how they could develop their own reactors, extend/modify others' reactors).
% python -m reactors.cli usage
USAGE
=====
@@@@ "minimum viable" docker run command here @@@@
This container image implements an Abaco function:
"This function prints HELLO WORLD using Reactor.logger"
Parameters
----------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/parameters for details.
This function accepts the following environment variable sets:
Context schema.$id: RequiresUUID
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/context_schemas/uuid.jsonschema
Parameters:
* UUID desc: None; type: string; required: True
Context schema.$id: Default
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/src/reactors/validation/context.jsonschema
Parameters:
* MSG desc: Message received by the Actor; type: string; required: True
* x-nonce desc: An Abaco nonce (API key); type: string; required: False
JSON Messages
-------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/messages for details.
This function accepts and can validate JSON-formatted values
for 'MSG' that validate to the following JSON schemas:
* Message schema.$id: AWS_SQS
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/sqs.jsonschema
* Message schema.$id: file:///Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/email-noid.jsonschema
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/message_schemas/email-noid.jsonschema
* Message schema.$id: DefaultJSON
File: /Users/mwvaughn/src/TACC-Cloud/python-reactors/src/reactors/validation/message.jsonschema
Configuration
-------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/config for details.
The current configuration is:
---
logger:
client_key: F3VRMUNrPeaq84zp
host: logger.sd2e.org
path: /logger
port: 31311
proto: http
uri: http://logger.sd2e.org:31311
logs:
file: null
level: DEBUG
token: null
slack:
channel: notifications
webhook: null
The following variables can be overridden by setting environment variables at runtime:
* _REACTOR_LOGGER_CLIENT_KEY
* _REACTOR_LOGGER_HOST
* _REACTOR_LOGGER_PATH
* _REACTOR_LOGGER_PROTO
* _REACTOR_LOGGER_URI
* _REACTOR_LOGS_FILE
* _REACTOR_LOGS_LEVEL
* _REACTOR_LOGS_TOKEN
* _REACTOR_SLACK_CHANNEL
* _REACTOR_SLACK_WEBHOOK
Tapis Client
------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/tapis for details.
A Tapis client may be configured by passing these variables:
* TAPIS_BASE_URL - API server URL
* TAPIS_TOKEN - Oauth2 access token
The thought here is that the person running the usage command is most likely a user/consumer of the given custom reactor, not a developer (this point is up for debate). If I were a user and didn't have access to the Abaco runtime, the first things I'd want to see here would be:
* setup.py-like metadata, etc.
* -e VAR=value options that should be passed to the docker run command. I haven't fully fleshed out this idea, so I'm not sure what value would be. Maybe defaults? Populated from the 'default' field in the schemas?
Thoughts?
project.ini format to include a general [metadata] section that would include fields such as author, help, license, etc. When the Docker image was built, these data could be included as tags. Unfortunately, this does not help very much because a containerized process cannot access those container image tags. I suppose we could just copy the contents of project.ini into the container at build time, though I am not supremely happy with that design.
default from the schemas. Ultimately, I think we can only get to a probably working invocation unless we start capturing a lot more metadata at build time.
+1 on all three points here.
[metadata] section, then we know where in the container-local CLI to put that metadata 😄
Required environment variables
-------------------
* UUID (string) - No description provided for this variable. This variable is enforced by the schema: . Example: 34a1e1dc-a571-4f2b-9b62-d42d3b223059-007
x-nonce (string) - An Abaco nonce (API key)
NOTE: An asterisk (*) denotes required variables. There may be more required variables that are not listed above; please see reactor documentation for details.
Required message variables
-------------------
...and so on
This is similar to what you have under JSON Schemas. IFF we detect only one context.jsonschema and only one message.jsonschema (I suspect this will be the case for the vast majority of reactors), we expose any variables that are trivially parseable (strings, numbers, booleans). Despite my relatively limited knowledge of reactor use cases, I expect that most reactors will not use multiple context/message schemas, or implement anyOf-like behavior, especially if reactors are written atomically as we recommend. It's certainly important that we support these complex use cases in the SDK code, but for auto-generated documentation, we could just say "The schema(s) enforced for this reactor are too complicated for us to parse here, please read the docs".
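Roughly what I have in mind, as a sketch only: simple_variables is a hypothetical helper, and the list of keywords treated as "too complicated" is my own assumption rather than anything the SDK defines. It returns None when the schema uses combinators or references, and otherwise lists just the trivially parseable properties.

```python
SIMPLE_TYPES = {"string", "number", "integer", "boolean"}
# Assumed set of keywords that make a schema too complex to summarize
COMPLEX_KEYWORDS = {"anyOf", "oneOf", "allOf", "not", "$ref"}


def simple_variables(schema):
    """Return [(name, type, required)] or None if the schema is too complex."""
    if COMPLEX_KEYWORDS & set(schema):
        return None
    required = set(schema.get("required", []))
    rows = []
    for name, spec in schema.get("properties", {}).items():
        kind = spec.get("type")
        # Skip anything that is not a plain scalar type
        if isinstance(kind, str) and kind in SIMPLE_TYPES:
            rows.append((name, kind, name in required))
    return rows


schema = {
    "type": "object",
    "properties": {
        "UUID": {"type": "string"},
        "x-nonce": {"type": "string"},
    },
    "required": ["UUID"],
}
rows = simple_variables(schema)
if rows is None:
    print("The schema(s) enforced for this reactor are too complicated "
          "for us to parse here, please read the docs")
else:
    for name, kind, is_required in rows:
        marker = "*" if is_required else " "
        print("{} {} ({})".format(marker, name, kind))
```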
Okay that's enough rambling for this comment 😉
After some prototyping, here's another go. I still have not implemented metadata, but I am able to generate a sensible run string for the case where there is 0-1 contexts and 0-1 message schemas.
For environment variables, I use the default from the context JSON schema, followed by the first value of examples, and fall back to <type> if neither of those exists. For the JSON message, I am currently using hypothesis_jsonschema to render an example JSON document.
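The selection order for environment variable values looks roughly like the sketch below. The names placeholder and env_flags are hypothetical, and quoting with shlex.quote is a choice made for this example. The prototype's own formatting differs, as the output further down shows.

```python
import shlex


def placeholder(spec):
    """Pick a display value: default, else first example, else <type>."""
    if "default" in spec:
        return spec["default"]
    examples = spec.get("examples") or []
    if examples:
        return examples[0]
    return "<{}>".format(spec.get("type", "any"))


def env_flags(properties):
    """Render one shell-quoted --env flag per schema property."""
    flags = []
    for name, spec in properties.items():
        pair = "{}={}".format(name, placeholder(spec))
        flags.append("--env " + shlex.quote(pair))
    return " ".join(flags)


# Hypothetical context properties
properties = {
    "UUID": {"type": "string"},
    "LEVEL": {"type": "string", "default": "DEBUG"},
}
print("docker run -it " + env_flags(properties) + " <NAMESPACE/REPO:TAG>")
```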
Regarding Hypothesis: The hypothesis_jsonschema package ignores default and examples in its faking strategy and is a little troubled by $ref elements that point to external URLs. But it is very good at generating from pattern and format properties.
USAGE: This function prints HELLO WORLD using Reactor.logger
% docker run -it --env UUID="<string>" MSG='{"Key":"https://A.xfinity"}' <NAMESPACE/REPO:TAG>
Environment Variables
---------------------
* UUID (string) - None [None]
* MSG (string) - Message received by the Actor [None]
JSON Message
------------
The function accepts a JSON message (passed as MSG) conforming to schema:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "AWS SQS",
"$id": "AWS_SQS",
"description": "An AWS-like SQS notification",
"type": "object",
"properties": {
"Key": {
"type": "string",
"format": "uri",
"description": "An object-store equivalent to a file path."
}
},
"required": [
"Key"
]
}
Example: {"Key":"https://A.xfinity"}
Tapis Client
------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/tapis for details.
A Tapis client may be configured by passing these variables:
* TAPIS_BASE_URL - API server URL
* TAPIS_TOKEN - Oauth2 access token
Configuration
-------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/config for details.
The current configuration is:
---
logger:
client_key: F3VRMUNrPeaq84zp
host: logger.sd2e.org
path: /logger
port: 31311
proto: http
uri: http://logger.sd2e.org:31311
logs:
file: null
level: DEBUG
token: null
slack:
channel: notifications
webhook: null
Keys in this configuration can be overridden by setting environment
variables at run time. The following variables are supported:
* _REACTOR_LOGGER_CLIENT_KEY
* _REACTOR_LOGGER_HOST
* _REACTOR_LOGGER_PATH
* _REACTOR_LOGGER_PROTO
* _REACTOR_LOGGER_URI
* _REACTOR_LOGS_FILE
* _REACTOR_LOGS_LEVEL
* _REACTOR_LOGS_TOKEN
* _REACTOR_SLACK_CHANNEL
* _REACTOR_SLACK_WEBHOOK
This looks great IMO. My only question is why the JSON Schema section is using hypothesis but Environment Variables is not? Is this a constraint because of the way schemas are implemented/organized?
At this point, it's just an experiment to see which approach works the best, and I can probably converge them when I refactor. I wanted to get the live-generated example out for comment.
Cool, I like the Environment Variables formatting more. Thanks, Matt!
This looks great. I was able to understand the usage command output easily, especially the example JSON message it prints out. This will make it easier for a user to very quickly figure out what they should be passing as the message.
properties and required, would the user need to know the other information on the schema as well?
Thanks for the feedback!
For question 1: We're just printing the preferred schema to the screen. We are forced to assume a little bit of familiarity with reading and interpreting JSON Schema on the part of the user. Note that we do generate an Example JSON document, though our ability to do so is constrained by the quality and detail level of the schema. I would say that the documentation for this feature should include a worked example of building and validating a JSON document from a JSON schema, since many users will not be that familiar with it.
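As a possible starting point for such a worked example, here is a minimal sketch based on the AWS_SQS schema printed earlier in this thread. The check_message helper is hypothetical and only looks at required keys and basic types. It is not a full JSON Schema validator, and real documentation would more likely use a dedicated validator such as the jsonschema package.

```python
import json

# Trimmed copy of the AWS_SQS schema shown in the usage output
SCHEMA = {
    "type": "object",
    "properties": {"Key": {"type": "string", "format": "uri"}},
    "required": ["Key"],
}

# Only the JSON types needed for this example are mapped
JSON_TYPES = {"string": str, "object": dict, "boolean": bool}


def check_message(raw, schema=SCHEMA):
    """Return a list of problems found in a JSON message string."""
    try:
        doc = json.loads(raw)
    except ValueError as exc:
        return ["not valid JSON: {}".format(exc)]
    if not isinstance(doc, dict):
        return ["message must be a JSON object"]
    problems = []
    # Step 1: every key named in "required" must be present
    for name in schema.get("required", []):
        if name not in doc:
            problems.append("missing required key: " + name)
    # Step 2: keys that are present must have the declared type
    for name, spec in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(spec.get("type"))
        if name in doc and expected and not isinstance(doc[name], expected):
            problems.append("{} should be of type {}".format(name, spec["type"]))
    return problems


# Build a document by filling in each required property, then check it
message = json.dumps({"Key": "https://example.org/data/file.txt"})
print(message, check_message(message))
print("{}", check_message("{}"))
```

The "format": "uri" constraint is deliberately ignored here. Checking formats is one of the places where a real validator earns its keep.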
For question 2: Configuration for a database connection would probably be specified by the developer of the Reactor using config.yml and the "Keys override" mechanism we leverage with the secrets.json file on the CLI. I imagine it might look something like so:
Configuration
-------------
Please refer to https://tacc-cloud.github.io/python-reactors/usage/config for details.
The current configuration is:
---
mongodb_uri: null
logger:
client_key: F3VRMUNrPeaq84zp
...
Keys in this configuration can be overridden by setting environment
variables at run time. The following variables are supported:
* _REACTOR_MONGODB_URI
* _REACTOR_LOGGER_CLIENT_KEY
...
We don't have a good way to annotate the YAML configuration since PyYAML does not preserve comments when it loads a file. I think we just have to rely on well-named configuration keys in the config.yml file.
Now that I look at it, we might want to explicitly point out that the null values in the config need to be specified using environment variables. I would welcome suggested language for this.
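One way this could work, sketched under assumptions: the usage command could scan the configuration for null values and name the variable that would supply each one. The helper unset_variables and the wording of the printed notice are hypothetical. Whether a given null value is actually required is something only the Reactor developer knows, so the notice is phrased as a hint.

```python
def unset_variables(config, namespace="_REACTOR_"):
    """List the override variable for every null first- or second-level value."""
    missing = []
    for key, value in config.items():
        if isinstance(value, dict):
            for subkey, subvalue in value.items():
                if subvalue is None:
                    missing.append((namespace + key + "_" + subkey).upper())
        elif value is None:
            missing.append((namespace + key).upper())
    return missing


# Hypothetical parsed config.yml with some null values
config = {
    "mongodb_uri": None,
    "logs": {"file": None, "level": "DEBUG", "token": None},
}
names = unset_variables(config)
if names:
    print("These values are null in the configuration and may need to be")
    print("provided via environment variables at run time:")
    for name in names:
        print("* " + name)
```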
Thanks a lot Matt.
Part of the work to define context and schema validation is to make it possible to provide a minimally helpful usage command for a given Reactor that can be called directly from the Docker image: docker run -it <reactor-image> usage.