StackStorm / st2

StackStorm (aka "IFTTT for Ops") is event-driven automation for auto-remediation, incident responses, troubleshooting, deployments, and more for DevOps and SREs. Includes rules engine, workflow, 160 integration packs with 6000+ actions (see https://exchange.stackstorm.org) and ChatOps. Installer at https://docs.stackstorm.com/install/index.html
https://stackstorm.com/
Apache License 2.0

Feature Request - Load/store data/artifacts/binaries from external content source #4588

Open nmaludy opened 5 years ago

nmaludy commented 5 years ago
SUMMARY

Currently, in a distributed StackStorm deployment, the node an action runs on is effectively random. This causes headaches when dealing with files or artifacts, for example when implementing an ETL workflow or a CI/CD workflow.

ETL:

CI/CD:

The way this works now is:

ETL (Database query)

CI/CD (Files and Binaries)

ISSUE TYPE
IDEAS

Another workflow tool that I found has an interesting concept of artifacts that can be passed between steps in a workflow:

This spawned some thinking and relates to an idea I had in https://github.com/StackStorm/st2/issues/4343

It would be cool if we could pass in "artifacts" as inputs/outputs associated with a task in a workflow. The task would perform some pre/post work to load/store the artifact around the action run.

In pseudocode, it could look something like what I had in my other request.

ETL - Database

This would retrieve a database artifact from MySQL, do some processing, then publish the results back to MySQL.

vars:
  sql_connection: "{{ st2kv.system.sql_connection }}"

tasks:
  task1:
    action: transaction.place_orders
    input_artifact:
      mysql:
        connection: "{{ ctx().sql_connection }}"
        # by default this returns a list of dicts
        query: "SELECT id,name,date FROM orders ORDER BY date DESC;"
    input:
      data: "{{ input_artifact().mysql.result }}"
    next:
      - when: "{{ succeeded() }}"
        publish_artifact:
          mysql:
            connection: "{{ ctx().sql_connection }}"
            insert:
              # name of the table
              table: "history"
              # list of dicts to insert
              values: "{{ result() }}"
CI/CD - Files and Binaries

This would run a build process that checks out a git repo, builds the thing, uploads the RPM to a Yum repo and uploads the build log to an S3 bucket.

vars:
  sql_connection: "{{ st2kv.system.sql_connection }}"

tasks:
  build:
    action: cicd.build
    input_artifact:
      git:
        # downloads the repo to a local path on the actionrunner
        repo: https://github.com/org/repo.git
    input:
      path: "{{ input_artifact().git.path }}"
    next:
      - when: "{{ succeeded() }}"
        publish_artifact:
          nexus:
            path: "{{ result().rpm_path }}"
            upload: rpm
            url: "{{ st2kv.system.nexus.rpm_upload_url }}"
            username: "{{ st2kv.system.nexus.username }}"
            password: "{{ st2kv.system.nexus.password | decrypt_kv }}"
          s3:
            path: "{{ result().build_log_path }}"
            endpoint: storage.googleapis.com
            bucket: my-bucket-name
            key: path/in/my/bucket
            accesskey: "{{ st2kv.system.s3.accesskey | decrypt_kv }}"
            secretkey: "{{ st2kv.system.s3.accesskey | decrypt_kv }}"
Reusing existing packs

Ideally it would be great if packs could plug into this "artifact" architecture and provide input/output artifact actions that could be run. This would give us pluggability without reinventing the wheel or pulling the code complexity of integrations into StackStorm core itself.
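
As a very rough sketch of that pluggability (nothing below is an existing API), an artifact handler could simply be an ordinary pack action that the workflow engine runs before or after the main task; the pack/action name, parameters, and return shape are all hypothetical:

# Hypothetical "input_artifact" handler implemented as a normal pack action.
# The idea is that the workflow engine would run it before the main task and
# expose its result as input_artifact().git in the workflow context.
import subprocess
import tempfile

from st2common.runners.base_action import Action


class CloneArtifact(Action):
    def run(self, repo):
        # Clone the repo into a temp directory on this actionrunner.
        path = tempfile.mkdtemp(prefix="st2-artifact-")
        subprocess.check_call(["git", "clone", repo, path])
        # The returned dict would become the artifact payload, e.g.
        # {{ input_artifact().git.path }} in the workflow example above.
        return {"path": path}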

Long story short, this is just a cool thing I saw and I wanted to write down my thoughts / use case before I forgot it.

cognifloyd commented 5 years ago

Cool idea. So, then maybe there would be a new artifact plugin where you can register actions in a pack as artifact handlers?

I'm planning to set up a Pulp project server (v3) to host a bunch of artifacts like release archives, RPMs, and wheels. That will involve writing a new pack to include it in my workflows. So, if an artifact plugin registers actions, then maybe that would be all that is needed.

But if we wanted some special handling for files, maybe a more direct integration with Pulp would be good for StackStorm. Pulp is written in Python and built as a distributed architecture. At a glance, maybe some of the Pulp components/nodes could be added to StackStorm to provide an artifact repository for workflows.

cognifloyd commented 5 years ago

Plus, it would be nice for sensors to be able to have access to some kind of artifact repository too, so that the key-value store isn't the only officially supported way to store intermediate sensor data in between sensor polls.
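
For context, a minimal sketch of what sensors do today: intermediate state between polls is usually persisted via the existing sensor_service datastore helpers (get_value/set_value/dispatch are real APIs; the sensor, trigger, and key names below are made up), which is exactly the part an artifact repository could complement for larger data:

# Minimal polling sensor sketch: today, state between polls typically lives in
# the key-value store via self.sensor_service. Names here are illustrative only.
from st2reactor.sensor.base import PollingSensor


class LastSeenSensor(PollingSensor):
    def setup(self):
        pass

    def poll(self):
        # get_value/set_value are the existing datastore helpers for sensors.
        last_seen = self.sensor_service.get_value("last_seen_id") or "0"
        new_items = self._fetch_items_since(last_seen)  # hypothetical helper
        for item in new_items:
            self.sensor_service.dispatch(trigger="examples.new_item", payload=item)
        if new_items:
            self.sensor_service.set_value("last_seen_id", new_items[-1]["id"])

    def cleanup(self):
        pass

    def add_trigger(self, trigger):
        pass

    def update_trigger(self, trigger):
        pass

    def remove_trigger(self, trigger):
        pass

    def _fetch_items_since(self, last_seen_id):
        # Placeholder for polling whatever external system the sensor watches.
        return []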

arm4b commented 5 years ago

Noticed that interesting concept of passing artifacts within a workflow in Argo when we looked at it a few weeks ago. This is a good feature request, and the use cases listed make perfect sense too :+1:

guzzijones commented 4 years ago

I would also like this feature. I also like the publish_artifact idea: essentially, write the artifact to disk with a unique hash as the filename, then store the hash in the key-value store linked to the original filename.
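
A minimal sketch of that idea from inside a Python runner action, assuming some shared storage location: the artifact directory and key prefix are invented, while action_service.set_value is the existing datastore helper:

# Sketch: content-address a file and record the mapping in the datastore.
# /opt/stackstorm/artifacts and the "artifact:" key prefix are assumptions.
import hashlib
import os
import shutil

from st2common.runners.base_action import Action

ARTIFACT_DIR = "/opt/stackstorm/artifacts"


class PublishArtifact(Action):
    def run(self, source_path):
        # Hash the file contents so the stored filename is unique.
        digest = hashlib.sha256()
        with open(source_path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                digest.update(chunk)
        file_hash = digest.hexdigest()

        # Store the file under its hash.
        os.makedirs(ARTIFACT_DIR, exist_ok=True)
        stored_path = os.path.join(ARTIFACT_DIR, file_hash)
        shutil.copyfile(source_path, stored_path)

        # Link the original filename to the hash in the key-value store.
        self.action_service.set_value(
            name="artifact:%s" % os.path.basename(source_path), value=file_hash
        )
        return {"hash": file_hash, "path": stored_path}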

guzzijones commented 3 years ago

I am probably going to start on this at some point soon. This is the last remaining piece of st2 that I see missing for the use cases on our end.

  1. The client will need to be able to upload files to a storage location. The file can be given a unique hash and stored in the key-value store as a lookup to the original file.
    1. Clients will need a special file input type that tells the client to upload to storage.
  2. A key-value pair can be saved in the datastore and the key can be passed into the workflow for tasks to read the file.
  3. Also add a self.publish_file for Python actions (sketched below).
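
To make item 3 concrete, here is a hedged sketch of how a hypothetical self.publish_file helper might look from a Python action; publish_file does not exist today, and its name, parameters, and return value are invented:

# Hypothetical usage only: publish_file is the proposed helper, not part of
# the current Python runner API.
from st2common.runners.base_action import Action


class BuildAndPublish(Action):
    def run(self, repo_path):
        log_path = "%s/build.log" % repo_path
        # ... run the build here, writing output to log_path ...

        # Proposed: hand the file to st2 and get back a key that other tasks
        # (or the client) could use to fetch it later.
        artifact_key = self.publish_file(local_path=log_path, name="build.log")
        return {"build_log_key": artifact_key}
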
cognifloyd commented 3 years ago

We might want to take inspiration from the Pulp project (not pulp2, pulp3), which uses the django-storages framework under the covers. Then such artifacts could be stored in whatever storage mechanism makes sense, e.g. Azure Blob Storage, GCP storage, S3, or even NFS, or, for an all-in-one install, the local file system.
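
Independent of django-storages itself, the underlying idea is a thin storage interface that multiple backends can implement; below is a minimal sketch with an invented interface and a local-filesystem backend (an S3, GCS, or Azure backend would implement the same two methods against its SDK):

# Sketch of a pluggable artifact-storage interface. The class names and method
# signatures are invented to illustrate the idea, not an actual st2 or
# django-storages API.
import abc
import os
import shutil


class ArtifactStorage(abc.ABC):
    @abc.abstractmethod
    def save(self, name, local_path):
        """Store the file under `name` and return a reference for later use."""

    @abc.abstractmethod
    def retrieve(self, name, local_path):
        """Fetch the artifact `name` into `local_path` on this node."""


class LocalFileSystemStorage(ArtifactStorage):
    """Backend for an all-in-one install: artifacts live on local disk."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, name, local_path):
        dest = os.path.join(self.root, name)
        shutil.copyfile(local_path, dest)
        return dest

    def retrieve(self, name, local_path):
        shutil.copyfile(os.path.join(self.root, name), local_path)
        return local_path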

guzzijones commented 3 years ago

I found this undocumented feature to upload an ASCII file, at least, and it solves my use case: use a @ in front of the parameter name (e.g. st2 run mypack.myaction @myparam=/path/to/file.txt, which makes the client read the file contents into that parameter). File upload: https://github.com/StackStorm/st2/blob/911e2e16d7a356df1bb3992bb9d06829db36ab05/st2client/st2client/commands/action.py#L831

arm4b commented 3 years ago

@guzzijones Could you please document your findings in the relevant https://docs.stackstorm.com/ section?

rush-skills commented 3 years ago

I am not sure it serves the original purpose, but being able to have an ETL-like connection + query at the start of a workflow would also make it possible to do initial data lookups from external databases (to get the list of action items for the workflow). That would reduce the overall data handled in the inputs/outputs of actions/workflows (if people decide to offload the heavy bits) and might therefore contribute to a speedup. This would add a great benefit to our use case as well.

chris3081 commented 2 years ago

Not sure where this has gone, but I can see a use case for installing packs; in fact, I have that exact use case myself. Is anyone working on this currently? If so, I'd be keen to assist so I can retire the current hack I have for installing from s3/https in stackstorm k8s with shared volumes.