nmaludy opened this issue 5 years ago (status: Open)
Cool idea. So, then maybe there would be a new artifact plugin where you can register actions in a pack as artifact handlers?
I'm planning to set up a Pulp project server (v3) to host a bunch of artifacts like release archives, RPMs, and wheels. That will involve writing a new pack to include it in my workflows. So, if an artifact plugin registers actions, then maybe that would be all that is needed.
But if we wanted some special handling for files, maybe a more direct integration with Pulp would be good for StackStorm. Pulp is written in Python and built on a distributed architecture. At a glance, some of the Pulp components/nodes could perhaps be added to StackStorm to provide an artifact repository for workflows.
Plus, it would be nice for sensors to have access to some kind of artifact repository too, so that the key-value store isn't the only officially supported way to store intermediate sensor data between sensor polls.
We noticed this interesting concept of passing artifacts within the workflow when we looked at Argo a few weeks ago. This is a good feature request, and the use cases listed make perfect sense too :+1:
I would also like this feature. I also like the publish_artifact idea: essentially, write the file to disk with a unique hash as the filename, then store the hash in the key-value store linked to the original filename.
I am probably going to start on this at some point soon. This is the last remaining piece of st2 that I see missing for use cases on our end.
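A minimal sketch of that hash-and-link idea, assuming a hypothetical `publish_artifact` helper and a local `ARTIFACT_ROOT` directory (neither exists in st2 today); the only real API used is the datastore service that Python runner actions already get:

```python
import hashlib
import os
import shutil

# Hypothetical artifact root; in a real implementation this would be
# configurable (and likely a pluggable storage backend, see below).
ARTIFACT_ROOT = "/opt/stackstorm/artifacts"


def publish_artifact(action_service, path):
    # Hash the file contents so the stored filename is a unique hash.
    sha = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(65536), b""):
            sha.update(chunk)
    digest = sha.hexdigest()

    # Write it to disk under the hash (content-addressed, so duplicate
    # uploads of the same content are free).
    dest = os.path.join(ARTIFACT_ROOT, digest)
    if not os.path.exists(dest):
        shutil.copyfile(path, dest)

    # Link the original filename to the hash in the key-value store,
    # mirroring "store the hash in the key-value store" above.
    action_service.set_value(name="artifact:%s" % os.path.basename(path), value=digest)
    return digest
```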
- a `file` input type that tells the client to upload to storage
- `self.publish_file` for Python actions

We might want to take inspiration from the Pulp project (no, not pulp2, but pulp3), which uses the django-storages framework under the covers. Then such artifacts could be stored in whatever storage mechanism makes sense, e.g. Azure Blob Storage, GCP storage, S3, or even NFS; or, for an all-in-one install, the local filesystem.
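For illustration, a hedged sketch of the pluggable-storage idea borrowed from Pulp 3: the publishing code only sees the django-storages `Storage` interface, so the backend (S3 here; Azure/GCS/NFS/local disk elsewhere) is pure configuration. The bucket name, object key, and minimal Django setup are placeholders, not anything st2 ships:

```python
import django
from django.conf import settings

# django-storages expects Django settings; this is the bare minimum
# needed to use it outside a full Django project.
settings.configure(USE_TZ=True)
django.setup()

from storages.backends.s3boto3 import S3Boto3Storage

# Placeholder bucket; credentials come from the usual boto3 mechanisms.
storage = S3Boto3Storage(bucket_name="st2-artifacts")

with open("build-1234.log", "rb") as fp:
    stored_name = storage.save("logs/build-1234.log", fp)

# A URL the workflow could pass along to downstream tasks.
print(storage.url(stored_name))
```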
I found this undocumented feature for uploading at least an ASCII file, and it solves my use case: put a `@` in front of the parameter name, e.g. something like `st2 run my_pack.my_action @file_content=/path/to/file.txt` (action and parameter names here are placeholders). [file upload](https://github.com/StackStorm/st2/blob/911e2e16d7a356df1bb3992bb9d06829db36ab05/st2client/st2client/commands/action.py#L831)
@guzzijones Could you please document your findings in the respective https://docs.stackstorm.com/ section?
I am not sure it serves the original purpose, but having an ETL-like connection + query at the start of a workflow would also make it possible to do initial data lookups (to get the list of action items for the workflow) from external databases. That would reduce the overall data handled in the input/output of actions/workflows (if people decide to offload the heavy bits) and thus might contribute to a speedup. This would add a great benefit to our use case as well.
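As a rough illustration of that lookup-at-the-start idea, a hypothetical pack action that fetches only the work-item IDs up front, so later tasks pass around a short list instead of the full rows (pymysql and the table/column names are assumptions for the sketch):

```python
import pymysql

from st2common.runners.base_action import Action  # base class for Python runner actions


class GetWorkItems(Action):
    def run(self, host, user, password, database):
        conn = pymysql.connect(host=host, user=user, password=password, database=database)
        try:
            with conn.cursor() as cur:
                # Placeholder query: return just the IDs of items to process.
                cur.execute("SELECT id FROM work_items WHERE processed = 0")
                return [row[0] for row in cur.fetchall()]
        finally:
            conn.close()
```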
Not sure where this has gone, but I can see a use case for installing packs; in fact, I have that exact use case myself. Is anyone working on this currently? If so, I'd be keen to assist so I can retire the current hack I have for installing from s3/https in stackstorm-k8s with shared volumes.
SUMMARY
Currently, in a distributed StackStorm deployment, the node that an action runs on is random. This causes some headaches when trying to deal with files or artifacts while implementing things like an ETL workflow or a CI/CD workflow:

- ETL (database query): each task needs access to the data retrieved or produced by previous tasks.
- CI/CD (files and binaries): build outputs such as binaries and logs need to be available to the tasks that publish them.
ISSUE TYPE

- Feature Idea

IDEAS
Another workflow tool that I found (Argo) has an interesting concept of artifacts that can be passed between steps in the workflow.
This spawned some thinking and relates to an idea I had in: https://github.com/StackStorm/st2/issues/4343
It would be cool if we could pass "artifacts" as inputs/outputs associated with a task in a workflow. The task would perform some pre/post work to load/store the artifact around the action run, as in the sketch below.
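A hedged sketch of that pre/post wrapper, purely illustrative (`run_action`, `input_artifacts`, and `output_artifacts` are invented names, not an existing st2 schema):

```python
# Illustrative only: run the declared input-artifact actions before the task
# and the output-artifact actions after it, handing the loaded artifacts to
# the task in between.
def run_task_with_artifacts(task, run_action):
    artifacts = {}

    # Pre-work: load each declared input artifact.
    for spec in task.get("input_artifacts", []):
        artifacts[spec["name"]] = run_action(spec["action"], spec.get("params", {}))

    # The task itself runs with the loaded artifacts available.
    result = run_action(task["action"], {"artifacts": artifacts})

    # Post-work: store each declared output artifact.
    for spec in task.get("output_artifacts", []):
        run_action(spec["action"], dict(spec.get("params", {}), data=result))

    return result
```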
Pseudo-code for it could look something like what I had in my other request.
ETL - Database
This would retrieve a database artifact from MySQL, do some processing, then publish the results back to MySQL.
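Sketched as a hypothetical task definition (the `input_artifacts`/`output_artifacts` keys and the mysql pack actions are invented for illustration):

```python
# Hypothetical ETL task: the engine would run the input-artifact action
# first, the transform in the middle, and the output-artifact action last.
etl_task = {
    "transform_rows": {
        "input_artifacts": [
            {"name": "source_rows", "action": "mysql.query",
             "params": {"query": "SELECT * FROM staging"}},
        ],
        "action": "my_pack.transform",
        "output_artifacts": [
            {"name": "results", "action": "mysql.insert",
             "params": {"table": "processed"}},
        ],
    },
}
```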
CI/CD - Files and Binaries
This would run a build process that checks out a git repo, builds the thing, uploads the RPM to a Yum repo, and uploads the build log to an S3 bucket.
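The same shape for the CI/CD case, again with invented names; note that the artifact handlers here are ordinary pack actions (git, a hypothetical yum_repo pack, aws), which is exactly the pack pluggability described in the next section:

```python
# Hypothetical CI/CD task: the git checkout is an input artifact, and the
# RPM and build log are output artifacts handled by pack actions.
build_task = {
    "build_rpm": {
        "input_artifacts": [
            {"name": "source", "action": "git.clone",
             "params": {"repo": "https://example.com/my-app.git"}},
        ],
        "action": "my_pack.run_build",
        "output_artifacts": [
            {"name": "rpm", "action": "yum_repo.upload",
             "params": {"repo": "internal"}},
            {"name": "build_log", "action": "aws.s3_put_object",
             "params": {"bucket": "build-logs"}},
        ],
    },
}
```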
Reusing existing packs
Ideally, it would be great if packs could plug in to this "artifact" architecture and provide input/output artifact actions that could be run. This would give us pluggability without reinventing the wheel or pulling integration code complexity into StackStorm core itself.
Long story short, this is just a cool thing I saw, and I wanted to write down my thoughts / use case before I forgot it.