ChrisMcKenzie / styx

A Powerful Workflow Oriented CI/CD Platform.

Config Language Parsing #2

Open ChrisMcKenzie opened 8 years ago

ChrisMcKenzie commented 8 years ago

As described in the design docs, I envision a fairly powerful config language built on top of HCL.

The language I envision will allow pipelines, tasks, and variables to be defined both inside and outside a workflow, so that they can be reused across multiple workflows. Also, I don't want to confine configs to a single ".styx.hcl" file; I'd like the ability to include files so that workflows can be better organized. See Design Docs

@mike-marcacci I know you hate the current ecosystem of CI/CD tools so I think you might have a lot of input on this issue.

mike-marcacci commented 8 years ago

JUST getting a chance to look over these. I think HCL is a fantastic config language, but if you're hoping to allow a substantial amount of custom logic, you may want to consider using a language like bash or python instead of a config language:

For example, I like the way Habitat uses bash as the basis for its plan.sh syntax. I've got to head out right now, but I'll finish looking through this and add more comments later!

EDIT: to elaborate on the choice of whether to use a config (data) language vs. a scripting/programming language, I think there are a few questions to ask:

- Is the structure of the data important?
- Will we be manipulating or merging the data with data from other sources?
- Do we want to intentionally constrain the scope of the data?
- Do we want the data to be easily machine-written?

"Yes" is an argument for HCL or any config language, but "no" means you might consider something more flexible, especially considering how a CI tool by its very nature "runs things."

mike-marcacci commented 8 years ago

(Edited my above comment.)

Also, I'd like to throw out a couple of links to some build resources that are well worth reading through. The idea of deterministic builds is extremely important, and almost never followed. It would be very cool to promote this idea in a generic way!

https://nixos.org/nix/
https://wiki.debian.org/ReproducibleBuilds
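To make the deterministic-builds idea concrete: if a build is keyed by a hash of all of its inputs (sources, lockfiles, toolchain versions), then two runs with identical inputs hit the same cache entry, which is the core trick behind Nix and Debian's reproducible builds. A minimal sketch in Python (the function name is hypothetical, not part of any tool mentioned here):

```python
import hashlib
from pathlib import Path

def build_cache_key(input_paths, tool_versions):
    """Derive a deterministic cache key from every build input.

    Hashes file contents plus toolchain versions in a stable order,
    so identical inputs always map to the same key and any change
    to an input produces a different one.
    """
    h = hashlib.sha256()
    for path in sorted(input_paths):
        h.update(Path(path).name.encode())
        h.update(Path(path).read_bytes())
    for name, version in sorted(tool_versions.items()):
        h.update(f"{name}={version}".encode())
    return h.hexdigest()
```

A build system could then skip any step whose key already exists in the cache, which only works if the step is genuinely deterministic.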

ChrisMcKenzie commented 8 years ago

Let me start by answering your set of vectoring questions.

**Is the structure of the data important?**

Yes, the structure of the data is important, but I feel that structure can be preserved by both approaches in various ways.

To be clear, the data we are speaking about is mostly metadata about a given portion of a build/deployment process (e.g. pipeline name, task name). The actual code being executed will always be some sort of scripting language (e.g. bash).

Take, for example, the following proposed HCL:

```hcl
workflow "build-my-awesome-application" {
  pipeline "install-tools" {
    task "npm-install" {
      script = <<EOF
npm install
EOF
    }
  }
}
```

This might also be expressed in Ruby like:

```ruby
workflow("build-my-awesome-application") {
  pipeline("install-tools") {
    task("npm-install") {
      `npm install`
    }
  }
}
```

Ruby may be a bad example, since it explicitly caters to DSLs like this, but I think it proves the point that a general-purpose language might also be able to accomplish this goal.
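For comparison, the same structure can be captured in a general-purpose language without any DSL tricks at all. This is a hypothetical Python sketch (none of these class names are part of styx), showing that the metadata structure survives the translation:

```python
class Task:
    """A named script to run in the build environment."""
    def __init__(self, name, script):
        self.name = name
        self.script = script

class Pipeline:
    """A named group of tasks, mirroring the HCL `pipeline` block."""
    def __init__(self, name):
        self.name = name
        self.tasks = []

    def task(self, name, script):
        self.tasks.append(Task(name, script))
        return self

class Workflow:
    """Top-level container, mirroring the HCL `workflow` block."""
    def __init__(self, name):
        self.name = name
        self.pipelines = []

    def pipeline(self, name):
        p = Pipeline(name)
        self.pipelines.append(p)
        return p

# the same workflow as the HCL example above
wf = Workflow("build-my-awesome-application")
wf.pipeline("install-tools").task("npm-install", "npm install")
```

The trade-off is that nothing stops a user from putting arbitrary logic between those calls, which is exactly the "constrain the scope" question below.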

**Will we be manipulating or merging the data with data from other sources?**

Yes, I feel that pre-built/includable functionality is essential to giving a CI/CD tool a low barrier to entry. For example, a function that runs npm install and manages the npm cache for you, so that you aren't required to download the same version of a package over and over from its source, would greatly improve performance and reproducibility.
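A minimal sketch of that caching idea, assuming a hypothetical `cached_fetch` helper (not a real npm or styx API): the expensive download step runs at most once per package@version, and every later build reuses the pinned bytes.

```python
import os

def cached_fetch(package, version, cache_dir, fetch):
    """Return the path to a package artifact, downloading only on a miss.

    `fetch(package, version, dest_path)` is the expensive download step.
    Because the cache is keyed by exact version, repeated builds reuse
    the same bytes -- faster, and more reproducible than re-resolving.
    """
    key = f"{package}-{version}.tgz"
    path = os.path.join(cache_dir, key)
    if not os.path.exists(path):
        os.makedirs(cache_dir, exist_ok=True)
        fetch(package, version, path)  # cache miss: do the real download
    return path
```

Built-in helpers like this would let a workflow author get caching for free instead of hand-rolling it per project.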

**Do we want to intentionally constrain the scope of the data?**

I think that if a "scripting" language is used, it becomes less clear which execution scope we are working in. There is a metadata/requirements scope in a CI/CD tool, which is evaluated in order to decide which defined actions will be executed (e.g. workflows and pipelines), and then there is the actual build execution scope (e.g. tasks), which defines the raw code to be run in a "build" environment. With that being said, I feel that clearly separating these concerns in the syntax makes for a clearer and more understandable user experience.

**Do we want the data to be easily machine-written?**

I am not sure I see the use case for having machine-written build definitions.


I feel that we are not building a system that only "runs things": it must also know a little about what the user is trying to accomplish (e.g. an e2e test cannot be completed without knowing that a build has succeeded).

With that being said, while I do agree that using bash would be a more unique way of solving this problem, I do think that a config language is needed for defining the more meta-level pieces.
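That "e2e needs a successful build" point is really a dependency-ordering problem, which a tool can solve with a topological sort over pipeline requirements. A sketch (the function and data shapes are illustrative, not styx's actual design):

```python
def execution_order(pipelines):
    """Order pipelines so each runs only after its requirements.

    `pipelines` maps a pipeline name to the names it requires,
    e.g. {"build": [], "e2e": ["build"]}. Depth-first topological
    sort; a dependency cycle raises ValueError.
    """
    order, done, in_progress = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError(f"dependency cycle involving {name!r}")
        in_progress.add(name)
        for dep in pipelines[name]:
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)

    for name in pipelines:
        visit(name)
    return order
```

Whichever syntax wins, the requirements metadata has to be extractable without running the build scripts themselves, which is the argument for keeping it declarative.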

ChrisMcKenzie commented 8 years ago

I am just going to put this here as a half-baked idea, but I was thinking of better ways of describing a "workflow":

# pipeline build
```
npm install -g
```

# pipeline test
**requires**
- pipeline:build:pass
```python
import styx

# execute a make task with python
styx.make('test')
```

The main idea here is that it lets the user use any language they wish, while the file also serves as both the metadata and the documentation for building and testing an application.
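To show that such a format stays machine-readable, here is a hypothetical parser for the proposal above (`# pipeline` headers, a `requires` list, and fenced scripts with an optional language tag). Everything here is illustrative, not an actual styx implementation:

```python
import re

FENCE = "`" * 3  # literal triple-backtick fence marker

def parse_workflow(text):
    """Parse the markdown-style workflow sketch into pipeline records.

    Returns {name: {"requires": [...], "lang": str, "script": str}}.
    The format itself is still a proposal, so this is a sketch only.
    """
    pipelines = {}
    current = None
    lines = iter(text.splitlines())
    for line in lines:
        header = re.match(r"#\s*pipeline\s+(\S+)", line)
        if header:
            current = {"requires": [], "lang": "shell", "script": ""}
            pipelines[header.group(1)] = current
        elif line.startswith("- ") and current is not None:
            # list items under **requires** name prerequisite states
            current["requires"].append(line[2:].strip())
        elif line.startswith(FENCE) and current is not None:
            # fenced block: optional language tag, then the script body
            current["lang"] = line[len(FENCE):].strip() or "shell"
            body = []
            for code_line in lines:
                if code_line.startswith(FENCE):
                    break
                body.append(code_line)
            current["script"] = "\n".join(body)
    return pipelines
```

The `requires` entries (e.g. `pipeline:build:pass`) would then feed straight into dependency ordering, so the file doubles as docs and as executable metadata.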

@mike-marcacci I think this could be an elegant compromise for our debate. Let me know what you think.