gaia-pipeline / gaia

Build powerful pipelines in any programming language.
Apache License 2.0

Define repositories by configuration #187

Open DeanHnter opened 5 years ago

DeanHnter commented 5 years ago

Hi,

I read through the documentation for this feature but couldn't see any support for it. I think it would be useful to have some way of defining repository configurations at start-up without using the UI.

When using the Docker image, it is a fairly common use case to deploy and later tear down an instance, which means having to manually redefine every repository containing pipelines: the URL, the pipeline language, and so on. Being able to put this in a config file and mount it would be really helpful.

Skarlso commented 5 years ago

Agreed.

Skarlso commented 5 years ago

@DHunte To be fair... if you create them once, then save the DB plus the resulting cloned-out code and copy both under a new instance, you would basically retain all the information you previously set up.

That said, I understand that copying over the checked-out code and retaining the DB is not always a good choice, especially if what you want is precisely to clear the DB in the first place. It would be cool to just point at a location with a bunch of setup files that would configure a pipeline run for you.

Skarlso commented 5 years ago

@DHunte What would be a good configuration format you would love to see / use for this purpose?

I'm thinking YAML: although plenty of better formats exist, it is something Gaia already uses, so users would be more comfortable with it.

Also, I'm not looking to create something like Jenkins' Groovy pipeline files. This is literally just defining pipelines: all the configuration you would set up in the UI, translated into a config file.

/cc @michelvocks What do you think? Would that be okay?

Skarlso commented 5 years ago

Also, would this be something that is only done on startup, or would it continuously watch a folder to see if a new configuration file has been added?

DeanHnter commented 5 years ago

> @DHunte What would be a good configuration format you would love to see / use for this purpose?
>
> I'm thinking YAML: although plenty of better formats exist, it is something Gaia already uses, so users would be more comfortable with it.
>
> Also, I'm not looking to create something like Jenkins' Groovy pipeline files. This is literally just defining pipelines: all the configuration you would set up in the UI, translated into a config file.
>
> /cc @michelvocks What do you think? Would that be okay?

Skarlso commented 5 years ago

@DHunte Thanks for your input! Yeah, I'm sensing something like: if the file is there, the configuration could change and be updated, in order to completely avoid manual steps and automate the creation / update of pipelines in Gaia.

Some of it can be done now via the API, but it's cumbersome. If there is a directory Gaia can work out of, I'm thinking that might be preferred.

There are only a few things that can be edited anyway, like the name or schedule. Other things are set in stone.

Skarlso commented 5 years ago

Started working on this.

michelvocks commented 5 years ago

Hi @DHunte and @Skarlso,

I'm wondering a bit about the use case for this feature. I think the general idea, and also the best advice, is to copy / back up the data folder of your Gaia instance. If you are using the Gaia Docker image, you can simply mount the `/data` path to a local path, which lets you store all the relevant Gaia data on your host environment.

If you really want to wipe the database but still want to keep your pipelines, you can just back up the `/data/pipelines` folder or its contents. Gaia is able to pick up already-compiled pipelines without having a pipeline build history.

If none of that suits you, you can still use cURL (or any other tool) to send an HTTP request to Gaia's API to trigger a pipeline build. That should be a one- (or two-)liner.
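For context, a minimal sketch of what such a request could look like. Note that the endpoint path, port, and payload field names below are assumptions for illustration, not Gaia's documented API; check the API of your Gaia version before using anything like this.

```shell
# Hypothetical sketch only: the endpoint, port, and payload field names are
# assumptions, not Gaia's documented API. Adjust to your deployment.
PAYLOAD=$(cat <<'EOF'
{
  "pipelinename": "my-pipeline",
  "gitrepo": { "url": "https://github.com/example/pipelines.git" },
  "type": "golang"
}
EOF
)
echo "$PAYLOAD"

# Then something along these lines would register the pipeline:
# curl -s -X POST -H "Content-Type: application/json" \
#      -d "$PAYLOAD" http://localhost:8080/api/v1/pipeline
```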

Am I missing something here? Would love to hear more about the real use case. :hugs:

Skarlso commented 5 years ago

It can detect prebuilt binaries, but without the DB those are just executed, if I remember correctly. You won't have any credentials with them, like username, password, and private key; those would be in the DB. You also wouldn't have things like webhooks and GitHub tokens. Basically anything that would be stored in the DB, which presumably, for some reason, you don't have. Maybe it was compromised, or simply deleted.

michelvocks commented 5 years ago

Gotcha!

However, it is still possible to define all pipeline information in a JSON file and send it to the Gaia API via cURL, which would basically provide the same functionality you described here.

I don't want to be the gatekeeper here but I want to make sure that the features we implement and we also have to maintain in the future, are valuable and provide functionality which makes sense. Once a feature is in the code base, it is really hard to deprecate and remove it.

Skarlso commented 5 years ago

I understand, of course. I was also under the impression that copying things over is absolutely doable. I guess it's up to @DHunte. What do you think?

DeanHnter commented 5 years ago

Hi all,

Thanks again for the breakdown.

After reviewing the options outlined, I would like to give my perspective on why I would still prefer a config-file way to define the pipelines, though I understand and appreciate the minimalist approach of Gaia; it's the most complete, lightweight pipeline tool I've used.

Copying the database and/or its contents over seems to be a problematic way to version/update the pipeline definitions themselves: you can't just use something like a git submodule or some other mechanism for easily pulling the latest defined pipelines. As such, it is at odds with git-centric development, where you define your infrastructure via code/configs rather than via folder and database backups, which require extra setup steps.

As for the second method, cURLing JSON: I am okay with this approach, but it also leads to messy infrastructure. To be specific, if you want to continually poll for changes to the pipelines themselves, you need to create a Docker side-car that cURLs periodically, or, if you run natively, deploy another application performing the same function. Again, this approach is fine and appreciated, but deploying a cURLing side-car / polling application is less than a complete solution from an end-user perspective.
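To make the side-car point concrete, such a helper is typically little more than a loop like the sketch below. This is purely illustrative: the cURL target is hypothetical, and the loop is bounded here only so the example terminates.

```shell
# Sketch of a polling side-car; the commented-out curl target is hypothetical.
INTERVAL=0   # seconds between polls; something like 300 in a real deployment
MAX_POLLS=3  # bounded here only so the sketch terminates
i=0
while [ "$i" -lt "$MAX_POLLS" ]; do
  # In a real side-car you would re-apply the pipeline definitions here, e.g.:
  # curl -s -X POST -d @pipelines.json http://gaia:8080/api/v1/pipeline
  echo "poll $i: re-applying pipeline definitions"
  i=$((i + 1))
  sleep "$INTERVAL"
done
```

Running a loop like this as a separate container next to Gaia is exactly the kind of extra moving part the config-file approach would remove.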

From my perspective, a benefit of the config file would be allowing other infrastructure tools such as Ansible and Terraform to build and deploy Gaia locally, enabling native builds on the bare-metal machine while also deploying containers as Gaia workers for Linux builds. This avoids the added overhead of having to fetch/restore files and folders to mount, baking settings from a previous deployment into a new Docker image, or deploying an extra side-car container into the cluster. Again, I'm mostly trying to reduce the number of steps required to set up the entire infrastructure.

Skarlso commented 5 years ago

Basically, this would give a faster/cleaner way of bootstrapping pipelines if you happened to have no Gaia running before, or you want a clean database start. The version migration didn't occur to me. That's also a good point.

For polling, I'm adding a variable so it can be turned off; or rather, it's turned off by default and opt-in.

For secrets, I devised a plan where the YAML would have a Vault placeholder. Like this:

```yaml
...
credentials:
  username:
    plain: "bla"
  password:
    vault: "VAULT_KEY"
  private_key:
    env: "ENVIRONMENT_PROPERTY"
...
```

This would mean that you either define something in plain text, or you have used Gaia before (so you have a Vault and would like to use it to store a credential), or you use an environment property which contains the secret.

Would that work?
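For a sense of what "all the UI configuration translated into a file" could look like, here is a fuller sketch building on the credentials snippet above. Every field name here is hypothetical and only illustrates the idea, not a final schema:

```yaml
# Hypothetical sketch; field names are illustrative, not a final schema.
pipelines:
  - name: backend-build
    git:
      url: "https://github.com/example/backend.git"
      branch: "master"
    type: golang
    periodic_schedule: "0 0 * * *"
    credentials:
      username:
        plain: "bla"
      password:
        vault: "VAULT_KEY"
      private_key:
        env: "ENVIRONMENT_PROPERTY"
```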

michelvocks commented 5 years ago

Hi @DHunte,

Thank you for taking the time to clarify this feature request.

> Copying the database and/or its contents over seems to be a problematic way to version/update the pipeline definitions themselves: you can't just use something like a git submodule or some other mechanism for easily pulling the latest defined pipelines. As such, it is at odds with git-centric development, where you define your infrastructure via code/configs rather than via folder and database backups, which require extra setup steps.

> As for the second method, cURLing JSON: I am okay with this approach, but it also leads to messy infrastructure. To be specific, if you want to continually poll for changes to the pipelines themselves, you need to create a Docker side-car that cURLs periodically, or, if you run natively, deploy another application performing the same function. Again, this approach is fine and appreciated, but deploying a cURLing side-car / polling application is less than a complete solution from an end-user perspective.

I might be misunderstanding you here, but that sounds like you want something like the poll functionality. We already have an integrated polling mechanism which looks for changes in your pipeline's source code git repository; if a change is detected, your pipeline is automatically rebuilt and replaced with the newer version. That way you always have the newest version of your pipeline available. Is that what you are looking for?

> From my perspective, a benefit of the config file would be allowing other infrastructure tools such as Ansible and Terraform to build and deploy Gaia locally, enabling native builds on the bare-metal machine while also deploying containers as Gaia workers for Linux builds. This avoids the added overhead of having to fetch/restore files and folders to mount, baking settings from a previous deployment into a new Docker image, or deploying an extra side-car container into the cluster. Again, I'm mostly trying to reduce the number of steps required to set up the entire infrastructure.

Gaia was built, from the ground up, as a long-running service. You can compare it with Jenkins, GitLab, or any other long-running service. All these services produce some kind of database or content files which need to be backed up at some point. I'm still wondering about the use case behind deploying Gaia locally for native builds: if you want to test your pipelines, you can simply test them locally without Gaia.

I like the idea of using Terraform (or Ansible) to manage your pipelines for Gaia. A separate Terraform module for Gaia would be really nice! 😄 That would allow you to define your pipelines in Terraform scripts and automatically "deploy" them into Gaia.

Skarlso commented 5 years ago

Jenkins does provide pipeline files which essentially let you describe jobs programmatically. :) As does GitLab via .gitlab-ci.yml files, which let you describe jobs, pipelines, and connections.

I don't think starting Gaia is the goal per se, but imagine this: you have Gaia, you create a pipeline, then remote-execute it with the remote service, all from the command line, without ever touching the UI, including creating the pipeline itself. And while I agree that this could be done via cURL, using cURL is cumbersome and error-prone. I would rather just drop a file somewhere and execute my pipeline after it has been successfully built.

At least that's how I imagine it. :)

michelvocks commented 5 years ago

> Jenkins does provide pipeline files which essentially let you describe jobs programmatically. :) As does GitLab via .gitlab-ci.yml files, which let you describe jobs, pipelines, and connections.

Isn't that the same as what you currently do with Gaia pipelines? When you write a Gaia pipeline, you define the jobs, their execution order, and other things like arguments. I would really love to see the difference, but I don't see it 🙈 When you write a new Jenkins pipeline or a new GitLab CI job YAML file, you still have to open the UI and tell Jenkins/GitLab where the source code is located. That's identical to what Gaia does.

> I don't think starting Gaia is the goal per se, but imagine this: you have Gaia, you create a pipeline, then remote-execute it with the remote service, all from the command line, without ever touching the UI, including creating the pipeline itself. And while I agree that this could be done via cURL, using cURL is cumbersome and error-prone. I would rather just drop a file somewhere and execute my pipeline after it has been successfully built.
>
> At least that's how I imagine it. :)

That sounds to me like a Gaia CLI binary. A CLI which makes it easier for human beings to remotely build and start pipelines.

Skarlso commented 5 years ago

Hmm, you are right. Sorry, the Jenkins Pipeline describes the whole pipeline, not the execution of a binary. 🤔 Same for GitLab, which has the... pipelines. :D I see your point.

danielBingham commented 4 years ago

So, I'll preface this with the disclaimer that I haven't read the whole thread, but I'm a DevOps Lead at a startup that currently uses Jenkins, and I've been keeping my eye on Gaia for a while as a potential future replacement for it.

We have many, many pain points with Jenkins, but one of the biggest ones is managing the configuration of Jenkins itself. We end up having to backup all of its XML files and then load development Jenkins instances from those backups. Right now we store them on S3, though we've discussed checking them into the repo. Neither of these options is great.

What we really want is to be able to define the configuration of our CI/CD server as code in the repository. It doesn't matter whether it's YAML, JSON, or HCL, just as long as it's manageable code and we can cleanly define the configuration of each of our pipelines, any user or security configuration (with a way to reference a secrets store), and any server-wide configuration. Basically, I want to be able to stand up a complete server from the repo, and I want to be able to track changes to it in the repo (like we do with all the rest of our infrastructure).

This gets a little inceptiony - because for all the rest of our infrastructure we use a pipeline to stand it up. So bonus points if there's a way to define an "initial pipeline" or something that runs against the CI/CD server instance and configuration on standup. Or bootstrap code or something.

I know there are several Jenkins plugins which claim to offer this, but I'm getting real tired of piling plugins on top of plugins for what should be basic behavior. So if Gaia offers this functionality out of the box, that's a huge tick in Gaia column for me when deciding whether to move off Jenkins to Gaia (you know, once Gaia's out of alpha).

Skarlso commented 4 years ago

The final say is Michel's. I'm all for doing it. 🙂

Skarlso commented 4 years ago

@danielBingham So, as a compromise... what about Terraform? We could build a Terraform plugin for the API part, which could effectively replace the frontend.

danielBingham commented 4 years ago

@Skarlso If you can define the configuration of the Gaia instance completely in Terraform and spin it up from the repository, that would absolutely solve my use case. We're working on migrating fully to Terraform anyway, and I love the idea of having a single configuration language to work in.

Skarlso commented 4 years ago

I'll have a think about this. It would probably mean some kind of API key / secret generation that could be used to access Gaia, with the ability for a user to rotate said keys.