
Feature: pipeline configuration from source control #1133

Closed tomzo closed 8 years ago

tomzo commented 9 years ago

This feature made it! The documentation in GoCD is here.

I have collected the most notable information from the comments below so that no one has to read through all of them again to get an idea of what has changed in the system and what is expected from it.

Overview of the feature

I have prepared example config repositories. In order of complexity:

  1. https://github.com/tomzo/gocd-main-config contains the main cruise XML configuration, the one stored in /etc/go/cruise-config.xml
  2. https://github.com/tomzo/gocd-indep-config-part - XML configuration part with no external references.
  3. https://github.com/tomzo/gocd-refmain-config-part - XML configuration part that refers to pipelines from main.
  4. https://github.com/tomzo/gocd-refpart-config-part - XML configuration part with references to other configuration part repository
  5. https://github.com/tomzo/gocd-json-config-example - JSON configuration part

In the main config repo https://github.com/tomzo/gocd-main-config there is a config-repos branch with config-repo sections that import elements from the other repositories.

Domain and concepts

Configuration repository

A configuration repository is a source control repository of any kind that holds part of the GoCD configuration. So far we have referred to this as a config-repo or a partial. However, 'partial' should really be reserved for the configuration object, while the repository is the remote code, yet to be fetched and parsed.

ConfigOrigin

Tells where some part of the configuration comes from. It was necessary to add this because some services now need this extra info to operate. There are 3 types of configuration origins:

There are 2 scopes of configuration:

These are important at the system level because we consider validity twice: first at base scope, then at merged scope.

Behavior and assumptions

When a pipeline is defined in a configuration repository, there are always 2 cases which define how the Go server should behave.

When the configuration repository that defines the pipeline is the same as one of its materials

In automated builds we expect that when a pipeline is triggered with material at revision C1, the configuration of the pipeline will be from the same commit - C1. There is a small (unavoidable) inconsistency here - when a few quick commits (C1, C2, C3) are made that change the pipeline configuration, Go may pick them up faster than it finishes already running builds (e.g. the configuration has been updated to C2 while stages on C1 are still running). This may lead to failing a build that would have passed if the commits were slower. However, IMO this is good after all; the quick commits would usually be made because somebody wanted to fix the previous configuration. There is no way to avoid it because only one pipeline configuration can exist at a time.

In manually triggered builds Go always fetches materials first, which may change the configuration of the pipeline that was just triggered.

In timer triggered builds Go also fetches materials first, which may change the configuration of pipeline that is being triggered.

When the configuration repository is not one of the materials

This case is much less complex. Go is always polling for changes in configuration repositories and tries to merge them into the current configuration. The rules are the same as if the incoming changes had been made from the UI.

Failures

Hung material

What happens when polling for one material hangs:

When the plugin fails, the configuration has an invalid format, or migration fails in the configuration repo checkout, the material update completes but the config partial stays at the old version.

How to handle merging configuration parts and main configuration?

  1. Merges are done at object level. (Meaning first all XML and all repositories are parsed to create a BasicCruiseConfig and PartialConfig instances; then an aggregate object is created - a BasicCruiseConfig with merge strategy.)
  2. According to rules written below

    Environments

Most liberal approach possible:

    Pipelines in environment

Most liberal approach possible:

There could be optional overrides but we can consider it future work.

Pipelines

Authorization can only be in the main XML, so it cannot conflict when merging.

System

Some notes about changes in how Go services work and what happens when configuration repositories are present.

Services

Here is a summary of new services layout:

Below GoConfigService

The best analogy to get the whole point here is that MergeGoConfig has replaced the old CachedGoConfig. CachedGoConfig used to hold 2 instances of configuration in memory (config for edit and current config). Now MergeGoConfig holds these two. The main difference is that MergeGoConfig may return a merged configuration as the current config or the config for edit. If there are no extra configuration parts, it returns the main configuration alone.
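
To illustrate the layering, here is a minimal sketch with simplified stand-in types; these are not the actual GoCD classes, just the shape of the idea:

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the layering described above; class and method names
// are simplified stand-ins for the real GoCD types.
class MergeGoConfigSketch {
    static class CruiseConfig {}                        // stand-in for config-api's main config
    static class PartialConfig extends CruiseConfig {}  // configuration parsed from one config repo

    static class MergedCruiseConfig extends CruiseConfig {
        final CruiseConfig main;             // the main instance is kept inside the merged one,
        final List<PartialConfig> partials;  // which makes extracting local elements easy later
        MergedCruiseConfig(CruiseConfig main, List<PartialConfig> partials) {
            this.main = main;
            this.partials = partials;
        }
    }

    CruiseConfig main = new CruiseConfig();
    List<PartialConfig> partials = new ArrayList<>();

    // What used to be CachedGoConfig's job: return the current config.
    // With config repos present, a merged aggregate is returned instead of the main config alone.
    CruiseConfig currentConfig() {
        return partials.isEmpty() ? main : new MergedCruiseConfig(main, partials);
    }
}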

Above GoConfigService

This is implemented mostly as we discussed here.

New material update queue

Added a new component - ConfigMaterialUpdater - which listens on the config-material-update-completed topic. So when MDU is done, ConfigMaterialUpdater gets its chance to work with the material being updated:

The checkouts (in pipelines/flyweight) are NOT created/updated by the standard material pollers when doing an update on the db (MDU).

There is now a new type of poller that creates a full checkout on each update. These directories are then read and parsed by configrepo plugins.
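
Roughly, the flow could be pictured like this (a sketch with hypothetical names; the real listener and plugin interfaces differ):

import java.io.File;

// Sketch: after the standard MDU finishes, the config material updater gets
// its turn - it works against a full checkout and hands the directory to the plugin.
class ConfigMaterialUpdaterSketch {
    interface ConfigRepoPlugin {
        Object parseDirectory(File checkoutDirectory);  // returns the parsed partial config
    }

    private final ConfigRepoPlugin plugin;

    ConfigMaterialUpdaterSketch(ConfigRepoPlugin plugin) {
        this.plugin = plugin;
    }

    // Invoked for messages on the config-material-update-completed topic.
    void onMaterialUpdateCompleted(File flyweightCheckout) {
        Object partial = plugin.parseDirectory(flyweightCheckout);
        // ... merge the partial into the current configuration ...
    }
}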

Handling edits

The merged cruise config is returned for edits. When some service is editing the config, it does not know whether the config is merged or not. It does not have to know.

Adding

When a call to add a pipeline or environment is made, it reaches the merged cruise config at some point. The merged config is aware that the addition is meant for the main part and changes the main config instance (inside the merged cruise config instance).

Removing

Removing is like adding. We can localize where to remove from. If the user tries to remove a remote element, it fails - usually in the cruise config code.

Modifications

Modifications get complex because there are many ways in which they are introduced. This is where returning a merged config instance really pays off. Changes are made on the config instance in the full merged context, so that when anything invalid is attempted it will throw - e.g. when trying to change the name of a pipeline group defined remotely.

Saving changes

Each config edit ends with an attempt to save some config-for-edit instance (or a deep clone of it, or a clone of a clone, etc.). To deal with that, the magical writer is aware of the possibility that a merged config might be passed in to be serialized. If so, it takes out only the locally defined configuration elements. The actual extraction of local elements is implemented in config-api, and it is very easy because we keep and maintain the main configuration instance inside the merged config anyway.
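
As a sketch of that extraction step, building on the hypothetical stand-in types from the earlier sketch (the real GoCD code differs):

// Sketch only: when the writer receives a merged config, serialize just the
// locally defined part - the main instance kept inside the merged one.
CruiseConfig configToSerialize(CruiseConfig forEdit) {
    if (forEdit instanceof MergedCruiseConfig) {
        return ((MergedCruiseConfig) forEdit).main;  // drop remotely defined elements
    }
    return forEdit;                                  // plain main config: serialize as-is
}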

Pull requests

These are either merged or planned pull requests to make all above work:

Being a big fan of keeping all project-related code in its source code repository, I would really like to be able to declare pipeline configuration in the source code of each individual project instead of in the global cruise-config.xml. Many people will agree that each project's code should know how to build and test itself. Following this concept, it should also know how to CI itself.

Problem

Currently, when all Go configuration is in a global configuration file on the server, we basically end up with 2 sources of project configuration - one being the git repository, the other a file on the Go server. There are lots of scenarios where new changes in the git repo will cause the build to break because they expected a different pipeline configuration - or rather, the pipeline configuration expected the older git repo contents.

Concept

In order to avoid such conflicts, the <pipeline> section should probably never be in the global cruise-config.xml; instead the go-server should configure pipelines after polling the source repositories.

Final notes
tomzo commented 9 years ago

Configuration plugin extension point - plan

I am planning how to approach the extension point for configuration plugins. Can you give me some feedback on this?

I browsed all the existing extension points and I think I get how they work. What worries me most is that each one requires specifying and handling a JSON map for each operation. Meaning that if config plugins are done this way, their extension point would re-define a large part of the config API just to handle the response body from parsing a configuration repository.

Main points:

Solution 1 - use API based extension

I guess this is your obsolete method; it is how the package material extensions are implemented, over ApiBasedPackageRepositoryExtension.

This solution is simple - the least work required. As far as I understand, plugins will break (fail to load) on server upgrade when config-api has breaking changes (e.g. a class or method renamed or removed).

Solution 2 - version config-api classes

As mentioned already - java will not allow 2 versions of the same class to be loaded at the same time.

We can copy-paste* all the configuration classes (allowed in config extensions) to a new module go-plugin-config-api (or just append to go-plugin-api). There would be packages v1, v2, v3, v4 for each config API class set. Each set is immutable once released (or non-breaking changes can be introduced to some API versions).

Then the plugin developer can use a selected version of the config classes without referencing the config-api used by the server.

*we can skip the copy-paste step and just do API versioning in the config-api module

With JSON

There can be JSON-based communication between the plugin and server. But we would not hard-code the map names; we would expect Gson to auto-generate the mappings. I think it should behave nicely when some extra field is added.
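
For illustration, a minimal Gson round-trip of the kind being proposed; the contract class is hypothetical, and Gson's default behaviour of silently ignoring unknown fields is what gives the "extra field added" tolerance:

import com.google.gson.Gson;

public class GsonMappingExample {
    // Hypothetical contract class; Gson maps JSON keys to field names automatically.
    static class EnvironmentVariable {
        String name;
        String value;
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        // The unknown "extra" field is ignored rather than causing a failure.
        EnvironmentVariable var = gson.fromJson(
                "{ \"name\": \"var2\", \"value\": \"plaintext\", \"extra\": \"ignored\" }",
                EnvironmentVariable.class);
        System.out.println(var.name + "=" + var.value);  // var2=plaintext
        System.out.println(gson.toJson(var));            // {"name":"var2","value":"plaintext"}
    }
}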

Without JSON

Another variant is that we let the server be aware of all config-api versions. We still create API-based plugins (no JSON) and the plugin has to tell the server which API version it returns from config. Then the server has to handle the missing elements (aka migration), but at object level - e.g. it would have to be able to migrate class PipelineConfig_1 to PipelineConfig.

Questions

arvindsv commented 9 years ago

I don't think solution 1 or 2 will be good, mainly because of versioning. All new endpoints are moving from Java to (or are created only using) JSON, because versioning is easier and is not necessarily a breaking change. Package-level versioning is quite tedious as well (especially because of all the duplication needed).

All the current JSON messages should be considered "version 1" and any changes to them should introduce a version field, and change its value to 2.

The way I see the extension point implemented is to have a JSON representation of the new PartialConfig class. To make it easier for both sides (Go and plugin authors), and to reduce re-engineering as you called it, we can provide the config-api (or a part of it) to be used by plugin authors, so that they can generate their JSON out of it - a "stub" in the old bad world of WSDL. But what goes "across the wire" (well, from plugin to server) should be JSON, which is defined clearly and can be versioned.

One other advantage of having it as JSON is the ability to have it generated by something other than a Java process. One of my hidden thoughts about JSON communication is the ability to move plugins out of the Go Server process, into their own processes. Going back to a Java API, rather than a message-based one, makes that hard to do.

I think not tying it directly to config-api classes, and doing some deserialization by hand, allows versioning to become slightly easier. As we talked about earlier, we should try to make addition-only changes to the config from now on, so that versioning is automatic; but if we need to make a modification change, then having the message directly tied to the Java class makes it as inflexible as having a Java object going back and forth between the plugin and the server.

Finally, yes, the JSON schema will be a little big, but that is what it is. It is quite hierarchical and actually, a plugin author doesn't really need to recreate classes for the whole tree. The JSON message can be built up incrementally, for instance, as files are read by the plugin. Using maps, instead of classes.

I think what I talk about here is similar to your "With JSON" section. All the other solutions mentioned are variants of class-based APIs, with different ways of versioning.

tomzo commented 9 years ago

Thanks. I definitely hear you about the JSON advantages; I just couldn't quite believe that this large schema should be done now. Then I guess this extension should be like any of the other latest JSON ones, the only difference being that the schema is larger. I will plan this further, prepare the schema, and publish it here before I start implementation. But what do you think about auto-generating the JSON schema from Java classes? Current extensions define each key name and section by constant strings somewhere in go-plugin-access.

arvindsv commented 9 years ago

But what do you think about auto-generating the JSON schema from Java classes? Current extensions define each key name and section by constant strings somewhere in go-plugin-access.

Yes, I think auto-generation should be ok. Do you mean auto-generation of JSON schema or the JSON message itself?

I was talking about "Java class to JSON message" when I mentioned making a part of the config-api module available to the plugin author. If we start publishing the config-api module along with go-plugin-api, that might be useful. However, versioning it is still an issue. We will need to have version-specific Java classes for it. If we are saying versioning will be taken care of elsewhere (by only adding fields, for instance), then sure. I'm all for making it easier for plugin authors to generate the JSON message.

However, do you really mean generating the schema from the classes?

tomzo commented 9 years ago

I mean "Java class to JSON message" - only to reduce the work around converting between a Java instance and the message. But I guess that in the end we would have to generate some schema or examples from it for the documentation.

arvindsv commented 9 years ago

But I guess that in the end we would have to generate some schema or examples from it for the documentation.

Right. Examples are fine, if the schema itself is too complicated to generate.

tomzo commented 9 years ago
public class Environment
{
   private String name;
   private Collection<String> pipelines;
   private Collection<String> agents;
   private Collection<EnvironmentVariable> environmentVariables;
}

Gson should generate

{
    "name" : "dev",
    "environmentVariables" : [
       {
         "name" : "var2",
         "value" : "plaintext"
       },
       {
         "name" : "secret1",
         "encryptedvalue" : "34i37543"
       }
    ],
    "agents" : [ "uuid1" ],
    "pipelines" : [ "pipeline1", "pipeline2" ]
}

Versioning

I think not tying it directly to config-api classes, and doing some deserialization by hand, allows versioning to become slightly easier. As we talked about earlier, we should try to make addition-only changes to the config from now on, so that versioning is automatic; but if we need to make a modification change, then having the message directly tied to the Java class makes it as inflexible as having a Java object going back and forth between the plugin and the server.

You are right about not tying it to config-api; there is too much content in there anyway. But I'd like to have some classes to represent the data contracts. What you say about manual deserialization in the context of migration flexibility makes sense. It will definitely work in current extensions where messages aren't that big. But doing manual deserialization of a 10-level deep structure without declaring any types on the way feels wrong to me. Plus there is the re-engineering I mentioned; it would be nice to give plugin developers some tools to handle message preparation.

Here is how I imagine dealing with version changes of the data contracts:

For a moment I considered Gson's @Since annotation to handle versioning. But I do not see the benefit of it. It would not handle major changes anyway, and minor ones can be handled as I described above.

Questions

arvindsv commented 9 years ago

I'd like to declare helper methods in go-plugin-api to be used by the server or by a plugin for conversions between a JSON string and the above mentioned Java classes. This requires referencing Gson from the plugin-api.

One problem I see is that the go-plugin-api JAR will no longer be usable independently by plugin authors, since there will (probably) be a load-time dependency on gson. This will affect plugin authors whether they care about the config extension point or not. Right?

This might happen anyway, if you're putting the configrepo package inside go-plugin-api, since gson's annotations will need gson as a dependency. That could be a bit of an issue.

Do you think it makes sense to make it a different module, which is published independently of go-plugin-api? Then, config extension point plugin authors can include go-plugin-api and this new module (say, go-plugin-config-repo). The new module will bring along a gson dependency, which is alright, since the go-plugin-api module doesn't have that dependency.

Versioning

I agree that manual deserialization of a 10-level deep structure will be hard, and the approach you suggest makes sense. I consider it a stub approach, since the actual contract is still the JSON. Plugin authors do not need to use these classes. I still think having these stub classes in a different module from go-plugin-api is better. Apart from the reason I mentioned earlier, it doesn't give the false impression that the classes are part of the interface.

Questions

including encrypted secrets in a configuration repository - is it possible to generate the encrypted value on the server so that it can be committed to the config repo?

Hmm. We might have to provide a way for the plugin to provide a value, which will be encrypted and given back. Otherwise, we will have to provide an admin-level API to do that, so that users can do it themselves through scripts, right? The values will be encrypted, if provided in an unencrypted form, but I guess you're talking about it being encrypted while in the config repository.

does the lock explicitly option in pipeline config make sense when the pipeline is defined in a repo? We cannot assign this field in the SCM source. I planned on solving this by storing it in the memory of some service.

Do you mean the "isLocked" option, or the lockExplicitly() method? It looks like the lockExplicitly() method is just used in tests. I don't even know why. If not that method, what is the problem with setting the isLocked option? It's to indicate that only one instance of the pipeline should run at one time.

what is ArtifactPropertiesGenerator (in JobConfig) ?

I'm not entirely sure, but I believe it has to do with this. To be honest, I have used this feature only rarely (during performance tests). I mentioned on one of the mailing list posts earlier that properties should be re-thought. But, I think ArtifactPropertiesGenerator deals with that.

tomzo commented 9 years ago

Do you think it makes sense to make it a different module, which is published independently of go-plugin-api? Then, config extension point plugin authors can include go-plugin-api and this new module (say, go-plugin-config-repo). The new module will bring along a gson dependency, which is alright, since the go-plugin-api module doesn't have that dependency.

Yes, this makes sense. We make it optional to use these classes and helper methods, while the actual contract is JSON. I'll create a new, optional module with a dependency on Gson. I'll use it from go-plugin-access, and plugin developers can use it if they are lazy.

Do you mean the "isLocked" option, or the lockExplicitly() method? It looks like the lockExplicitly() method is just used in tests. I don't even know why. If not that method, what is the problem with setting the isLocked option? It's to indicate that only one instance of the pipeline should run at one time.

I got confused by lockExplicitly() setting the configuration field isLocked to true. That seemed like the configuration is another datastore for keeping lock state (apart from the db). But I understand now that it isn't. It's just a weird method name and test. No problem in declaring isLocked in a config repo then.

tomzo commented 9 years ago

[I forgot to respond to secrets parts.]

Hmm. We might have to provide a way for the plugin to provide a value, which will be encrypted and given back. Otherwise, we will have to provide an admin-level API to do that, so that users can do it themselves through scripts, right? The values will be encrypted, if provided in an unencrypted form, but I guess you're talking about it being encrypted while in the config repository.

I meant that nobody will want to commit plain-text secrets to the config repository. An easy solution would be to provide the plain-text value to the Go server via the web UI; the server encrypts it and displays the encrypted value for the user to commit to the repo. It could be an API request as well. Is there some easy way to do that now?

arvindsv commented 9 years ago

Is there some easy way to do that now?

No, there isn't. This hasn't been needed till now, since Go was doing that internally. I think an admin-level API to do this should work.

tomzo commented 8 years ago

I have been using configuration repositories on a few pipelines over the last weeks. I wasn't using the extension point though, knowing that the JSON will change. I'll contribute over the next few days to finishing the configrepo extension point and fixing some issues that I discovered over the last few weeks. I don't like to jump straight to work before announcing what I am planning to do, so please review my plans and thoughts:

Bugs to fix:

I'll put these into separate issues, to do later or for somebody to pick up:

Some things worth considering:

It works, but it is soooo ugly... Any suggestions on what to do with it? Maybe I should remove the stubs and use the contract classes with Gson to handle serialization/deserialization. The problem then is managing changes in the configuration schema. But perhaps it is not that bad, considering that the extension point is in control of what is valid JSON. There is an extra step later that handles the conversion from contract classes to the actual pipeline configuration.

@arvindsv I recall you mentioned improving the plugin infrastructure so that plugins can run in a dedicated process. Seeing that @xli has finished work on websockets communication makes me wonder how far off that is?

Hopefully when I am done, the extension point will be much closer to production-level quality. I'll finish by improving the JSON config plugin, and I'll probably write a YAML plugin; considering that YAML is a superset of JSON, that should be just as easy.

I'd love to see some community built around configuration plugins, so that I'm not the only author of those. Considering that many organizations are running something similar internally, it seems possible. Any suggestions/help on that?

tomzo commented 8 years ago

I'll give up on making the API and JSON in the extension similar. Instead I will focus on handling validation of the messages. I will also deviate from the official schema. E.g. there will be no pipeline group with a collection of pipelines, because that is error-prone; instead I'll add the group name as a mandatory element of the pipeline object.

tomzo commented 8 years ago

Draft of how I see first release of the extension point https://github.com/tomzo/documentation/commit/190c54642e9af98ec38cfda8071496174b5eaa3c

arvindsv commented 8 years ago

@tomzo:

I'll create the UI PR. It is missing grayscale icons; the rest is finished. I was using it long enough to see that it is working fine. I'll put details there.

/cc @naveenbhaskar when you have this ready, so he can help you get the icons you need.

arvindsv commented 8 years ago

The first set of thoughts and comments in your comment above makes sense to me. The only change to those is that you decided not to try and sync it to the pipeline creation JSON (which is fine, as long as it is documented well here). I think it is up to the endpoint to decide what is right. Otherwise, every time the JSON changes on the server side, the plugin might need to change.

About the bugs section:

  1. Saving main config and merged config: @jyotisingh has more context and you two have talked about it, I believe. So, I won't think too much about it at this time.
  2. About agent registration bug: Ok. If you reproduce it and need help, let us know.
  3. About config parse errors showing on the dashboard: until now, config errors could not happen at all, so they were never shown on the dashboard. Since they can now happen (because of the repo), we will need to show a server health message, I guess.

I'll put these into separate issues, to do later or for somebody to pick up ...

Alright.

[More responses to come in a minute ...]

arvindsv commented 8 years ago

Some things worth considering:

implement automated integration test with config repo plugin. Any ideas how to approach it?

Are you thinking of an integration test for a specific plugin, or for the endpoint itself (with a sample plugin)? Either one is possible. The actual endpoint mechanism (the communication mechanism) and the registering of plugins are integration-tested; the rest of the endpoints unit-test themselves and depend on the communication being integration-tested. If you're thinking of a full end-to-end test, we can think about what we need to introduce for it. I'm wondering about the value of those tests.

There is repetition in extension point because of: public config-repo "stub" classes which are used for serialization/deserialization. E.g. NantTask stub ... It works, but it is soooo ugly... Any suggestions on what to do with it? Maybe I should remove the stubs and use contract classes with gson to handle serialization/deserialization. The problem is then managing the changes in configuration schema. But perhaps it is not that bad considering that extension point is in control of what is the valid JSON. There is an extra step later that handles the convertion from contract classes to actual pipeline configuration.

Yeah, I was thinking of that as well: removing that layer of deserialization and serialization completely and depending on Gson. My ideal approach would be to handle it the way Clojure handles JSON, creating hash maps from it. These are not objects, and transformations are still possible. You don't get some of the type guarantees, but that should really be handled using some kind of a schema.

While writing the first JSON endpoint, we considered using json-schema and using that to validate the JSON, but it didn't work out very well. @jyotisingh might remember why.

The migration is going to be a problem. If the JSON is old (v1) and the contract classes are new (v2), what happens and where does the migration happen? We need a JSON transformation thing which works without converting to objects. In Go's config, the XSL transformations do that work; that prevents Go from needing object hierarchies for every schema version change. Something like Jolt?

@arvindsv I recall you mentioned improving the plugin infrastructure so that plugins can run in a dedicated process. Seeing that @xli has finished work on websockets communication makes me wonder how far off that is?

That work hasn't been picked up yet (or even scoped out much - I haven't written my thoughts out in an issue, for instance). It's possible, but it's not really happening right now. There are three main plugin-related pieces of work that need to happen:

  1. Plugin registration outside of Go Server (websocket ... will imply different process and provide that for free).
  2. Multiple endpoints handled by a single plugin (needs some change to the request headers of plugin JSON messages).
  3. Ability to install plugins from within Go itself (a plugin browser within Go, I guess. Some changes in infrastructure needed to make this happen - having point 1 above will help too).

I'd love to see some community built around configuration plugins, so that I'm not the only author of those. Considering that many organizations are running something similar internally, it seems possible. Any suggestions/help on that?

Sure. What kind of help would you like from me? I think, if you write the JSON plugin and leave it at that, rather than writing too many plugins, it would encourage others to write their own, rather than depend on you. Improving the documentation, making it easier for someone to write a plugin would also help. I see your point about external (non-java, outside of process) plugins now. Having those would reduce the barrier for someone writing a plugin as well. That's what I can think about. If you're thinking something else, let me know.

Given all the work you've done, I feel that I need to work on whatever I can to make life easier for you. :) If that means working on out-of-process plugins, then I'm willing to put in the effort to make that happen, over evenings or weekends.

tomzo commented 8 years ago

@arvindsv

About integration test with plugins.

I was thinking about having a sample configrepo plugin. Test cases that, I think, would be worth adding are:

I think end-to-end tests would be way too costly to build, without much to gain from them.

About config parse errors showing on dashboard

Yesterday I added messages about invalid configuration repositories. As a user, I only recently discovered that it is quite important that these messages contain detailed information about everything that is wrong, what it is, and in which configuration repository. Consider this use case: there is a repository with 20 pipelines declared in YAML. When somebody pushes changes with an invalid config (which can be a YAML syntax error, a domain error, or the plugin simply failing to handle it and crashing), the "server health messages" should say something like:

Parsing configuration repository using Plugin json.config.plugin failed for material: URL: https://github.com/tomzo/gocd-json-config-example.git, Branch: master [Jan-17 21:22:49] File pipeline13.json has syntax errors

Moreover, just like in the config-api classes, the error reporting system should be rich enough to list all problems that are present in a configuration repository. E.g.

Parsing configuration repository using Plugin json.config.plugin failed for material: URL: https://github.com/tomzo/gocd-json-config-example.git, Branch: master [Jan-17 21:22:49] File pipeline6.json

  • pipeline is missing name
  • pipeline has no stages

File pipeline2.json has syntax errors

Otherwise, if only 1 error is reported at a time, the user has to push fixes one by one, which can take a long time.

I am talking about this because it is highly related to handling the parsing of the configuration, and the migration that you also mentioned. Currently it would take rather ugly code to handle these errors...

JSON handling

We need a JSON transformation thing which works without converting to objects

Here I agree 100%. If we can do that, then the problem is solved.

I will likely remove the classes with the _1 suffix that are used to handle serialization now. Then handling serialization must happen elsewhere.

These are the approaches I have been considering since yesterday:

Manually, on contract classes, with Gson's help

For example, the constructor of ParseDirectoryResponseMessage (returned from the plugin) would look like this:

public ParseDirectoryResponseMessage(JsonElement responseElement) {
    JsonObject response = responseElement.getAsJsonObject();
    if (response == null) {
        // then error
    }
    for (Map.Entry<String, JsonElement> elementEntry : response.entrySet()) {
        switch (elementEntry.getKey()) {
            case "pipelines":
                JsonArray pipelinesJsonArray = elementEntry.getValue().getAsJsonArray();
                if (pipelinesJsonArray == null) // plugin response is invalid
                    unexpectedElementType("pipelines should be an array");
                // ... then each pipeline is created from a json element from the array:
                //     new CRPipeline(... json element instance here ...)
                break;
            case "environments":
                // ...
                break;
            default:
                unknownElement(elementEntry.getKey());
        }
    }
}

The benefit of this is that we create contract classes straight from the JSON body. So we can check at object construction time whether enough information is available, and collect detailed errors along the way. When a contract class changes, this manual deserialization code must be updated too. I saw that this is somewhat how other extension points do deserialization - by exploring the map.

Less is more approach

Throw away both the _1 classes and the contract classes. Or actually, make the contracts more open. Currently the contract classes are fully strongly-typed Java classes with fields very similar to the ones in config-api. So any time the extension point returns configuration, you know exactly the maximum "capacity" that could have been parsed from the configuration repository.

In this approach I say

  1. Don't restrict the content returned by the extension point at all. The extension point returns a map instead of a rich domain of class instances.
  2. Handle the transformation from maps into config-api classes in the server.

To better explain what I mean, consider this example.

Today the Go server version is 16.2; the user has a configuration repository with a pipeline defined in YAML like this:

name: pipe1
label_template: "$COUNT"
user_content: This is something Go config will never understand
elastic_agents_thing: This is something Go will understand in 16.3
stages: [
{
   ...
}
]

Then the plugin is a very simple, thin layer (like the JSON one that I created).

It just collects all the YAML files and puts the content of every *pipeline.yaml into the pipelines element in the JSON response, possibly without checking the actual content of the YAML elements:

{
   "pipelines" : [
      {
          "name" : "pipe1",
          "label_template" :  "$COUNT",
          "user_content" :  "This is something Go config will never understand",
          "elastic_agents_thing" : "This is something Go will understand in 16.3"
      }
   ]
}

Then the extension point only checks that this is valid JSON - using Gson. The "parsed configuration" is returned to the server as a map instance. Then in the server we handle:

I propose to just ignore, or warn about, elements that are not understood by Go (yet). You can see how this allows smoothly upgrading the configuration repository content before upgrading the server.

Moreover, we can require that the configuration map contains at least

{
   "target_go_version" : "16.2"
}

To help Go handle unknown elements.
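
A sketch of what that server-side handling could look like under this proposal; all names here are hypothetical:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical server-side reader for the open map: known keys get mapped
// onto config-api classes, unknown keys are warned about instead of failing.
class OpenMapPipelineReader {
    private static final Set<String> KNOWN_KEYS =
            new HashSet<>(Arrays.asList("name", "label_template", "stages"));

    void readPipeline(Map<String, Object> pipeline, List<String> warnings) {
        for (String key : pipeline.keySet()) {
            if (!KNOWN_KEYS.contains(key)) {
                // "ignore or warn about elements that are not understood by Go (yet)"
                warnings.add("Unknown pipeline element '" + key + "' was ignored");
            }
        }
        // ... transformation of the known keys into config-api classes goes here ...
    }
}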

What do you think about this?

Migration

I think we need to focus on which "migration" scenarios should actually be handled. I don't think exactly the same rules apply here as in cruise-config.xml. The situation is that

The significant case is when the Go server config-api has changed, which forces the contract to be changed. But it does not necessarily force the expected JSON to be changed. Only when there is not enough information returned by the config plugins would it be necessary to require them to return more content. There is nothing we can do to magically add more data when the user's configuration in the repository does not have it. When Go is upgraded, the configuration repository has to be removed or more data must be added to it. Moreover, in some cases the plugin needs to be updated as well.

Last thoughts

Seeing that you mention Clojure: do you think writing serialization/deserialization in Scala is an option? Functional style, using expressions rather than objects? I am not an expert, but it might solve some problems... and it might be fun.

Plugins and community

I wasn't trying to pressure you into changing the plugin infrastructure. However, it was mentioned by a few people that it is rather hard to write plugins now, so maybe it is something to consider. I probably could benefit somewhat from out-of-process plugins but, honestly, the real benefit is just having the configuration repositories. What I meant is just: spread the news - when you see somebody doing find-and-replace in cruise-config.xml on a git hook, or meta-pipelines, or another dirtiness, tell them that there is now a cleaner way to do it.

tomzo commented 8 years ago

I am starting to like the Jolt idea. It seems any operation on JSON can be done with it.

Complete processing of the message could look like this:

  1. Check target_go_version in the message. If it is older than the current version, apply a Jolt spec to migrate the message. Repeat until the target version is current (see the sketch after this list).
  2. Deserialize the message onto contract types.
  3. Validate the contract by checking all required fields, generating all error messages along the way.
  4. Return the contract instance (errors and configuration) to the server.
  5. The server converts the contract to config-api classes, or puts the error messages on the health dashboard.
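
A minimal sketch of step 1, assuming the bazaarvoice Jolt library; the shift spec and the renamed field are hypothetical, only Chainr and JsonUtils are Jolt's actual API:

import com.bazaarvoice.jolt.Chainr;
import com.bazaarvoice.jolt.JsonUtils;

import java.util.List;
import java.util.Map;

public class ConfigMessageMigration {
    public static void main(String[] args) {
        // Hypothetical shift spec migrating a v1 field name to its v2 name;
        // "*": "&" passes all other elements through unchanged.
        List chainrSpec = JsonUtils.jsonToList(
                "[{ \"operation\": \"shift\", \"spec\": { " +
                "\"label_template\": \"labelTemplate\", \"*\": \"&\" } }]");
        Chainr chainr = Chainr.fromSpec(chainrSpec);

        Map v1Message = JsonUtils.jsonToMap(
                "{ \"target_go_version\": \"16.1\", \"name\": \"pipe1\", " +
                "\"label_template\": \"$COUNT\" }");

        // JSON-to-JSON migration, no intermediate object hierarchy needed.
        Object migrated = chainr.transform(v1Message);
        System.out.println(JsonUtils.toJsonString(migrated));
    }
}
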
arvindsv commented 8 years ago

Yesterday I added messages about invalid configuration repositories. As a user, I only recently discovered that it is quite important that these messages contain detailed information about everything that is wrong, what it is, and in which configuration repository.

Sounds like we need a page, rather than a detailed message? Per config repo.

Re: Less is more approach

In this approach, which component is responsible for understanding the response from a plugin? I see you're talking about validation on the server side. But, isn't some kind of contract ensured between the plugin-access layer and the server? It'll need to be. What is that format?

The part I'm worried about is the migration that needs to be handled on the server side. I'd expect that to be a plugin-access concern. Your jolt-related comment above might help with some of that. I'd rather we migrate as soon as possible, rather than leaving it till later.

Re: Migration

Only when there is not enough information returned by the config plugins would it be necessary to require them to return more content. There is nothing we can do to magically add more data when the user's configuration in the repository does not have it. When Go is upgraded, the configuration repository has to be removed or more data must be added to it. Moreover, in some cases the plugin needs to be updated as well.

Right. We should try and ensure that we migrate within the endpoint or server, as far as possible, to reduce the need for the plugins or the user to change anything. That's the approach we take with the config as well (trying to have good defaults).

Re: Last thoughts

Though I personally would like to have other JVM languages in there, and I'm a big fan of Clojure, from a code perspective I think it'd make things hard. :( I would love Scala plugins though, but having a part of core just for deserialization might be a little much.

Let's try Jolt. If it doesn't work, and if we (I mean, you) really think that Scala is the right way to go, then let's create a separate project for it in the gocd org or gocd-contrib org, and treat it as an external dependency we control. That way it can grow independently and we can treat it as a JAR.

Re: Plugins and community

What I meant is just: spread the news - when you see somebody doing find-and-replace in cruise-config.xml on a git hook, or meta-pipelines, or another dirtiness, tell them that there is now a cleaner way to do it.

Of course! I always try to do that. You don't know how much I'm waiting for this to be complete, so I can tell everyone this is the way to go. :) I always sing the praises of you and all the work you're doing, too. I talk to enough people with too much in their config, and will definitely mention this.

tomzo commented 8 years ago

Sounds like we need a page, rather than a detailed message? Per config repo.

Maybe, but that is rather a future enhancement. Currently I am finishing updating the error handling, which previously would have thrown an exception on the first problem during deserialization. Now it collects all errors first and then creates a multi-line string which can fit in the server health messages. The point is that a single user change in a configuration repository may introduce more than one error, and currently the Go server is the only place to get feedback about it, after pushing. So it is important that the user can see as many of the errors as possible at once, to fix them all in a single commit.

I'll paste a screenshot soon with the end result of this refactoring.

Less is more

In this approach, which component is responsible for understanding the response from a plugin? I see you're talking about validation on the server side. But, isn't some kind of contract ensured between the plugin-access layer and the server? It'll need to be. What is that format?

This class would understand the complete format. It is a helper class to ConfigRepoPlugin, which is the bridge between the MDU and config services, etc., and the extension point.

Look at what it does now. It converts from one domain model, defined in the contract classes (CR*), to a very similar one defined by the config-api classes. It is kind of ugly because of the similarity.

The contract classes would be much smaller. Currently it is something like

public interface ConfigRepoExtensionContract {

    CRParseResult parseDirectory(String pluginId, final String destinationFolder, final Collection<CRConfigurationProperty> configurations);
}

public class CRParseResult {
    private Collection<CREnvironment> environments = new ArrayList<>();
    private Collection<CRPipeline> pipelines = new ArrayList<>();
    private List<CRError> errors = new ArrayList<>();
}
// CRPipeline and all its children... 20-30 classes

Instead, the contract could look like this:

public class CRParseResult {
    private Collection<Map> environments = new ArrayList<>();
    private Collection<Map> pipelines = new ArrayList<>();
    private List<CRError> errors = new ArrayList<>();
}

The part I'm worried about is the migration that needs to be handled on the server side. I'd expect that to be a plugin-access concern

Probably migration should be handled in plugin-access then. Using jolt, json to json.

I am not going to implement this right now. I am keeping the strongly-typed contract since the ConfigConverter is already written anyway. It is an approach to consider though; it seems like a complete solution without as much code as now.

I personally would like to have other JVM languages in there, and I'm a big fan of Clojure, but from a code perspective I think it'd make things hard. :( I would love Scala plugins though, but having a part of core just for deserialization might be a little much.

I thought so. Probably too wild for just this task.

arvindsv commented 8 years ago

Maybe, but that is rather a future enhancement. Currently I am finishing updating the error handling, which previously would have thrown an exception on the first problem during deserialization. Now it collects all errors first and then creates a multi-line string which can fit in the server health messages.

Oh yeah, definitely a future enhancement. What you have is fine. I understand the need to show all errors, rather than just the first one.

Probably migration should be handled in plugin-access then. Using jolt, json to json.

Yeah, if we can do that ... I think it removes a lot of cruft and unnecessary classes. Sure, keep the strongly-typed one for now. I'm thinking of the right seams for this. Re-thinking the migration in the plugin-access layer: I now think that as long as the migration and conversion from JSON to config-api objects is done around the extension point (near ConfigRepoPlugin and ConfigRepoExtensionContract), it's fine.

Any time the config-api classes change, these classes would change, and there's some isolation between the server and this. So I think it's fine. If we reduce the number of layers by using Jolt, that might make it easier to manage for everyone, that's all.

willejs commented 8 years ago

@tomzo what's the ETA on this being closed and released? Is there any accompanying documentation to go with it? Looks really good!

tomzo commented 8 years ago

There are some unmerged docs here

No particular date, I am afraid. Most of the work is done; 2 more PRs need review and we still need to resolve some bugs. I am running a server from my fork anyway. If you'd like to try it, some instructions are in https://github.com/tomzo/gocd/issues/1

rajeshwaranmsc commented 8 years ago

Hi there,

It looks like a lot of discussion is happening here, so I thought of asking my question here. We have stepped into a microservice-based architecture for our new project, and we have a repository for every microservice. We are using GoCD and have created a pipeline for every microservice - it looks like we will end up with many pipelines if we follow our current approach.

I also tried to look into creating generic pipelines where we can pass the material as a parameter, with no luck.

Any suggestions on how we can effectively organise so many source control repositories?

miroswan commented 8 years ago

I'd be very happy if file-based pipeline configuration was a feature slated for the near future.

arvindsv commented 8 years ago

Getting closer. A few more edge cases were discussed between @tomzo, @jyotisingh and me here. Will summarize soon, merge more changes in, and get closer to finishing this!

cintiadr commented 8 years ago

This is a feature I have been waiting for for a long time! Really good to see it showing up; I can't wait to be able to use it! I'd like to make some comments, and as I cannot comment on the Google docs, I will leave the comments here.

From the Google docs and demo video, it seems the plan is having one single repository for all pipeline configurations. It would be particularly good if we could define multiple repositories (e.g., one per product/team), so each team would be responsible for modifying their own builds, without write permissions to change other builds. If I wanted to restrict the creation of new pipelines, I'd implement some git hook instead (probably not Go's responsibility). Sure, you cannot redefine the same pipeline in multiple locations!

I really like the fact that the edit button is grayed out; but I'm not sure if having all the builds as code (instead of having a mix of manual and coded) is for everyone. I particularly love having everything as code, but I don't know if every team would be so eager about that.

Also, as a user of build config as code, I don't want to rely on templates created by the UI; one very nice feature would be the ability to define templates in code.

Or, at least, a way of including macros/include/expanded stanzas. E.g., assume there's a common file which can somehow define jobs/stages or something you want to replicate across pipelines. A %%include 'java-8-compile-job' would provide some reusable job definition.

I don't have a lot of experience with templates to understand if templates or macros would be more useful, but I'd like to reuse bits and pieces across different pipelines.

arvindsv commented 8 years ago

@cintiadr:

By the google docs and demo video, it seems like that the plan is having one single repository for all pipelines configurations.

The doc does mention multiple config repositories. I've updated the image to show multiple repos too, just to make it clear that it's possible.

About templates, etc.: it depends on the plugin, I guess. If that is supported by a plugin, then GoCD's template mechanism won't really be necessary. If not supported by a plugin, even a pre-processor can be considered, I suppose.

tomzo commented 8 years ago

I am closing this because the core feature is done and released in 16.7.0. User documentation is here and here is the XML reference.

There are 2 plugins available:

We should open smaller issues with enhancements and bugs as they come along.

@cintiadr if you are interested in discussing templates see https://github.com/tomzo/gocd-yaml-config-plugin/issues/2

ketan commented 8 years ago

Woot!


cintiadr commented 8 years ago

Thank you!