gocd / gocd

GoCD - Continuous Delivery server main repository
https://www.gocd.org
Apache License 2.0
7.05k stars 973 forks source link

Define global/plugin SCMs as a part of the pipeline JSON/YAML #5694

Closed ankitsri11 closed 5 years ago

ankitsri11 commented 5 years ago

EPIC: https://github.com/gocd/gocd/issues/5175

Currently, global/plugin SCMs configuration is defined as part of cruise-config.xml. User defining pluggableSCM in config-repo depends on this definition.

For example:

SCM configuration in cruise-config.xml

<scms>
    <scm id="4937e2fe-be38-46ae-a8cc-65e38c68c15e" name="pull-request">
      <pluginConfiguration id="github.pr" version="1" />
      <configuration>
        <property>
          <key>url</key>
          <value>https://github.com/gocd/gocd.git</value>
        </property>
      </configuration>
    </scm>
  </scms>

PluggableSCM definition in config-repo pipeline

{
  "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
  "destination": "destinationDir",
  "filter": {
    "ignore": [
      "dir1",
      "dir2"
    ]
  },
  "name": "github.pr",
  "type": "plugin"
}

User Experience:

For example:

{
  "materials": [
    {
      "type": "scm-plugin",
      "pluginID": "github.pr",
      "configurations": {
        "url": "https://github.com/gocd/gocd.git"
      }
    },
    {
      "url": "http://my.git.repository.com",
      "branch": "feature",
      "filter": {
        "ignore": [
          "externals",
          "tools"
        ]
      },
      "destination": "dir1",
      "auto_update": false,
      "name": "gitMaterial1",
      "type": "git",
      "shallow_clone": true
    }
  ]
}

Note - GoCD needs to handle if two pipelines have the same SCM configuration. Should error out something like below:

Cannot save SCM, found duplicate SCMs. pull-request, feature;; Cannot save SCM, found duplicate SCMs. pull-request, feature;;

/cc: @arvindsv @ibnc

arvindsv commented 5 years ago

@ankitsri11 Looks ok to me. This mostly involves a change to https://plugin-api.gocd.org/current/config-repo/#parse-directory and a format_version change.

/cc @tomzo

tomzo commented 5 years ago

This mostly involves a change to https://plugin-api.gocd.org/current/config-repo/#parse-directory and a format_version change

And adding new CR classes and merging part in config domain, which will handle same/similar material definitions, also among the ones defined in xml. (Have a look at how environments are handled, it is a similar case).

I wonder what are we loosing if the only way to define SCMs is as material inside the pipeline? Shouldn't we also add a global section in config-repos, dedicated to scms. E.g.

{
  "scms": [
    {
      "id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
      "type": "scm-plugin",
      "pluginID": "github.pr",
      "version" : "1",
      "configurations": {
        "url": "https://github.com/gocd/gocd.git"
      }
    }
  ]
}

Then pipeline can still refer to those scms, the same way it did so far:

{
  "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
  "destination": "destinationDir",
  "filter": {
    "ignore": [
      "dir1",
      "dir2"
    ]
  },
  "name": "myPluggableGit",
  "type": "plugin"
}
arvindsv commented 5 years ago

@tomzo I don't think we need to lose the other way. We will still need to keep the "ref" approach, for global SCMs which are defined at the GoCD config level. For users who don't want to define it in the YAML (maybe for security reasons?) and for backwards compatibility.

Shouldn't we also add a global section in config-repos, dedicated to scms. E.g.

To me, conceptually, the material belongs to the pipeline. The fact that "global" SCMs are at a different level in the XML is wrong as a concept. It is true that SCMs are common across pipelines and duplication is converted to references internally, even with the normal materials such as git by using fingerprints. However, from a user's perspective, that shouldn't matter. All they care about is: "When this material has a change, trigger this pipeline". They shouldn't care about the fact that it is common, and is defined somewhere else, etc.

That's why I thought that it should be brought into the "materials" section. Does that make sense?

If we keep the old approach of referencing an SCM by ID (by having the scm_id property and type = "plugin") and introduce this with say, type = "plugin.scm" would that work? Or, we could keep the same type and decide based on whether scm_id exists or not, to either define it here or define it outside.

tomzo commented 5 years ago

We will still need to keep the "ref" approach, for global SCMs which are defined at the GoCD config level.

Yeah, I missed that part.

If we keep the old approach of referencing an SCM by ID (by having the scm_id property and type = "plugin") and introduce this with say, type = "plugin.scm" would that work? Or, we could keep the same type and decide based on whether scm_id exists or not, to either define it here or define it outside.

At the CR contract level, I think it is more correct to have the first approach, because we are introducing a "new" type of material. Although at plugin/user level, it would be nice to have a more concise syntax which uses presence of scm_id as type descriptor. I mean from user perspective it not is very clear what type plugin vs plugin.scm would mean.

arvindsv commented 5 years ago

At the CR contract level, I think it is more correct to have the first approach, because we are introducing a "new" type of material. Although at plugin/user level, it would be nice to have a more concise syntax which uses presence of scm_id as type descriptor. I mean from user perspective it not is very clear what type plugin vs plugin.scm would mean.

@tomzo I agree. We should do that.

@ankitsri11 Let's define changes to the contract JSON as well as the user JSON (and user YAML) later today.

ibnc commented 5 years ago

Given that the syntax of the config can differ from the XMl, I'd think it'd be fine to introduce "global materials" which can be referenced/used by any pipeline. Something like this is already possible with the yaml plugin. This would be different in the sense that it'd formalize it a bit, and be in the JSON plugin.

materials:
  material1:
    type: git
    url: ...
  material2:
    type: plugin
    scm_id: x
   ...

pipelines:
  pip:
    materials
      ref: material2 # idk about the specific ref syntax

Just a thought.

While I think I understand the perspective of defining the scm in the pipeline materials, I'm thinking we should define them globally. The reasons that come to mind are the following:

@tomzo @arvindsv @ankitsri11

arvindsv commented 5 years ago

@ibnc If we're looking for reducing duplication, then templating is the way to go. Otherwise, we will have to make a decision about this every time a new entity comes up.

Just FYI, the next entities that might come up could be artifact repositories, elastic agent profiles, etc. We should find the right place for it and not force a user to shove them all into a "global" context and use refs within their own YAML / JSON files. Some of them might need to be global, but considering the pipeline as a concept being defined in the YAML / JSON, the right place for materials is the materials section.

It's closer to how things work on the server end. While not as good a reason as the other two, it will make the changes simpler. It might also be more intuitive to current users of pipeline as code

I know enough about the implementation of this to know that the changes will not be simpler. At the end of the day, the complexity here is in the merging logic. Whatever we do, that complexity won't go away. The YAML / JSON are just representations of the JSON message described in https://plugin-api.gocd.org/current/config-repo/#parse-directory [called "Contract JSON" earlier in this thread]. So, whatever approach we take, the SCM definitions are going to be in it. Where it is in that JSON, should not significantly impact the implementation.

arvindsv commented 5 years ago

Also, having two materials sections in the same YAML will confuse people, imo.

arvindsv commented 5 years ago

By the way, I don't think of the YAML and JSON files as different. It's just that the JSON plugin doesn't allow multiple pipelines to be defined in one file and the YAML one does. They should both be treated as the same, as if they're being used by having one pipeline in each .yaml or .json file. That way, we don't get influenced by some extra features provided by the YAML parser (references).

tomzo commented 5 years ago

Just a thought. While I think I understand the perspective of defining the scm in the pipeline materials, I'm thinking we should define them globally.

In the end I think, it's best to let users decide. If there are requests for global scope of any kind, then we should consider it.

The topic is close to templating. From my experiences with config-repos, having many references, even essential ones to other pipelines can become quite a problem when upstream config is changed.

In general, if user has a config-repo which has all of below:

Then config-repo would be unmaintable. The biggest problem now, is when a change happens in the referenced objects it can easily render the downstream pipelines invalid. Which forces user to update all of them in the same moment. Because of that, I am very skeptical of re-using configuration elements by making references to objects which can change. This also applies to templates. Which is why I once proposed to be able to lock references, like dependency managers do. But that's a different subject.

ibnc commented 5 years ago

@ibnc If we're looking for reducing duplication, then templating is the way to go. Otherwise, we will have to make a decision about this every time a new entity comes up.

True. Templating is definitely a solution to this problem. I'm just not 100% sure it's the right or only solution in this specific case. Pluggable scms seem more like a way to extend materials rather than being a material itself.

Just FYI, the next entities that might come up could be artifact repositories, elastic agent profiles, etc. We should find the right place for it and not force a user to shove them all into a "global" context and use refs within their own YAML / JSON files. Some of them might need to be global, but considering the pipeline as a concept being defined in the YAML / JSON, the right place for materials is the materials section.

It feels like we're moving away from this feature being only about configuring a pipeline(s) in a repo, and instead moving towards configuring the server in a repo. If I'm right that we're moving more towards configuring the entire gocd server within repositories, then we should look at it in terms of configuring the server and not just the pipeline(s). (granted, pipelines are a core concept of GoCD)

Yeah, the ref thing isn't very elegant...

I do think there's a way we can make things more generic and reusable though. The users of this feature are developers, and I think the concept of generalizing/abstracting things will make more sense and me more useful to developers. It'd make it more code or dsl like in a sense... idk, it's still a fuzzy idea in my head

arvindsv commented 5 years ago

Pluggable scms seem more like a way to extend materials rather than being a material itself.

No. Conceptually, they're exactly the same as materials. At least, they should be. They have the concept of fingerprints in them, etc. In fact, I would have preferred it if we moved to using the plugin SCM for git, svn, etc. The "inbuilt" materials. We considered it, but we felt that the user experience of having to go to a different location to edit it was not right. Users never saw the materials as being "outside" of the pipeline, even though GoCD sees it like that, to make sure that it doesn't poll the same repo defined across pipelines.

This also leads to the infamous "autoUpdate" value problem. Has to be same across pipelines.

arvindsv commented 5 years ago

It feels like we're moving away from this feature being only about configuring a pipeline(s) in a repo, and instead moving towards configuring the server in a repo. If I'm right that we're moving more towards configuring the entire gocd server within repositories, then we should look at it in terms of configuring the server and not just the pipeline(s). (granted, pipelines are a core concept of GoCD)

Maybe. But not yet. To me, the fact that the <scms> tag is "global" in the cruise-config.xml shouldn't be relevant. In fact, users shouldn't even have been able to see cruise-config.xml - since it is GoCD's domain model exposed to the world. To users, a material should be a cause for a trigger of a pipeline and since it is related to a pipeline, it belongs with it.

There's not a lot left in the config beyond pipelines, environments, agents and templates. In that sense, yes, we are moving away from the XML but not necessarily moving everything over to config-repos.

The server section, security section, etc. are not something that needs to be moved at this point. We do need to provide better controls over what entities a config-repo can change ("this repo can only affect pipelines in group A and environment B", etc), but authorization config, etc. don't need to be in a config repo.

ibnc commented 5 years ago

@arvindsv fair enough. Are we decided on the specific approach we want to take then?

arvindsv commented 5 years ago

@ibnc Since there are two parts to it - changing the "contract JSON" (between the plugin and the GoCD server) and merging that into the main GoCD config, can be started, I feel.

@ankitsri11 needs to write in a little more detail, the formats for the YAML and JSON plugin side of things.

I'm convinced about the approach.

arvindsv commented 5 years ago

@tomzo

The biggest problem now, is when a change happens in the referenced objects it can easily render the downstream pipelines invalid. Which forces user to update all of them in the same moment.

Hoping that some of the work being done currently by @marques-work and @kierarad around a preflight check + CLI should help with it, a bit. Of course, only up to the point of telling them earlier that it is going to break. But, if you have dependencies across repos, then yes, that continues to need an update at the same time, unless we do some locking, as you suggested.

ibnc commented 5 years ago

I think we should mirror the format for pluggable tasks: https://github.com/tomzo/gocd-json-config-plugin#plugin

@arvindsv @tomzo @ankitsri11

arvindsv commented 5 years ago

@ibnc Yes. But, there are a few formats to define.

  1. Between the plugin and the GoCD server.
  2. Between the JSON plugin and the user.
  3. Between the YAML plugin and the user.

(1) and (2) can be similar and could be what you said. For 3, we need to define whatever is simple for a user. Even for (2), it seems a little complex for users.

ankitsri11 commented 5 years ago

@arvindsv and I had a discussion on the syntax earlier today and we thought of below.

Contract level syntax:

At the contract level, we will be introducing type “plugin.scm” and keeping the old approach of referencing SCM by ID.

{
  "materials": [
    {
      "type": "plugin.scm",
     "plugin_configuration": {      
      "id": "github.pr",
      "version": "1"
      },
      "configurations": [ 
      {
        "key": "url",
        "value": "https://github.com/gocd/gocd.git"
      }
    ],
       "destination": "destinationDir",
       "filter": {
            "ignore": [
              "dir1",
              "dir2"
            ],
        "whitelist": []
  },
  {
  "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
  "name": "github.pr",
  "type": "plugin"
}
  ]
}

User level syntax (JSON):

At the user level syntax, the type “plugin” is the same and if the scm_id is missing, it will assume that the SCM is defined here as part of “plugin” type.

{

  "materials": [
   {
     "type": "plugin",
    "plugin_configuration": {
    "id": "github.pr",
    "version": "1"
  }, 
    "configurations": {
        "url": "https://github.com/gocd/gocd.git"
      },
     "destination": "destinationDir",
       "filter": {
            "ignore": [
              "dir1",
              "dir2"
            ],
        "whitelist": []
  }
}
  ]
},

{
  "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
  "name": "github.pr",
  "type": "plugin"
}

User level syntax (YAML):

 scm:
        type: plugin
        plugin_configuration:
           id: github.pr
           version: 1
        configuration:
            url: https://github.com/gocd/gocd.git
        destination: destinationDir
        blacklist:
            - dir1
            - dir2

  myPluggableGit:
    scm: someScmGitRepositoryId
ibnc commented 5 years ago

I don't think we need to add another type to the contract. I think we can just expand on the material type CRPluggableScmMaterial If it's specifying plugin_configuration and configuration then we can assume that it's declaring a new scm. I've already got something to this effect done on a branch.

The formatting for the Json user level is confusing. Is it supposed to be something like this?

{
  "materials": [
  {
    "type": "plugin",
    "name": "some_name" //name is required
      "plugin_configuration": {
        "id": "github.pr",
        "version": "1"
      }, 
      "configurations": {
        "url": "https://github.com/gocd/gocd.git"
      },
      "destination": "destinationDir",
      "filter": {
        "ignore": [
          "dir1",
        "dir2"
        ],
        "whitelist": []
      }
  },
  {
      "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
      "name": "github.pr",
      "type": "plugin"
  }]
}

@ankitsri11

arvindsv commented 5 years ago

@ibnc The new type to the contract was probably a reaction to what @tomzo had mentioned earlier: https://github.com/gocd/gocd/issues/5694#issuecomment-453511015.

Formatted, the JSON you pasted looks like this:

{
  "materials": [
    {
      "type": "plugin",
      "name": "some_name",
      "plugin_configuration": {
        "id": "github.pr",
        "version": "1"
      },
      "configurations": {
        "url": "https://github.com/gocd/gocd.git"
      },
      "destination": "destinationDir",
      "filter": {
        "ignore": [
          "dir1",
          "dir2"
        ],
        "whitelist": []
      }
    },
    {
      "scm_id": "4937e2fe-be38-46ae-a8cc-65e38c68c15e",
      "name": "github.pr",
      "type": "plugin"
    }
  ]
}

Looks about right to me.

arvindsv commented 5 years ago

By the way, the decision about the contract is not really relevant to the user-side of things. So, I think @tomzo and @ibnc should decide.

tomzo commented 5 years ago

I don't think we need to add another type to the contract. I think we can just expand on the material type CRPluggableScmMaterial If it's specifying plugin_configuration and configuration then we can assume that it's declaring a new scm. I've already got something to this effect done on a branch.

Since you already have it, then I think that's OK then. My point earlier was that it might be easier to have a new type for it.

ibnc commented 5 years ago

Are we finished with this piece of work? Or is there anything else we want to do before closing out this issue?

/cc @arvindsv @ankitsri11

arvindsv commented 5 years ago

I can't think of anything else.

ibnc commented 5 years ago

Okay, then I'm gonna go ahead and close this issue. @ankitsri11 if you think of anything feel free to reopen this issue

/cc @arvindsv

arvindsv commented 5 years ago

@tomzo @ibnc What did we decide about format_version update, for this? It looks like the format_version is not bumped up. I suppose an old YAML plugin with a new GoCD server will not work?

I realize the plugins are bundled. But, wondering if we missed it or decided not to do it.

tomzo commented 5 years ago

@arvindsv Based on this I thought we decided not to bump it.

arvindsv commented 5 years ago

@tomzo Great! Thank you.