gocd / gocd

GoCD - Continuous Delivery server main repository
https://www.gocd.org
Apache License 2.0

Get Pipeline label of newly triggered Pipeline #990

Closed: madhurranjan closed this issue 4 years ago

madhurranjan commented 9 years ago

Hi,

Let's say I've triggered a pipeline via the POST call. The response should tell me the label of the pipeline run that was kicked off, and I should be able to query on that pipeline label. I remember something similar was present earlier. Can you tell me if this requirement can be met with the current list of APIs?

Thanks

matt-richardson commented 9 years ago

:+1: It would be great if the schedule api returned something to reference the pipeline run.

At the moment, we are passing in a guid as a param to the scheduled build, and then polling until we find a build with that guid in it. Very hacky, but it actually works very well.
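For anyone copying that workaround, a minimal sketch of the trigger side might look like this in Python. The server URL, pipeline name, variable name and credentials are placeholders, and the form-encoded variables[...] parameter mentioned later in this thread is assumed:

import uuid
import requests

# Generate a unique id and pass it in as an environment variable override
# when triggering the pipeline (placeholder names throughout).
run_id = uuid.uuid4().hex
requests.post(
    'https://go-server:8153/go/api/pipelines/my_pipeline/schedule',
    data={'variables[TRIGGER_ID]': run_id},
    auth=('user', 'password'),
)
# ...then poll pipeline history until a run carrying run_id shows up.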

arvindsv commented 9 years ago

I remember something similar was present earlier

I don't think it was ever there. I agree it would be nice. The reason I think it was never there is that the schedule pipeline API calls the material update subsystem, which goes off and polls the materials. Only once that finishes does the pipeline get scheduled.

The API call does not wait for all this, and so, it cannot know the scheduled pipeline details. At this point, I'd like to drag in @sriki77 (kicking and screaming) into this conversation and ask him to implement this, since he knows how. :) It's way past the time he should contribute.

drmuey commented 9 years ago

@matt-richardson

we are passing in a guid as a param to the scheduled build, and then polling until we find a build with that guid in it.

Hello Matt,

I have the same problem as everyone with #990 and I am trying to work around it by doing the same thing you are via the API. I would love to hear how you are accomplishing that!

Here is what I have found so far:

If I set an environment variable:

If I set a parameter:

So if you can elaborate on:

  1. How do you tie your unique value to the scheduled build?
  2. What API are you using for polling that contains that unique value in the results?

That’d be extremely helpful! thanks ;)

matt-richardson commented 9 years ago

Hi @drmuey

We trigger the schedule API, passing the variable as you have suggested above.

Then we poll the pipeline's stages API; for each stage we get the job detail, and then we fetch the console.log artifact for that job. Then we check that log for the guid we passed in as the variable.

Long, painful, but it works.
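A rough sketch of that polling loop, assuming the guid was passed in as an environment variable and gets echoed into the job's console.log. The history and console.log URL shapes below are assumptions and may differ between GoCD versions:

import time
import requests

BASE = 'https://go-server:8153/go'
AUTH = ('user', 'password')

def find_pipeline_counter(pipeline, run_id, attempts=60, delay=10):
    # Walk recent runs of the pipeline, then each stage and job, and grep the
    # console log for the unique id that was passed in at schedule time.
    for _ in range(attempts):
        history = requests.get('%s/api/pipelines/%s/history' % (BASE, pipeline), auth=AUTH).json()
        for run in history.get('pipelines', []):
            for stage in run.get('stages', []):
                for job in stage.get('jobs', []):
                    log_url = '%s/files/%s/%s/%s/%s/%s/cruise-output/console.log' % (
                        BASE, pipeline, run['counter'], stage['name'],
                        stage.get('counter', 1), job['name'])
                    log = requests.get(log_url, auth=AUTH)
                    if log.ok and run_id in log.text:
                        return run['counter']
        time.sleep(delay)
    return None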

It would be super nice if the schedule api gave back an id so that we could poll on that, but I suspect that would involve some architectural changes..

Hope that helps!

drmuey commented 9 years ago

@matt-richardson thanks, it does. I think the polling I'll end up with will be looking through pipeline history for materials w/ the SHA we know about; once we have the counter for that, we have an exact build to act on from that point on.
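That history-by-SHA lookup might look roughly like this; the build_cause / material_revisions / modifications field names are based on the current history JSON and should be treated as assumptions:

import requests

def counter_for_revision(pipeline, revision):
    # Find the pipeline run whose build cause includes the git SHA we know about.
    history = requests.get('https://go-server:8153/go/api/pipelines/%s/history' % pipeline,
                           auth=('user', 'password')).json()
    for run in history.get('pipelines', []):
        for material in run.get('build_cause', {}).get('material_revisions', []):
            for mod in material.get('modifications', []):
                if mod.get('revision') == revision:
                    return run['counter']
    return None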

arvindsv commented 8 years ago

This needs someone to work on it. Given the way it is implemented now, this can't be added easily, since it is asynchronous. I think the way to do this is to return a guid with a 201/202 response, and then provide an API to check the status of that guid. When the pipeline is triggered (or fails to trigger), the status will give information about the build number, etc. and can be used to do further operations. As I said, needs someone to try and contribute this change. It's not planned at this point.

drmuey commented 8 years ago

@arvindsv What would it take to include the environment variables that were sent to the scheduler in the history data? If you can point me in the right direction, per your offer in #1417, I'll see about a pull request to do it. Should this be a different issue?

That would be very simple and very robust because it allows, for example:

  1. Start build (POST /go/api/pipelines/:pipeline_name/schedule w/ variables[MyUniqTag]="Derp101")
  2. Fetch history (GET /go/api/pipelines/:pipeline_name/history[/N])
  3. Loop through builds per this pseudo code:
    if( build.scheduled_with{'variables[MyUniqTag]'} == tag_i_am_looking_for ) { # tag_i_am_looking_for is 'Derp101' in our example
        return build.counter; # this stops the looping
    }
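If the history API did expose the trigger-time variables (it does not today; scheduled_with below is the hypothetical field from the pseudo code above), the client side would reduce to something like:

import requests

def counter_for_tag(pipeline, tag):
    # 'scheduled_with' is hypothetical: the real history API does not currently
    # return the variables the run was scheduled with.
    history = requests.get('https://go-server:8153/go/api/pipelines/%s/history' % pipeline,
                           auth=('user', 'password')).json()
    for build in history.get('pipelines', []):
        if build.get('scheduled_with', {}).get('MyUniqTag') == tag:
            return build['counter']
    return None
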
arvindsv commented 8 years ago

@drmuey: Sorry for the delay. I'll respond tomorrow.

arvindsv commented 8 years ago

@drmuey: I took a look at making that change. I don't think that is the "right way"™ :) To me, that seems like changing the history API to provide environment variables, as a bit of a hack. If we decide to put environment variables there, there are many other pieces of information which might be relevant too.

Having said that, if we decide to go ahead with something like that, ideally it would be as simple as adding a line like this here:

@variables = Hash[buildCause.getVariables().map {|env_var| [env_var.getName(), env_var.getDisplayValue()]}]

However, since the history call can be potentially non-performant, it has been carefully created to load only what is required to create that JSON. In this case, environment variables didn't make the cut. So, they're not loaded from DB at all.

If you trace calls starting from here, through here, through here, through here and finally here, you'll see that, though material revisions (commits/pipelines which led to this build) are loaded on to the build cause, the environment variables are not. An extra call will need to be made, to load environment variables. Probably this method. Once that is done, the earlier change I mentioned should work (the one in build_cause_api_model.rb).

I think a better way (since we're talking about fixing it) is how asynchronous APIs are usually handled:

  1. Return a token (actually a unique url for this trigger, not just a value) to the user from around here, as a part of the 202 response.
  2. Also, store the token somewhere (a token service, say), so that the user can query the status of the build for that token.
  3. Once the pipeline is actually triggered here, then mark the token as "finished" and provide the token service the information about the counter, so that it can then provide it to the user.
  4. Implement the end-point which queries for the token (so that the user can poll for it).

This is not extremely hard, by the way.

drmuey commented 8 years ago

@arvindsv that would be awesome! (and glad to hear its not difficult)

I'd be happy to help though my lack of familiarity w/ the architecture would probably hinder you, perhaps I can donate for the effort in some way though? Let me know the best way I can help you, perhaps some free-as-in-:beer: donations ;)

drmuey commented 8 years ago

… as a bit of a hack. If we decide to put environment variables there, there are many other pieces of information which might be relevant too. … However, since the history call can be potentially non-performant, it has been carefully created to load only what is required to create that JSON. In this case, environment variables didn't make the cut. So, they're not loaded from DB at all.

If the ideal version is too much at this time what would you think of this:

For performance and non-hackiness and simplicity of change: What if we just included one variable, say, MY_EXTERNAL_ID that could be passed in at schedule time (variables[MY_EXTERNAL_ID]=whatever) for anyone wanting to be able to find the build they started in history.

grahamc commented 8 years ago

My $0.01 is that finding a build by a variable passed in is pretty roundabout; a more correct version would be to update Go to return a URL based on a UUID or some other identifier. As a user, this is what I would want and expect from the API, anyway.

Graham


drmuey commented 8 years ago

@grahamc agreed, keep in mind though that that is what we have to do now anyway, except it requires a custom stage/job and downloading and parsing of the log file (which can take days to start even if it's a simple echo job) to find our variable “tag”.

I suggested the single variable compromise because I suspect a simple, targeted one-line change is more likely to get done than an overhaul of the scheduling API.

Either way is way better than the current state, which is über “roundabout” ;). If you have any ideas we've overlooked feel free to chime in!

drmuey commented 8 years ago

Another tack would be to add the thing we want to look for to the label but ATM pipeline labels can only contain things like ${COUNT} and material revisions, not environment variables.

Perhaps a feature allowing use of environment vars in pipeline labels would be a simpler way to solve this problem?

arvindsv commented 8 years ago

I'd be happy to help though my lack of familiarity w/ the architecture would probably hinder you, perhaps I can donate for the effort in some way though? Let me know the best way I can help you, perhaps some free-as-in-:beer: donations ;)

@drmuey: :beer: is easy to come by. :watch: (umm, time. Not a watch) is not, unfortunately. :) If you're willing to learn the architecture and take a stab at developing this, I'll try and find someone from the team to help you initially and when you get stuck. If no one is around, I can usually help. But, me trying to do this myself (especially given lots of :beers:) is not going to work. I am already caught up in too many different things.

drmuey commented 8 years ago

@arvindsv lol, I totally get that ;) I'm in the same boat but let me see what I can sort out

drmuey commented 8 years ago

Another approach would be to support tags in the schedule API: POST …/schedule w/ materials[my_git_material_name]=v1.2.3.4.

That should mean we can find the build number by revision (which is reliable and comparatively simple). It would also have the added benefit that we wouldn't need a job that sets up the repo before doing anything else.

Would that be a thing go.cd would accept via pull request?

I saw some discussion on it but no issue, should I create one or use this one?

drmuey commented 8 years ago

Update to “Another tack would be to add the thing we want to look for to the label but ATM pipeline labels”:

Decide on a special environment var like EXTERNAL_ID or, perhaps, GO_EXTERNAL_ID to ensure it does not clobber anyone’s current use of EXTERNAL_ID (YAGNI?).

Add names.add("env.EXTERNAL_ID"); to gocd/config/config-api/src/com/thoughtworks/go/config/PipelineConfig.java before or after line 477’s names.add("COUNT");

Then a pipeline would:

  1. Add EXTERNAL_ID to its Env Var list (not needed if it's a GO_ one, but GO_ ones can't be passed in to the schedule API, right? Nice to avoid complication)
  2. Set its label like 1.2.${COUNT} (${env.EXTERNAL_ID})

Which would mean the external id ends up in the pipeline label, so a caller could find their run by searching pipeline history for that label.

Would this simpler, less invasive, lightweight change be an acceptable alternative approach?

For us it'd still require the use of jobs to check out the right tag first, but if “Another approach would be to support tags in the schedule API” is too complicated to do very soon, this would make a nice simple alternative.

arvindsv commented 8 years ago

I think doing something special just for this (including using tags and external_id) feels hacky. Thoughts about the two approaches:

  1. Approach 1 - Using tags: Tags are not considered by Go at all, currently. Doing this means figuring out tags in all the different types of materials (perforce, tfs and plugins ...) and saving tags in the database (find out where an appropriate place is, create columns, migrations). Also, existing commits in the DB won't have tags. If not saving it in the DB, then there would have to be code to specifically find a commit by a tag and use it. Go doesn't just trigger builds for a random tag/commit ID. The commit should be seen by the server, and only then will it trigger. Otherwise, information about the commit won't be in the DB, and the build will not be traceable or repeatable.
  2. Approach 2 - Using external_id: Environment variables are not supported in labels at this time. The code you referred to is at the config level, and is used to validate a label in the config. It has no access to environment variables at that time (as you can imagine, at config time, environment variables or materials are not yet decided). What makes this approach feel weird to me is that it is making a change to handle a very specific case. I'd rather handle all environment variables in a label (the issue I referenced earlier) than a specific environment variable.

I'm very happy you're looking at code. :) @juhijariwala has offered to help you, give you some context and show you around the code. I wouldn't mind if you picked a more general version of your approach 2 (handling all environment variables) or the approach I mentioned earlier (return a token-URL to check). I prefer the token-URL approach, since it is direct and solves the problem, whereas the environment variables in a pipeline approach is indirect, and allows you to handle your problem through a different channel (parsing an unrelated API response). But, as I said, either one works for me.

grahamc commented 8 years ago

A few more thoughts on this:

The original request is for a mechanism that returns the label of the pipeline, so the user can query based on that label to get the newly created pipeline. Unless I'm interpreting the thread incorrectly, what is actually useful in the end is to know the URL of the pipeline.

I think @arvindsv's original suggestion makes the most sense:

I think the way to do this is to return a guid with a 201/202 response,
and then provide an API to check the status of that guid. When the
pipeline is triggered (or fails to trigger), the status will give
information about the build number, etc. and can be used to do further
operations.

The rest of the techniques here are just patches over the inability to tie a triggering to a pipeline:

  1. Returning the pipeline label still involves the original issue of polling the material update system, which means we would still need to implement the 201 / 202 with a guid reply.
  2. Passing in GUID environment variables offers, as far as I can tell, almost no additional value outside of the "now-I-can-find-it-again" feature, and requires extensive amounts of looping and HTTP requests in order to find the pipeline. Same with parameters.
  3. Returning the environment data with the history API requests isn't "extremely hard," but also doesn't appear to be a feature we really want to add, just in order to support the ability to ... fetch the URL of the just-triggered pipeline. I would definitely prefer the history API remain performant for debugging agent issues, than patch on environment variables.
  4. Environment variables in pipelines is definitely a feature people want, and the team seems interested in implementing. However, as (again) an end-user of the product, it is still a patch over the original intent.
  5. Build-by-tag is potentially interesting, but teaching GoCD about tags is evidently not a simple task. Also, tags can change with force pushes. This may get complicated.
  6. Instead of having a "magic" environment variable (EXTERNAL_ID), I would rather see full support for environment variables in labels. That said, I'm still insisting that the desired feature here is actually to return the URL of the build.

I think a lot of great feature suggestions have come out of this thread, but I worry about implementing many of these just as a means to an end. I would rather see each of these features being wanted outright before implementing them.

Here is some code I would like to use:

import requests
import time

# Trigger the pipeline; the hoped-for API would answer 202 with a Location
# header pointing at a status URL for this particular trigger request.
scheduling = requests.post('server/go/api/pipelines/my_pipeline/schedule')
check_url = scheduling.headers['Location']

pipeline_url = None
while pipeline_url is None:
    status = requests.get(check_url)

    if status.status_code == 200:
        # Once the pipeline has actually been scheduled, the status resource
        # would point at the newly created pipeline instance.
        pipeline_url = status.headers['Location']
    else:
        time.sleep(int(status.headers['Retry-After']))

my_pipeline_data = requests.get(pipeline_url)
drmuey commented 8 years ago

@grahamc you're right, it's not necessarily the label we want per se, but rather the id (AKA the COUNT). With the id we can determine any number of URLs and whatnot. Of course, if we had a Location header (or the label, etc.) to parse, then we could determine the counter for that build. I don't really care about the mechanism for finding the id (unless it's incredibly delayed or fragile, like the console.log parsing approach I tried); I just need the end result for the go.cd API to be of any use.

The other solutions I proffered were just smaller-scope ideas, in the hope that they'd be easier to see to fruition than The Right Way, which will likely take much longer.

grahamc commented 8 years ago

Right. When the URL to the pipeline is ultimately returned, other URLs can be easily returned, if they aren't already in the newer API definitions.

kitplummer commented 8 years ago

Running into this need now too. I really just need something back from the schedule POST that gives me something to poll. Messing with VARs, or scanning logs isn't really a solution.

@grahamc is spot on. Is there any update on this front?

Pleading ignorance... but I'm wondering why the schedule POST can't return the next-in-line build number (pipelines/:pipeline/instance/#) with the 202. If the schedule was accepted, then it should be available, right? Why not wait for the 'materials loop' to finish before returning the 202?

raguayo-springcm commented 7 years ago

@grahamc has the right idea. I'm very interested in seeing this feature implemented. Any updates on this? It has been almost a year.

nudgegoonies commented 5 years ago

This is a complete horror. The schedule API must return something useful for accessing the pipeline result.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had activity in the last 90 days. If you can still reproduce this error on the master branch using local development environment or on the latest GoCD Release, please reply with all of the information you have about it in order to keep the issue open. Thank you for all your contributions.