Closed GurliGebis closed 1 month ago
Seems like this happens when the jobs are being seeded for the first time.
Triggering the seed-jobs.sh build
script seems to get them to build.
Is this to be expected?
Right, the first line gives you a hint about what is going on: Branch indexing.
The build you're seeing isn't an actual build. Multibranch Pipelines work by discovering branches from the repository. When the job is first defined, Jenkins has no idea what branches the repository has, so it triggers a build that does only branch indexing and automatically skips all steps. This is required since without the branch information Jenkins doesn't even know what to build.
That's why it's not possible to build jobs right away when you define them, since there isn't anything to build yet. After the first branch indexing completes, you can trigger an actual build of what Jenkins discovered.
I don't like how Jenkins presents branch indexing as a build, since that's just confusing, but that's what it does. When you look at a build in Jenkins, it says it's branch indexing, so you need to check what you are actually looking at.
Indeed you are right 🙂 Maybe I shouldn't mess around with those things when I'm tired 🙂
Thanks.
Btw. do you know if env vars and the like can be configured using the REST API? That way, we can automate even more of the initial setup.
The documentation for the Jenkins REST API is kind of hard to come by. I started by searching for examples from other people who made it work, but that doesn't necessarily get you what you want specifically - it gives you an idea of how to do it generally. There is generic information on how the API works: https://www.jenkins.io/doc/book/using/remote-access-api/.
But there is no public reference where you can see all the endpoints or the ways you can interact with them. Instead, you should use Jenkins itself. If you go to your Jenkins and add /api/ to the end of the URL where you are, you get some information. This only works in some places - generally it works on pages that represent some entity, like a job, a node, ...
For example, if you were interested in the built-in node, you would navigate in Jenkins to the built-in node API page: http://JENKINS_HOST:8080/manage/computer/(built-in)/api/xml.
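As a sketch of that pattern, a tiny curl wrapper can be used for any entity page; JENKINS_URL, JENKINS_USER and JENKINS_TOKEN are placeholders for your own instance, not something from this thread:

```shell
# Hypothetical helper for the /api/ pattern: append /api/json to any entity
# URL and authenticate with user + API token.
jenkins_api() {
    # usage: jenkins_api 'manage/computer/(built-in)'
    curl -Ss --fail --user "$JENKINS_USER:$JENKINS_TOKEN" \
        "$JENKINS_URL/$1/api/json"
}

# example (needs a live Jenkins, so left commented out):
# jenkins_api 'manage/computer/(built-in)'
```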
I didn't find an example of or a way to change the system configuration, though. Maybe it's somewhere, but I didn't find it.
Perhaps for these kinds of tasks the REST API isn't the right tool...
I did find the system configuration in internal files in /var/lib/jenkins:
/var/lib/jenkins/config.xml (general configuration)
/var/lib/jenkins/credentials.xml (ssh key)
/var/lib/jenkins/org.jenkinsci.plugins.workflow.libs.GlobalLibraries.xml (pipeline library)
and there are others...
Thus another, perhaps easier, way would be to shut down Jenkins, modify those .xml files as needed, and then start Jenkins already configured. I wouldn't just replace those files with some examples - that seems like a bad idea, since it could break things (due to a different environment/version). Instead I would suggest making a copy, trying to modify the copy, and if that works, putting the modified copy back in /var/lib/jenkins. It would be nice to leave the original somewhere, for example as /var/lib/jenkins/config.xml.orig, in case the process goes wrong.
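A minimal sketch of that copy-and-revert idea, assuming the file names listed above; the directory is a parameter so you can rehearse it on a scratch copy before touching a real /var/lib/jenkins (with Jenkins stopped):

```shell
# Back up the known configuration files as *.orig before modifying them.
jenkins_config_backup() {
    dir=$1
    for f in config.xml credentials.xml \
             org.jenkinsci.plugins.workflow.libs.GlobalLibraries.xml; do
        if [ -f "$dir/$f" ]; then
            cp "$dir/$f" "$dir/$f.orig"
        fi
    done
}

# Restore every *.orig backup, undoing a modification that went wrong.
jenkins_config_revert() {
    dir=$1
    for orig in "$dir"/*.orig; do
        if [ -f "$orig" ]; then
            cp "$orig" "${orig%.orig}"
        fi
    done
}

# e.g. jenkins_config_backup /var/lib/jenkins
```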
I did try manually modifying /var/lib/jenkins/config.xml by adding another environment variable in the way described above, and it works - Jenkins treats what I added as its own. The format is a bit funny though, for example:
```xml
<tree-map>
  <default>
    <comparator class="java.lang.String$CaseInsensitiveComparator" reference="../../../../../../views/listView/jobNames/comparator"/>
  </default>
  <int>3</int>
  <string>ARM64_BUILD_DISABLED</string>
  <string>true</string>
  <string>CUSTOM_BUILD_CHECK_DISABLED</string>
  <string>true</string>
  <string>DEV_PACKAGES_VYOS_NET_HOST</string>
  <string>jenkins@172.17.17.17</string>
</tree-map>
```
There is no structure for the variables; instead there is a counter and a series of <string> elements, where the number of elements is twice the counter... Whatever, weird but it works: you add another variable by adding two <string> elements and incrementing the counter by 1. Most of the settings are represented as proper structures, though.
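That two-strings-plus-counter edit can be sketched with sed; the variable name and value below are made up, and the sketch assumes a single <int> counter in the file and no `|` or `&` in the value, so treat it as illustration rather than a robust tool:

```shell
# Add one key/value pair to the <tree-map> shown above and bump the counter.
add_env_var() {
    file=$1 key=$2 value=$3
    # read the current pair count from the single <int> element
    count=$(sed -n 's|.*<int>\([0-9]*\)</int>.*|\1|p' "$file")
    sed -i \
        -e "s|<int>$count</int>|<int>$((count + 1))</int>|" \
        -e "s|</tree-map>|<string>$key</string><string>$value</string></tree-map>|" \
        "$file"
}

# e.g. add_env_var config.xml NEW_VARIABLE some-value
```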
Not sure I am a fan of that, since the risk of breaking it is too big.
I got my setup running with all packages building 🙂 Now I just need to script the last part, and then test it all on a clean VM.
Jenkins also has a CLI client for remote control: https://www.jenkins.io/doc/book/managing/cli/
You can get it from your Jenkins:
wget http://172.17.17.17:8080/jnlpJars/jenkins-cli.jar
Then use it with the usual user/token as you would the REST API:
java -jar jenkins-cli.jar -http -s http://172.17.17.17:8080 -auth "$JENKINS_USER:$JENKINS_TOKEN" help
This is what I get for available actions:
add-job-to-view
Adds jobs to view.
build
Builds a job, and optionally waits until its completion.
cancel-quiet-down
Cancel the effect of the "quiet-down" command.
clear-queue
Clears the build queue.
connect-node
Reconnect to a node(s)
console
Retrieves console output of a build.
copy-job
Copies a job.
create-credentials-by-xml
Create Credential by XML
create-credentials-domain-by-xml
Create Credentials Domain by XML
create-job
Creates a new job by reading stdin as a configuration XML file.
create-node
Creates a new node by reading stdin as a XML configuration.
create-view
Creates a new view by reading stdin as a XML configuration.
declarative-linter
Validate a Jenkinsfile containing a Declarative Pipeline
delete-builds
Deletes build record(s).
delete-credentials
Delete a Credential
delete-credentials-domain
Delete a Credentials Domain
delete-job
Deletes job(s).
delete-node
Deletes node(s)
delete-view
Deletes view(s).
disable-job
Disables a job.
disable-plugin
Disable one or more installed plugins.
disconnect-node
Disconnects from a node.
enable-job
Enables a job.
enable-plugin
Enables one or more installed plugins transitively.
get-credentials-as-xml
Get a Credentials as XML (secrets redacted)
get-credentials-domain-as-xml
Get a Credentials Domain as XML
get-gradle
List available gradle installations
get-job
Dumps the job definition XML to stdout.
get-node
Dumps the node definition XML to stdout.
get-view
Dumps the view definition XML to stdout.
groovy
Executes the specified Groovy script.
groovysh
Runs an interactive groovy shell.
help
Lists all the available commands or a detailed description of single command.
import-credentials-as-xml
Import credentials as XML. The output of "list-credentials-as-xml" can be used as input here as is, the only needed change is to set the actual Secrets which are redacted in the output.
install-plugin
Installs a plugin either from a file, an URL, or from update center.
keep-build
Mark the build to keep the build forever.
list-changes
Dumps the changelog for the specified build(s).
list-credentials
Lists the Credentials in a specific Store
list-credentials-as-xml
Export credentials as XML. The output of this command can be used as input for "import-credentials-as-xml" as is, the only needed change is to set the actual Secrets which are redacted in the output.
list-credentials-context-resolvers
List Credentials Context Resolvers
list-credentials-providers
List Credentials Providers
list-jobs
Lists all jobs in a specific view or item group.
list-plugins
Outputs a list of installed plugins.
mail
Reads stdin and sends that out as an e-mail.
offline-node
Stop using a node for performing builds temporarily, until the next "online-node" command.
online-node
Resume using a node for performing builds, to cancel out the earlier "offline-node" command.
quiet-down
Quiet down Jenkins, in preparation for a restart. Don't start any builds.
reload-configuration
Discard all the loaded data in memory and reload everything from file system. Useful when you modified config files directly on disk.
reload-job
Reload job(s)
remove-job-from-view
Removes jobs from view.
replay-pipeline
Replay a Pipeline build with edited script taken from standard input
restart
Restart Jenkins.
restart-from-stage
Restart a completed Declarative Pipeline build from a given stage.
safe-restart
Safe Restart Jenkins. Don't start any builds.
safe-shutdown
Puts Jenkins into the quiet mode, wait for existing builds to be completed, and then shut down Jenkins.
session-id
Outputs the session ID, which changes every time Jenkins restarts.
set-build-description
Sets the description of a build.
set-build-display-name
Sets the displayName of a build.
shutdown
Immediately shuts down Jenkins server.
stop-builds
Stop all running builds for job(s)
update-credentials-by-xml
Update Credentials by XML
update-credentials-domain-by-xml
Update Credentials Domain by XML
update-job
Updates the job definition XML from stdin. The opposite of the get-job command.
update-node
Updates the node definition XML from stdin. The opposite of the get-node command.
update-view
Updates the view definition XML from stdin. The opposite of the get-view command.
version
Outputs the current version.
wait-node-offline
Wait for a node to become offline.
wait-node-online
Wait for a node to become online.
who-am-i
Reports your credential and permissions.
install-plugin is interesting; there is an example of how to use it: https://gist.github.com/basmussen/8182784
I don't see anything in the CLI commands that would be related to system configuration, environment variables, pipeline plugins, ...
reload-configuration
Discard all the loaded data in memory and reload everything from file system. Useful when you modified config files directly on disk.
What? Modifying those internal files is the official way? 😮
If you compare what is in /var/lib/jenkins/jobs/dropbear/config.xml with what the REST API or CLI returns:
java -jar jenkins-cli.jar -http -s http://172.17.17.17:8080 -auth "$JENKINS_USER:$JENKINS_TOKEN" get-job dropbear
It's the same config.xml! 🙂
Jenkins has 3 different methods of remote control: the CLI with the JAR, the CLI over SSH (Jenkins runs its own internal SSH server), and the REST API.
The CLI over the JAR is supposed to be the easiest way.
Yet it seems like all three of those options are the same thing underneath. The API/CLI gives you a nicer interface for some actions (like build), but if you want to create a job, for example, you need to upload the XML in the same format as on disk - so you just pipe config.xml into create-job/update-job:
java -jar jenkins-cli.jar create-job NAME
Creates a new job by reading stdin as a configuration XML file.
NAME : Name of the job to create
That's what the REST API does as well. We can't escape the XML that is on disk, it seems...
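For scripting, that piping can be wrapped in a small helper; the host matches the examples in this thread, and JENKINS_USER/JENKINS_TOKEN are placeholders:

```shell
# Thin wrapper around the CLI JAR so job XML can be piped in and out.
cli() {
    java -jar jenkins-cli.jar -http -s "http://172.17.17.17:8080" \
        -auth "$JENKINS_USER:$JENKINS_TOKEN" "$@"
}

# e.g. clone a job's definition under a new name (needs a live Jenkins):
# cli get-job dropbear > dropbear.xml
# cli create-job dropbear-copy < dropbear.xml
```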
I can try installing a clean Jenkins, saving the config, making the changes and then diffing the config files - maybe the changes can be done in a safe way.
It's worth a try at least 🙂 If nothing else, having a well-written guide for making the changes is also okay.
You don't necessarily need to create a diff. I would suggest reading the XML of an already-configured Jenkins; you will understand where everything is, and what it means, from the values you see. You iterate until you find all the values you want to change, as in the manual guide. Then you have a collection of XML fragments you want to insert or modify in specific XML files.
If you manipulate the XML with proper tools, like xmlstarlet, you reduce the chance of a malformed result. There is always a chance that a future Jenkins will differ so much that the modification results in a broken configuration. That's why the manipulation script should make a backup of every file it modifies, so there is an easy way to call some kind of revert command that restores the original files to their state before modification - then the user can follow the manual guide as a workaround.
Plugins require installation - that can be done via the CLI JAR or the REST API.
I think this way it should be possible to automate the Jenkins configuration in a relatively safe manner, with a fallback to the manual way if something goes wrong.
Okay 🙂 I'll see what I can do when I get time to look at it again this weekend (hopefully).
A small update: so far I have everything automated - the only things missing are the import of the projects and triggering them, and then a complete test on a clean Debian machine.
Looks good, you've done a lot of work!
The seeding of jobs and the first build require waiting for Jenkins to finish branch discovery - do you plan to automate this as well? There would need to be some polling loop checking whether Jenkins still has unfinished runs. I know that if you trigger a build immediately, the runs fail because the jobs don't know their branches yet. It would be nice to just queue the jobs instead of waiting and then triggering - that doesn't seem to be possible though? If you have enough executors, they will try to build before another executor finishes discovery, I think.
My original script did trigger a build right after job creation and that didn't work. Maybe some static sleep would work? Like 10 seconds to let the branch-indexing runs initialize, and then maybe Jenkins would queue the actual builds? I didn't try that.
I added another job that needs additional Jenkins configuration. Additional jobs/packages are automatically handled by the seed-jobs.sh/jobs.json script, but this one requires an additional Jenkins environment variable for configuration. See Configure environment variables and add global vyos-build Jenkins library for CUSTOM_DOCKER_REPO.
I plan on that, yes. Also, I need to find a way to not schedule everything at once, since jobs time out if everything is queued at once for the first build.
What kind of timeouts do you see? Do the runs expire before they start? There is also the possibility that a run expires in the middle of its steps. I know there is a 240-minute limit for a job to complete its build - is this what causes the issue, or is it another timeout?
I didn't see such an issue. I'm using 8 executors and a fairly performant system though, because I didn't want to wait extra for testing purposes - that's perhaps why.
It would be good to handle this on the Jenkins side, so it just waits, even for a day. There is also the possibility of sending 10 jobs, waiting for a free queue, then sending another 10, but we don't know if 10 is too many - there is also the option of sending them one by one, but that is very slow. The script would need to wait in this case, and that's not good for users - it would be better if the script exited and Jenkins did the work in the background. Running the script itself in the background is not great either, since then there isn't clear feedback.
They expire before the 120 minutes (I think it is), due to there being only a few build agents; building everything at once makes the jobs wait too long (since they all complete one step, then go back into the queue for the next step).
Anyway, I just tried building the container images - it fails when it tries to branch index them:
Branch indexing
> git rev-parse --resolve-git-dir /var/lib/jenkins/caches/git-014f40594d2ff9ee882fdf7dc8a7fdfe/.git # timeout=10
Setting origin to https://github.com/dd010101/vyos-missing.git
> git config remote.origin.url https://github.com/dd010101/vyos-missing.git # timeout=10
Fetching origin...
Fetching upstream changes from origin
> git --version # timeout=10
> git --version # 'git version 2.39.2'
> git config --get remote.origin.url # timeout=10
> git fetch --tags --force --progress -- origin +refs/heads/*:refs/remotes/origin/* # timeout=10
Seen branch in repository origin/current
Seen branch in repository origin/equuleus
Seen branch in repository origin/sagitta
Seen 3 remote branches
Obtained packages/vyos-build-container/Jenkinsfile from 71d739ce597a77e8d2a4be3022f08f36bb7a4ab4
ERROR: Could not find any definition of libraries [vyos-build@sagitta]
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
WorkflowScript: Loading libraries failed
1 error
at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:309)
at org.codehaus.groovy.control.CompilationUnit.applyToPrimaryClassNodes(CompilationUnit.java:1107)
at org.codehaus.groovy.control.CompilationUnit.doPhaseOperation(CompilationUnit.java:624)
at org.codehaus.groovy.control.CompilationUnit.processPhaseOperations(CompilationUnit.java:602)
at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:579)
at groovy.lang.GroovyClassLoader.doParseClass(GroovyClassLoader.java:323)
at groovy.lang.GroovyClassLoader.parseClass(GroovyClassLoader.java:293)
at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.GroovySandbox$Scope.parse(GroovySandbox.java:163)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.doParse(CpsGroovyShell.java:190)
at org.jenkinsci.plugins.workflow.cps.CpsGroovyShell.reparse(CpsGroovyShell.java:175)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.parseScript(CpsFlowExecution.java:635)
at org.jenkinsci.plugins.workflow.cps.CpsFlowExecution.start(CpsFlowExecution.java:581)
at org.jenkinsci.plugins.workflow.job.WorkflowRun.run(WorkflowRun.java:335)
at hudson.model.ResourceController.execute(ResourceController.java:101)
at hudson.model.Executor.run(Executor.java:442)
Finished: FAILURE
(I did add the needed env var, but that doesn't seem to have anything to do with it.)
My guess would be that it clones the vyos-missing repo, but the vyos-build library is defined in the vyos-build repo.
Do you have the vyos-build Jenkins pipeline library configured in the System settings? The env var is used later - your run fails before anything runs. Do other sagitta jobs run fine? The library is used by all sagitta jobs.
Started by user root
> git rev-parse --resolve-git-dir /var/lib/jenkins/caches/git-014f40594d2ff9ee882fdf7dc8a7fdfe/.git # timeout=10
Setting origin to https://github.com/dd010101/vyos-missing.git
> git config remote.origin.url https://github.com/dd010101/vyos-missing.git # timeout=10
Fetching origin...
Fetching upstream changes from origin
> git --version # timeout=10
> git --version # 'git version 2.39.2'
> git config --get remote.origin.url # timeout=10
> git fetch --tags --force --progress -- origin +refs/heads/*:refs/remotes/origin/* # timeout=10
Seen branch in repository origin/current
Seen branch in repository origin/equuleus
Seen branch in repository origin/sagitta
Seen 3 remote branches
Obtained packages/vyos-build-container/Jenkinsfile from 7e01b3ff48cc8fc69dc1eb0522afedecadf13d70
Loading library vyos-build@sagitta
Attempting to resolve sagitta from remote references...
> git --version # timeout=10
> git --version # 'git version 2.39.2'
> git ls-remote -h -- https://github.com/dd010101/vyos-build.git # timeout=10
Found match: refs/heads/sagitta revision f52e36e619f32e5c7cc536b9398687c8061cb269
Here is a snippet from a successful run.
I will take a look once I get back to my machine later tonight. But most likely I don't 🙂 I plan on setting up that job first, and once it has built the images, it will provision the other jobs.
It's described in Configure environment variables and add global vyos-build Jenkins library - Global Pipeline Libraries -> Add. I would suggest not bothering with this job from the start - running the first build in bash is easier and gives more feedback. Then the job can do the maintenance.
They expire before the 120 minutes (I think it is), due to there being only a few build agents; building everything at once makes the jobs wait too long (since they all complete one step, then go back into the queue for the next step).
Here is the 240-minute timeout, and I don't see any other. It would be great if you posted the build log of a job that fails because of the timeout, to see if there are clues about which timeout it is. It's annoying how the jobs do the first step and then pause before all jobs have done their first step, but that's how the scripts are written.
Having Jenkins build the container images is actually a simplification of the setup 🙂 Waiting for that build is easy: just sleep a while and check if the images are there yet. I will try with the clean setup once I get that far, and if it fails, I will post some logs.
I will be back later with more info (or tomorrow, depending on how long it takes).
Waiting for that build is easy: just sleep a while and check if the images are there yet.
Like docker images | grep? 🤔 Using the existing bash script would be easier - but you like to do it better! 🙂
BTW: curl -Ss -g --fail-with-body "$JENKINS_URL/computer/api/json" | jq .computer[0].idle
gives "true" when everything is done and "false" when something is running - it could come in handy if you haven't looked at it yet.
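That check could be wrapped into a polling loop, something like the sketch below; JENKINS_URL and the sleep interval are placeholders, and jq must be available:

```shell
# Block until the first executor reports idle, using the curl + jq check.
wait_until_idle() {
    until [ "$(curl -Ss -g --fail-with-body "$JENKINS_URL/computer/api/json" \
               | jq '.computer[0].idle')" = "true" ]; do
        sleep 30
    done
}

# e.g. wait_until_idle && echo "Jenkins is idle, safe to trigger builds"
```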
nice, I'll keep that one in mind.
Btw. you might want to call docker buildx prune -f when you remove the old images. Otherwise the build finishes instantly, using the buildx cache (so no new image is ever built).
I got it working btw. - I had forgotten to set the library in my install script; now it does that as well.
Btw. you might want to call
docker buildx prune -f
when you remove the old images.
Very good point.
If Docker doesn't build the image every time, that isn't a problem in itself. Docker checks each step, builds only the changed parts, combines the pieces and then calculates the hash of the resulting image (layer) - in your case an image (layer) already exists with the same hash, so there is no need to create another one, since it would be an identical copy anyway. That's expected behavior.
The cache will cause issues eventually, though. Steps like apt install something or wget example.com/something-latest.tar.gz don't change in the Dockerfile for a long time, but their results will change at some point - so if we use the cache long enough, those parts become more and more out of date as time goes on. Docker doesn't have the ability to disable the cache for such variable steps, and the cache doesn't expire either. That's why we can't use the cache for long-term builds. The cache is great when you're running builds one after another for development purposes, but if we want to build the image again in the future, we can't use it.
Yep 🙂
I think I spotted an issue. First we remove the old image, and then we build the new one.
If the build fails for some reason, we are left with no image, and all the other projects will be unable to build until it is fixed.
Would it be better to store the current image id in a variable, build the image, and if that goes okay, remove the old image? That way we never remove the only image there is, and things keep working. This also ensures other projects can build in parallel with this project.
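A sketch of that ordering; the image name and build context here are placeholders, not the thread's actual job configuration:

```shell
# Build the new image first; remove the old one only on success, so a failed
# build never leaves the system without any image.
rebuild_image() {
    old=$(docker images -q vyos-build:current)
    if docker build -t vyos-build:current docker/; then
        new=$(docker images -q vyos-build:current)
        # skip removal if the rebuild produced the identical image id
        if [ -n "$old" ] && [ "$old" != "$new" ]; then
            docker rmi "$old"
        fi
    else
        echo "build failed, keeping previous image" >&2
        return 1
    fi
}
```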
There is another error I made. The vyos-docker-container can't be in the vyos-missing repository - then Jenkins watches whether the script in vyos-docker-container changed, and that's pointless; we need to watch vyos-build/docker instead.
A simple script with so many mistakes!
But it keeps getting better with every bugfix 🙂
The location and the cleanup after a successful build&push should be fixed now. The vyos-build-container now lives in https://github.com/dd010101/vyos-build.git - the rest stays the same, and jobs.json is updated.
Nice, I'll update my setup and try it again later today 🙂
Status for today: I have automated the setup of the vyos-build-container job, and I know how to make the script wait for it and print the status - but that's a job for sometime during the weekend.
I will close this one, since we have issue #20 to continue in.
I'm trying to do the setup as part of making the install scripts.
Every build seems to skip amd64, with the message
Stage "amd64" skipped due to when conditional
The Built-In Node is configured with the labels:
Docker ec2_amd64 docker
The System environment vars are:
Here is the console output for building vyos-1x for the sagitta branch: