conan-io / conan

Conan - The open-source C and C++ package manager
https://conan.io
MIT License

Random Git clone failures running in Jenkins Artifactory plugin #6400

Closed radonish closed 7 months ago

radonish commented 4 years ago

Issue

In my integration of Conan with Jenkins I am seeing random Git clone failures: fatal: unable to connect to cache daemon: Stale file handle

The failures occur in 2 use cases:

  1. conan config install <git repo>
  2. conanfile.py source() methods that clone source from a git repo

I worked around 1. by simply cloning the conan settings repo via Jenkins pipeline script code and then doing the conan config install by pointing to the already-cloned repo directory. Something like this:

        stage("Pull Source"){
            steps {
                dir("conan_settings") {
                    git branch: params.conf_repo_branch, credentialsId: 'jenkins', url: "${bitbucket}/conan_settings.git"
                }
            }
        }

Working around 2. in the same way isn't quite as easy as it potentially impacts any of my package recipes that grab source from git.

Environment Details

Steps to reproduce

Jenkins code calling the Artifactory plugin Conan client makes the issue happen when the conanfile.py's source method is called; client call looks something like this:

client.run(command: "create . " + user + "/" + params.branch + " --profile " + params.profile + " --lockfile " + params.lock_file_name)

Logs (Executed commands with output) (Include/Attach if Applicable)

Jenkins:

15:19:54  Cross-build from 'Linux:x86_64' to 'Linux:armv5te'
15:19:54  libA/1.0.0@user/master: Configuring sources in /devops_jenkins/workspace/libA_master/conan_home/.conan/data/libA/1.0.0/user/master/source
15:19:54  ERROR: libA/1.0.0@user/master: Error in source() method, line 29
15:19:54    git.clone("https://XXX:8443/bitbucket/scm/mygroup/myrepo.git", "master")
15:19:54    CalledProcessErrorWithStderr: Command 'git -c http.sslVerify=true clone "https://XXX:8443/bitbucket/scm/mygroup/myrepo.git" . --branch master  ' returned non-zero exit status 128.
15:19:54  Cloning into '.'...
15:19:54  fatal: unable to connect to cache daemon: Stale file handle

uilianries commented 4 years ago

Hi @radonish !

It seems to be a permission error in your environment. Did you try:

sudo chown $(whoami) ~/.git-credential-cache/socket

Or

sudo chown $(whoami) ~/.cache/git/credential/socket

radonish commented 4 years ago

Hello, @uilianries

I logged into a couple of the Jenkins builder machines - 1 that had successfully git cloned and completed the build, 1 that had failed to git clone with the error I posted. Both machines had the same ownership/privileges for ~/.cache/git/credential/socket (for the builder user account).

The failure seems to be random and only occurs if it's a git clone initiated by a Conan call. Like I said, I made the conan config install random failure go away by removing Conan from the equation and doing the git clone via the Jenkins pipeline script instead.

The path to the specific git version we're using is in the Jenkins environment to avoid using the very old RHEL 7.4 version installed by default. I have confirmation that Jenkins is using that git version - is there a way to know what version of git is ultimately getting used from within Conan?
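One way to check which git Conan ends up using is to log it from inside the recipe, since `source()` runs in the same Python process and environment as the clone. This is a minimal sketch using only the standard library. The helper name `describe_git` is hypothetical, and it assumes Conan resolves `git` through `PATH` like any other subprocess call:

```python
import shutil
import subprocess


def describe_git():
    """Return (path, version) for the git executable found on PATH.

    Hypothetical helper: returns (None, None) when no git is on PATH.
    """
    path = shutil.which("git")
    if path is None:
        return None, None
    # `git --version` prints a single line such as "git version 2.x.y".
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    print(describe_git())
```

Calling `self.output.info(str(describe_git()))` at the top of `source()` would put the result in the Jenkins log next to the clone error.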

memsharded commented 4 years ago

Hi @radonish

Is it also possible to check whether the combination of calling Conan from Jenkins could be the cause? Is it possible to reproduce the error on the Jenkins machine, but calling Conan directly (not from the Jenkins process)? Just to rule out the interaction between both.

Also, it might help to know how you are installing Conan. From pip? Some installer? (We have had some weird issues with pyinstaller-generated packages and ssh errors.)

radonish commented 4 years ago

@memsharded, Conan was installed via pip.

I will write a script to repeatedly do the conan create for one of my packages that gets the source from git to see if it ever happens outside of the Jenkins use case.

Thank you

radonish commented 4 years ago

@memsharded, I've had the test (calling conan create directly via a shell script, with the same configuration and Jenkins user account) running for many hours today and I have yet to see a single git clone failure.

So, the random clone failures within the recipe's source() method only seem to happen when the conan create is executed via the Jenkins Artifactory plugin.
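A repeat test like the one described above can be sketched as follows. This is not the script that was actually used; `count_failures` and its `marker` parameter are hypothetical names, and the command to repeat is taken from the command line so that the real `conan create ...` invocation can be substituted:

```python
import subprocess
import sys


def count_failures(command, runs, marker="unable to connect to cache daemon"):
    """Run `command` `runs` times and return (failed, matched).

    `failed` counts runs with a non-zero exit status; `matched` counts the
    failed runs whose combined output contains `marker`.
    """
    failed = 0
    matched = 0
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failed += 1
            # Check both streams, since the error may be reported on either.
            if marker in result.stdout + result.stderr:
                matched += 1
    return failed, matched


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python repeat.py conan create . user/master
    print(count_failures(sys.argv[1:], runs=100))
```

Running the same script once from a shell and once from a plain Jenkins `sh` step (without the Artifactory plugin) would help separate the two environments.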

memsharded commented 4 years ago

@radonish that is a fantastic investigation on this issue, thanks very much.

I have been having a look at the source code of the Artifactory plugin (https://github.com/jfrog/jenkins-artifactory-plugin), and cannot find a line that could be related to this random error.

Maybe the absence of failures after moving conan config install to a plain pipeline stage (not via the Artifactory plugin) was coincidence or luck? What do you think? Is it possible to run the same conan create command a bunch of times inside Jenkins, but not using the Artifactory plugin code?

My guess is that something in Jenkins or the Artifactory plugin is accessing the git credential daemon concurrently, or nearly so.

Maybe @eyalbe4 could have some experience with this?

radonish commented 4 years ago

@memsharded - Yes, I'm now thinking that there is an early error that happens and impacts future git activity, like the git clones Conan is doing.

Before the Artifactory plugin/Conan git activity occurs, the following git activity happens:

I can keep this issue open for now (and provide updates for any other Jenkins/Conan users) if you'd like or close since it's looking more like a Jenkins-side issue.

Thanks for all of your help!

memsharded commented 4 years ago

Sure, let's keep this open for a while; please keep us posted about any new findings. Thanks to you for all the feedback!

radonish commented 4 years ago

To follow up - in case anyone else is using Jenkins and Conan together - the issue seems to be related to the following Jenkins components:

My usage of Jenkins/Conan for this problem case can be summarized as follows:

  1. [Jenkins] Git plugin clones a repo that has a Conan package recipe
  2. [Jenkins] Pipeline calls conan create (via Artifactory plugin) for a Conan package
  3. [Conan] Package creation involves the cloning of a different git repo that has necessary source code

The issue is that the git repo in step 3. requires authentication. It appears that I have roughly a 50/50 chance for the git credentials cached in step 1. (by way of the Credentials Binding plugin) to carry over to step 3.

I was able to verify this was the cause by simply disabling authentication on the git repo and the git clone in step 3. started succeeding 100% of the time.

radonish commented 4 years ago

My advice for anyone who is using Jenkins/Conan with git repos that require authentication is to use SSH instead of HTTPS. In my case, I simply created SSH keys and updated our Bitbucket server with the public key of our Jenkins build account. Then, I updated the Conan recipes that were cloning to get their source code to use the SSH address instead of HTTPS.

Problem solved.
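For recipes that hold the clone address in one place, the switch from HTTPS to SSH can be sketched as a small URL rewrite. This is an illustration only: the helper name, the `/bitbucket` context path, the example host, and the SSH port 7999 (the usual Bitbucket Server default) are assumptions, so check them against the clone dialog of the actual server:

```python
from urllib.parse import urlsplit


def bitbucket_https_to_ssh(https_url, ssh_port=7999, context="/bitbucket"):
    """Rewrite a Bitbucket Server HTTPS clone URL as an SSH clone URL.

    Assumes the layout https://host[:port]<context>/scm/<project>/<repo>.git
    and that SSH access is served as ssh://git@host:<ssh_port>/<project>/<repo>.git.
    """
    parts = urlsplit(https_url)
    prefix = context.rstrip("/") + "/scm/"
    if not parts.path.startswith(prefix):
        raise ValueError("not a Bitbucket Server HTTPS clone URL: " + https_url)
    repo_path = parts.path[len(prefix):]
    return "ssh://git@{}:{}/{}".format(parts.hostname, ssh_port, repo_path)


if __name__ == "__main__":
    # Hypothetical host; the path mirrors the one in the log above.
    print(bitbucket_https_to_ssh(
        "https://bitbucket.example.com:8443/bitbucket/scm/mygroup/myrepo.git"
    ))
```

In a recipe, the result would simply replace the HTTPS string passed to `git.clone(...)` in `source()`. Hard-coding the SSH address directly works just as well.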

memsharded commented 4 years ago

Hi @radonish

I think this is really useful information. I have submitted a note to the conan docs: https://github.com/conan-io/docs/pull/1548, maybe that could help other users to avoid this.

I don't know if we could do something else, we don't have the expertise in Jenkins and those plugins to investigate the root cause...

jasal82 commented 3 years ago

The git credential cache has a default timeout of 900 seconds, so after that interval the cached credentials will be deleted again. Maybe that's the problem? Try to increase the timeout setting in .gitconfig.
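If the timeout does turn out to matter, it can be raised through the `credential.helper` setting. Below is a minimal sketch that builds the corresponding `git config` call. The function names are hypothetical and 3600 seconds is only an example value:

```python
import subprocess


def cache_helper_command(timeout_seconds, scope="--global"):
    """Build the `git config` call that sets the credential cache timeout."""
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    helper = "cache --timeout={}".format(timeout_seconds)
    return ["git", "config", scope, "credential.helper", helper]


def apply(command):
    # Runs git; raises CalledProcessError if git reports a failure.
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is changed by accident.
    print(" ".join(cache_helper_command(3600)))
```

The equivalent `.gitconfig` entry is a `[credential]` section containing `helper = cache --timeout=3600`. Whether a helper configured by Jenkins itself would take precedence over this setting has not been verified here.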

Just for the sake of completeness: We also had problems getting Git checkout to work in Windows Docker containers. There it helps to inject the credentials into the Windows cred store by calling

cmdkey /generic:git:https://some.git.repo.net /user:${username} /pass:${password}

memsharded commented 7 months ago

Support for the Jenkins plugin is being discontinued because it provided relatively low value. Not even we (massive Jenkins users) use the plugin, and most users just call Conan directly from Jenkins without it. Closing as no longer relevant.