JetBrains / teamcity-docker-agent

TeamCity agent docker image sources
https://hub.docker.com/r/jetbrains/teamcity-agent/
Apache License 2.0

Agent doesn't start #5

Closed 90K2 closed 7 years ago

90K2 commented 7 years ago

Hello. I have a problem starting the build agent. Inside the container, it goes into an infinite loop after start.

This is how I start the container:

docker run -itd -e SERVER_URL="http://teamcity.domain.com" -v /opt/tc_agent/:/data/teamcity_agent/conf -p 0.0.0.0:9090:9090 jetbrains/teamcity-minimal-agent

It seems that each loop cycle ends with:

[2017-09-02 15:39:59,524] INFO - jetbrains.buildServer.AGENT - Upgrade mode: jetbrains.buildServer.agent.impl.upgrade.modes.FullUpgradeMode
[2017-09-02 15:39:59,533] INFO - rocesses.ProcessTreeTerminator - Using jetbrains.buildServer.processes.ProcessTreeTerminatorLinux
[2017-09-02 15:39:59,534] INFO - .ProcessTreeTerminatorImplBase - Will use command 'sh -c echo $$ && ps awwxo pid,ppid,command | tee'.
[2017-09-02 15:39:59,564] INFO - .ProcessTreeTerminatorImplBase - Will use command 'sh -c echo $$ && ps awwxo pid,ppid,command | tee'.
[2017-09-02 15:39:59,588] INFO - ses.ProcessTreeTerminatorLinux - Second thread id is 246
[2017-09-02 15:39:59,588] INFO - ses.ProcessTreeTerminatorLinux - Thread is Process thread model: false
[2017-09-02 15:39:59,589] INFO - .ProcessTreeTerminatorImplBase - Collecting processes from the current one, current process PID 246
[2017-09-02 15:39:59,589] INFO - .ProcessTreeTerminatorImplBase - No processes to kill
[2017-09-02 15:39:59,590] INFO - jetbrains.buildServer.AGENT - Exit for upgrade
[2017-09-02 15:39:59,590] INFO - ent.impl.upgrade.AgentExitCode - Agent exited. Upgrade process
[2017-09-02 15:39:59,591] INFO - buildServer.agent.AgentMain2$2 - Closing jetbrains.buildServer.agent.AgentMain2$2@3cef309d: startup date [Sat Sep 02 15:39:41 UTC 2017]; root of context hierarchy

And then the following lines start again from the beginning:

[2017-09-02 15:39:03,723] INFO - s.buildServer.agent.AgentMain2 - ===========================================================
[2017-09-02 15:39:03,738] INFO - s.buildServer.agent.AgentMain2 - TeamCity Build Agent 2017.1.3 (build 46961)
[2017-09-02 15:39:03,745] INFO - s.buildServer.agent.AgentMain2 - OS: Linux, version 3.10.0-514.16.1.el7.x86_64, amd64, Current user: root, Time zone: UTC
[2017-09-02 15:39:03,746] INFO - s.buildServer.agent.AgentMain2 - Java: 1.8.0_131, Java HotSpot(TM) 64-Bit Server VM (25.131-b11, mixed mode), Java(TM) SE Runtime Environment (1.8.0_131-b11), Oracle Corporation; JVM parameters: -ea -Xmx384m -Dteamcity_logs=../logs/

What am I doing wrong? Thanks.

VladRassokhin commented 7 years ago

At least one upgrade is normal, since the agent has to download plugins from the server. A constant upgrade loop, though, can have many causes.

Please provide the full agent logs; they can be found in /opt/buildagent/logs inside the container. You can either share them as a gist or send them to our feedback email.

WhatAKitty commented 7 years ago

@VladRassokhin I had the same error.

This is my log file: teamcity-agent.log.zip

VladRassokhin commented 7 years ago

It looks like while the agent exits for a restart (to perform the upgrade), some other process starts it again, and history repeats.
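A quick way to confirm that an agent is cycling rather than upgrading once is to count the "Exit for upgrade" messages in teamcity-agent.log. The following is a minimal sketch, not part of TeamCity; the function names and the threshold of 3 are assumptions chosen for illustration.

```python
def count_upgrade_exits(log_text: str) -> int:
    """Count how many times the agent logged 'Exit for upgrade'."""
    return log_text.count("Exit for upgrade")


def looks_like_upgrade_loop(log_text: str, threshold: int = 3) -> bool:
    """Heuristic: a single upgrade exit is normal, many repeats suggest a loop.

    The threshold is an arbitrary assumption, not a TeamCity setting.
    """
    return count_upgrade_exits(log_text) >= threshold
```

The log text could be read with something like open("/opt/buildagent/logs/teamcity-agent.log").read(), using the log directory mentioned earlier in this thread.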

WhatAKitty commented 7 years ago

@VladRassokhin Can you help me? Why did this happen?

pavelsher commented 7 years ago

Please attach all log files from this agent.

WhatAKitty commented 7 years ago

@pavelsher Hey, thanks for your help. These are the logs the TeamCity agent generated.

logs.zip

pavelsher commented 7 years ago

Relevant part of the log:

[2017-09-05 15:00:26,925] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/spring-scripting
[2017-09-05 15:00:26,925] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/spring-scripting/spring-scripting-bsh.jar
[2017-09-05 15:00:26,925] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/spring-scripting/spring-scripting-groovy.jar
[2017-09-05 15:00:26,926] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/spring-scripting/spring-scripting-jruby.jar
[2017-09-05 15:00:27,927] ERROR - buildServer.agent.LauncherUtil - Error deleting file: /opt/buildagent/lib/spring-scripting
[2017-09-05 15:00:27,927] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/spring.jar
[2017-09-05 15:00:27,928] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/trove-3.0.3.jar
[2017-09-05 15:00:27,929] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/trove4j.jar
[2017-09-05 15:00:27,929] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/util.jar
[2017-09-05 15:00:27,929] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xercesImpl.jar
[2017-09-05 15:00:27,931] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xml-rpc-wrapper.jar
[2017-09-05 15:00:27,931] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xmlrpc-2.0.1.jar
[2017-09-05 15:00:27,932] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xpp3-1.1.4c.jar
[2017-09-05 15:00:27,932] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xstream-1.4.8-custom.jar
[2017-09-05 15:00:27,932] DEBUG - buildServer.agent.LauncherUtil - Deleting /opt/buildagent/lib/xz-1.5.jar
[2017-09-05 15:00:27,933] ERROR - ver.agent.upgrade.AgentUpgrade - Upgrade failed :Problems deleting files under /opt/buildagent/lib. Check logs above for details
java.io.IOException: Problems deleting files under /opt/buildagent/lib. Check logs above for details
    at jetbrains.buildServer.agent.upgrade.UpgradeFolder.doUpgrade(UpgradeFolder.java:58)
    at jetbrains.buildServer.agent.upgrade.UpgradeFolder.upgrade(UpgradeFolder.java:44)
    at jetbrains.buildServer.agent.upgrade.AgentUpgrade.doUpgrade(AgentUpgrade.java:110)
    at jetbrains.buildServer.agent.upgrade.AgentUpgrade.applyUpdates(AgentUpgrade.java:82)
    at jetbrains.buildServer.agent.upgrade.UpgradeRunBase.run(UpgradeRunBase.java:41)
    at jetbrains.buildServer.agent.upgrade.UpgradeMode$5.run(UpgradeMode.java:133)
    at jetbrains.buildServer.agent.upgrade.Upgrade2$1.lockHeld(Upgrade2.java:57)
    at jetbrains.buildServer.agent.upgrade.Upgrade2$1.lockHeld(Upgrade2.java:55)
    at jetbrains.buildServer.agent.lock.impl.misc.MockLock.assumeLocked(MockLock.java:22)
    at jetbrains.buildServer.agent.lock.impl.misc.MockLock.tryLock(MockLock.java:14)
    at jetbrains.buildServer.agent.upgrade.Upgrade2.main3(Upgrade2.java:54)
    at jetbrains.buildServer.agent.upgrade.Upgrade2.main2(Upgrade2.java:27)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at jetbrains.buildServer.agent.ClazzLoader.call(ClazzLoader.java:59)
    at jetbrains.buildServer.agent.ClazzLoader.callMain2(ClazzLoader.java:19)
    at jetbrains.buildServer.agent.upgrade.Upgrade.main(Upgrade.java:16)

So for some reason the agent was not able to remove the /opt/buildagent/lib/spring-scripting directory during the upgrade, and this caused the infinite loop. We're not sure how to reproduce it yet. If there is anything special about the environment where you start the agent, please let us know.
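With DEBUG logging on, the failing delete is easy to miss among the surrounding lines. The sketch below pulls only the ERROR entries out of an agent log. It assumes the "[timestamp] LEVEL - logger - message" line layout seen in the excerpts in this thread; the names LogEntry, LINE_RE and error_entries are hypothetical helpers, not TeamCity APIs.

```python
import re
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    message: str


# Assumed layout: [timestamp] LEVEL - logger - message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+-\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)


def error_entries(log_text: str) -> List[LogEntry]:
    """Return the entries whose level is ERROR, in log order.

    Lines that do not match the assumed layout (for example stack
    trace frames) are skipped.
    """
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append(
                LogEntry(
                    match.group("ts"),
                    match.group("level"),
                    match.group("logger"),
                    match.group("msg"),
                )
            )
    return entries
```

Run against the excerpt above, this would leave only the "Error deleting file" and "Upgrade failed" entries.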

WhatAKitty commented 7 years ago

@pavelsher This is my docker info:

[root@docker 2017-09-11]# docker info
Containers: 7
 Running: 6
 Paused: 0
 Stopped: 1
Images: 75
Server Version: 17.06.2-ce
Storage Driver: overlay
 Backing Filesystem: xfs
 Supports d_type: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 6e23458c129b551d5c9871e5174f6b1b7f6d1170
runc version: 810190ceaa507aa2727d7ae6f4790c76ec150bd2
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-514.26.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 17.06GiB
Name: docker.whatakitty.com
ID: 7MWP:LUVR:HVLD:ER24:5MBF:BVRI:ACCK:LBO7:6VMX:FLPC:YGQR:HJVV
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Insecure Registries:
 127.0.0.0/8
Registry Mirrors:
 https://t4xli0ji.mirror.aliyuncs.com/
Live Restore Enabled: false

WARNING: overlay: the backing xfs filesystem is formatted without d_type support, which leads to incorrect behavior.
         Reformat the filesystem with ftype=1 to enable d_type support.
         Running without d_type support will not be supported in future releases.

The agent is started with this Compose service definition:

teamcity-build-agent:
    image: jetbrains/teamcity-agent
    restart: always
    depends_on:
      - teamcity-server
    environment:
      - SERVER_URL=http://teamcity-server:8111
    networks:
      - teamcity_net
    volumes:
      - /var/teamcity/agent/conf:/data/teamcity_agent/conf

pavelsher commented 7 years ago

We reproduced the problem on CentOS: https://youtrack.jetbrains.com/issue/TW-51501

pavelsher commented 7 years ago

So the problem is not TeamCity-specific; there seems to be a bug in CentOS itself.
Related discussion: https://github.com/moby/moby/issues/27214
It looks like newer versions of CentOS should not have this problem. Also, the problem reportedly does not reproduce on an ext4 file system. Either way, there is nothing to fix on our side.
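To check whether a host matches the affected setup, the `docker info` output can be scanned for the same fields shown earlier in this thread (Storage Driver, Supports d_type). This is a minimal sketch under that assumption; parse_docker_info and overlay_without_d_type are hypothetical helper names, and the text format of `docker info` may differ between Docker versions.

```python
from typing import Dict


def parse_docker_info(info_text: str) -> Dict[str, str]:
    """Collect 'Key: value' pairs from plain-text `docker info` output."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # keep only lines that actually contain a colon
            fields[key.strip()] = value.strip()
    return fields


def overlay_without_d_type(info_text: str) -> bool:
    """True if an overlay driver runs on a filesystem without d_type support."""
    fields = parse_docker_info(info_text)
    driver = fields.get("Storage Driver", "")
    return driver.startswith("overlay") and fields.get("Supports d_type") == "false"
```

The input could come from, for example, subprocess.run(["docker", "info"], capture_output=True, text=True).stdout.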

thehufbro commented 6 years ago

For anyone who runs into this: Docker tries to use the default storage driver ("overlay2" or "overlay"). If you are stuck in an upgrade loop with errors deleting files, you can try changing Docker's storage driver to either vfs or devicemapper. vfs is recommended over devicemapper, since devicemapper is known to be slow. As per: https://docs.docker.com/engine/userguide/storagedriver/selectadriver/#docker-ee-and-cs-engine

Another option is to reformat the backing xfs filesystem with ftype=1, as noted in this post: https://github.com/moby/moby/issues/31445 which references: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/7.2_Release_Notes/technology-preview-file_systems.html
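The storage driver is normally selected through the "storage-driver" key in the Docker daemon configuration file (commonly /etc/docker/daemon.json), followed by a daemon restart. The sketch below only builds the new file content and leaves writing it to you; with_storage_driver is a hypothetical helper name. Note that images and containers created under the old driver are generally not visible under the new one.

```python
import json


def with_storage_driver(daemon_config_json: str, driver: str) -> str:
    """Return daemon.json content with "storage-driver" set to `driver`.

    Existing keys are preserved; an empty input is treated as an empty config.
    """
    config = json.loads(daemon_config_json) if daemon_config_json.strip() else {}
    config["storage-driver"] = driver
    return json.dumps(config, indent=2, sort_keys=True)
```

For example, with_storage_driver("", "vfs") yields a config containing only the storage-driver entry.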

"this problem occurs if d_type is not available, due to the backing xfs filesystem not being formatted with ftype=1"

GokulendraPanda commented 5 years ago

Yes, it is reproducible on CentOS.