DTStack / chengying

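Software that supports standardized schema definitions and automated deployment of product packages. It is designed to deploy, upgrade, uninstall, and configure each service in a product package, reducing the cost of manual operations.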
Apache License 2.0

Deployment summary: agent异常退出 (agent exited abnormally):agent run error(unexpected):stop supervisor: e8771b72-8e28-49d8-ab25-3b9ff4cbf37b #40

Closed danny-zhu closed 1 year ago

danny-zhu commented 1 year ago

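The following error appears while packaging and deploying the project: 部署摘要: agent异常退出:agent run error(unexpected):stop supervisor: e8771b72-8e28-49d8-ab25-3b9ff4cbf37b ("Deployment summary: agent exited abnormally")

The contents of schema.yml are as follows: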

parent_product_name: top-lab
product_name: top-lab
product_name_display: top-lab
product_version: 1.0.0

service:

  TopLab:
    service_display: top-lab
    version: 1.0.1
    group: top-ai
    config:
      service_port: 8097
      self_ip: ${@TopLab}
      top_lab_ip_port: ${self_ip}:${service_port} #self service's node ip
      # java_opts: "-Xms256m -Xmx1024m -Dundertow.port=${service_port} -Dundertow.host=0.0.0.0"

    instance:
      cmd: ./start.sh start
      update_recreate: true
#      environment:
#        JAVA_OPTS: ${java_opts}
#      config_paths:
#      - config/jboot.properties
#      - config/jboot-dev.properties
      healthcheck:
        shell: curl http://${top_lab_ip_port}/healthcheck
        #period: 30s #default 60s
        start_period: 30s #default 10s
        timeout: 10s #default 10s
        retries: 3 #default 1
      max_replica: 1
      start_after_install: false
      #post_deploy: chown 0644 dtlog && zkcreate node xxx --ip ${@es}
      #post_undeploy: rm -rf /var/data/dtlog
      logs:
      - logs/output.log

The chengying matrix log is as follows:

MATRIX-DEBUG:2023/09/13 14:07:22 cluster.go:3875: GetClusterProductList: /api/v2/cluster/products
MATRIX-DEBUG:2023/09/13 14:07:24 product_line.go:308: [ProductLine->ProductListOfProductLine] ProductListOfProductLine from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:1578: [ProductName] ProductName from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product_line.go:308: [ProductLine->ProductListOfProductLine] ProductListOfProductLine from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:2003: Service: /api/v2/product/top-lab/version/1.0.0/service
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:2003: Service: /api/v2/product/top-lab/version/1.0.0/service
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:2003: Service: /api/v2/product/top-lab/version/1.0.0/service
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:2003: Service: /api/v2/product/top-lab/version/1.0.0/service
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:1741: [Product->ProductUncheckedServices] get unchecked services from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:1741: [Product->ProductUncheckedServices] get unchecked services from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:1741: [Product->ProductUncheckedServices] get unchecked services from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:24 product.go:1741: [Product->ProductUncheckedServices] get unchecked services from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:26 product.go:9035: [Product->CheckDeployCondition] CheckDeployCondition from EasyMatrix API
MATRIX-DEBUG:2023/09/13 14:07:26 cluster.go:520: RoleInfo: /api/v2/cluster/hosts/role_info?cluster_id=2
MATRIX-ERROR:2023/09/13 14:07:26 product.go:2204: [Product->ServiceGroup] handleUncheckedServicesCore warn: /var/jenkins_home/jobs/em-rel/jobs/chengying_release/workspace/chengying/chengying-server/matrix/api/impl/product.go:5156: unchecked service `` not exist
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:5218: deploy product_name:top-lab, product_version: 1.0.0, userId: 1, clusterId: 2
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:4631: cluster 2 installing new instance and rolling update ...
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:4492: cluster 2 rollingUpdateCore TopLab ...
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:4331: cluster 2 found TopLab old instance ip: [192.168.14.6]
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:4346: found TopLab new instance ip: []
MATRIX-ERROR:2023/09/13 14:07:27 product.go:5228: delete notify event error: sql: no rows in result set
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:164: [AgentClient] AgentStop with params:e8771b72-8e28-49d8-ab25-3b9ff4cbf37b
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:77: [AgentRestCore]LoopAgentRestCore: 1, request uri: /api/v1/agent/e8771b72-8e28-49d8-ab25-3b9ff4cbf37b/stopSync
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:81: [AgentRestCore]LoopAgentRestCore: 1, response body: &{ 0 <nil>}
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:154: [AgentClient] AgentStart with params:&{e8771b72-8e28-49d8-ab25-3b9ff4cbf37b 0 0 0 map[]}
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:77: [AgentRestCore]LoopAgentRestCore: 1, request uri: /api/v1/agent/e8771b72-8e28-49d8-ab25-3b9ff4cbf37b/startSyncWithParam
MATRIX-DEBUG:2023/09/13 14:07:27 agent-client.go:81: [AgentRestCore]LoopAgentRestCore: 1, response body: &{ 0 <nil>}
MATRIX-DEBUG:2023/09/13 14:07:27 product.go:3622: waiting instance(15) GetStatusChan...
MATRIX-DEBUG:2023/09/13 14:07:35 instance.go:528: ExecShellList GetBySeq error: 0
MATRIX-ERROR:2023/09/13 14:07:35 product.go:3643: agent异常退出:agent run error(unexpected):stop supervisor: e8771b72-8e28-49d8-ab25-3b9ff4cbf37b
MATRIX-DEBUG:2023/09/13 14:07:35 product.go:3624: end instance(15) GetStatusChan
MATRIX-DEBUG:2023/09/13 14:07:35 instancer.go:864: [Instancer] Clear
MATRIX-DEBUG:2023/09/13 14:07:35 product.go:4498: rollingUpdateCore TopLab finish(cluster 2 some instance of TopLab update fail)
MATRIX-ERROR:2023/09/13 14:07:35 product.go:4634: 462d4cd3-3814-4fc1-b47c-2e699d02b45e update error: cluster 2 some instance of TopLab update fail
MATRIX-ERROR:2023/09/13 14:07:35 product.go:4554: sql: no rows in result set
MATRIX-ERROR:2023/09/13 14:07:35 product.go:4610: sql: no rows in result set
MATRIX-DEBUG:2023/09/13 14:07:52 cluster_status_monitor.go:48: StartClusterStatusM ...

wangqi811 commented 1 year ago

Make sure the ./start.sh start process does not exit. The process has to stay running and block (hang); otherwise it exits. See this command for reference: [image]
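
As a rough illustration of that advice, here is a minimal sketch of a start.sh that stays alive for as long as the service does. It is not the script from the screenshot: `APP_CMD` is a hypothetical placeholder for the real service command, and the `logs/output.log` path is simply the one listed under `logs:` in the schema.yml above.

```shell
#!/bin/sh
# Hypothetical start.sh sketch (an assumption, not taken from the thread).
# APP_CMD stands in for the real service command; the default just blocks forever.
APP_CMD="${APP_CMD:-tail -f /dev/null}"

start() {
    mkdir -p logs
    # exec replaces this shell with the service, so the process that was
    # launched keeps running instead of returning once the script ends.
    exec $APP_CMD >> logs/output.log 2>&1
}

# Dispatch only when an action is given, so the file can also be sourced.
case "${1:-}" in
    start) start ;;
esac
```

Doing the redirection inside the script means `cmd` in schema.yml can stay a plain `./start.sh start`.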

danny-zhu commented 1 year ago

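After changing cmd to ./start.sh start > logs/output.log 2>&1, execution now reports that start.sh cannot be found:

exec start err: run agent c0201df1-f936-4d76-9c9f-5557e98ebeeb error: fork/exec ./start.sh: no such file or directory

The official mysql schema definition that I used as a reference also uses a relative path for the executable in its directory structure. My executable is in the TopLab root directory, so the file path is fine. [image]

Could this be related to the host initialization failure? When I added the host, it reported that host initialization failed. [image]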
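
For anyone debugging the same `fork/exec ./start.sh: no such file or directory` message, the sketch below lists a few generic things worth ruling out on the target host. The thread does not confirm any of them as the cause here. `check_script` is a hypothetical helper name; the checks rely only on the fact that exec reports "no such file or directory" both when the script itself is missing from the working directory and when the interpreter named in its `#!` line does not exist (which is also what happens when the script was saved with Windows CRLF line endings).

```shell
#!/bin/sh
# check_script PATH: hypothetical diagnostic helper (not part of chengying).
# Prints the first problem it finds, or "ok: PATH" if none of the checks trip.
check_script() {
    script="$1"
    cr=$(printf '\r')

    if [ ! -f "$script" ]; then
        # A relative path is resolved against the current working directory.
        echo "missing: $script (cwd is $(pwd))"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi

    first_line=$(head -n 1 "$script")
    case "$first_line" in
        *"$cr")
            # A trailing carriage return becomes part of the interpreter name.
            echo "CRLF line endings: $script"
            return 1
            ;;
    esac
    case "$first_line" in
        '#!'*)
            interp=${first_line#'#!'}
            interp=${interp# }
            interp=${interp%% *}
            if [ ! -x "$interp" ]; then
                echo "interpreter not found: $interp"
                return 1
            fi
            ;;
    esac
    echo "ok: $script"
}
```

Run it from the directory the service is installed into, for example `check_script ./start.sh`.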

wangqi811 commented 1 year ago

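The initialization failure does not matter. I built a simple package locally using your schema.yml and it works fine [image]. The directory looks like this [image]. start.sh is just a simple tail -f - [image]. So you may have to analyze this problem yourself.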

danny-zhu commented 1 year ago

OK, thanks for the reply. Which Linux OS version are you using? Which user did you start chengying as, and did you specify a user when deploying?

geyaandy commented 1 year ago

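I run into this error often too. To find the actual cause, you still need to check the agent logs on the deployment host, under /opt/dtstack/easymanager/easyagent/logs/. When I hit it before, the cause was the health check; the service process has to stay blocked (keep running).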
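
A small sketch of how one might skim that directory for the failure reason. `latest_log_errors` is a hypothetical helper, and the `error|fail` pattern is only a guess at what the agent writes; the log file names and format are not shown in this thread, so the function simply picks the most recently modified file.

```shell
#!/bin/sh
# latest_log_errors [DIR]: hypothetical helper (not part of chengying).
# Prints up to the last 20 lines mentioning "error" or "fail" from the most
# recently modified file in DIR, prefixed with their line numbers.
latest_log_errors() {
    log_dir="${1:-/opt/dtstack/easymanager/easyagent/logs}"
    # ls -t sorts newest first; -p marks directories with "/" so they can be skipped.
    latest=$(ls -tp "$log_dir" 2>/dev/null | grep -v '/$' | head -n 1)
    if [ -z "$latest" ]; then
        echo "no log files in $log_dir"
        return 1
    fi
    grep -n -i -E 'error|fail' "$log_dir/$latest" | tail -n 20
}
```

The health check command from schema.yml (`curl http://${top_lab_ip_port}/healthcheck`, with the variables filled in) can also be run by hand on the host to see whether it succeeds within the configured timeout.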

danny-zhu commented 1 year ago

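I think it may have been an OS version issue. After switching to CentOS 7.9, host initialization succeeds when adding the host, and the start.sh-not-found error no longer appears.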
