edgexfoundry / edgex-compose

EdgeX Foundry Docker Compose release compose files and tools for building EdgeX compose files
Apache License 2.0

[bug] several containers in minnesota restart frequently #388

Closed LavenderQAQ closed 1 year ago

LavenderQAQ commented 1 year ago

🐞 Bug Report

Affected Services [REQUIRED]

The issue is located in: app-service-configurable, command, support-notification, support-scheduler, core-metadata

Is this a regression?

Yes, the previous version in which this bug was not present was: levski and all previous versions

Description and Minimal Reproduction [REQUIRED]

I deployed the latest version using docker-compose with the command:

```shell
make run no-secty
```

🔥 Exception or Error



Many containers are in a state of constant restart:

CONTAINER ID   IMAGE                                         COMMAND                  CREATED        STATUS                  PORTS                                                                        NAMES
9bb8fe964f80   edgexfoundry/app-service-configurable:3.0.0   "/app-service-config…"   14 hours ago   Up 29 seconds           48095/tcp, 127.0.0.1:59701->59701/tcp                                        edgex-app-rules-engine
e4426ebbda4c   edgexfoundry/device-rest:3.0.0                "/device-rest --cp=c…"   14 hours ago   Up 58 seconds           127.0.0.1:59986->59986/tcp                                                   edgex-device-rest
0ced9d3f9e2b   edgexfoundry/device-virtual:3.0.0             "/device-virtual --c…"   14 hours ago   Up Less than a second   127.0.0.1:59900->59900/tcp                                                   edgex-device-virtual
26a67cae8242   edgexfoundry/core-data:3.0.0                  "/core-data -cp=cons…"   14 hours ago   Up 29 seconds           127.0.0.1:59880->59880/tcp                                                   edgex-core-data
1faa2e105738   edgexfoundry/core-command:3.0.0               "/core-command -cp=c…"   14 hours ago   Up 44 seconds           127.0.0.1:59882->59882/tcp                                                   edgex-core-command
0a6a22b9a751   edgexfoundry/support-notifications:3.0.0      "/support-notificati…"   14 hours ago   Up 14 seconds           127.0.0.1:59860->59860/tcp                                                   edgex-support-notifications
402748e680c7   lfedge/ekuiper:1.9.2-alpine                   "/usr/bin/docker-ent…"   14 hours ago   Up 14 hours             9081/tcp, 20498/tcp, 127.0.0.1:59720->59720/tcp                              edgex-kuiper
12eb1d3e17e4   edgexfoundry/support-scheduler:3.0.0          "/support-scheduler …"   14 hours ago   Up 44 seconds           127.0.0.1:59861->59861/tcp                                                   edgex-support-scheduler
43300a40e3a6   edgexfoundry/core-metadata:3.0.0              "/core-metadata -cp=…"   14 hours ago   Up 19 seconds           127.0.0.1:59881->59881/tcp                                                   edgex-core-metadata
12c124bc9809   hashicorp/consul:1.15.2                       "docker-entrypoint.s…"   14 hours ago   Up 14 hours             8300-8302/tcp, 8301-8302/udp, 8600/tcp, 8600/udp, 127.0.0.1:8500->8500/tcp   edgex-core-consul
c486850d9b07   redis:7.0.11-alpine                           "docker-entrypoint.s…"   14 hours ago   Up 14 hours             127.0.0.1:6379->6379/tcp                                                     edgex-redis
55b23383028a   edgexfoundry/edgex-ui:3.0.0                   "./edgex-ui-server -…"   14 hours ago   Up 14 hours             0.0.0.0:4000->4000/tcp, :::4000->4000/tcp                
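
In the listing above, the affected containers were created 14 hours ago but have been up for only seconds, which is what a restart loop looks like in `docker ps`. A minimal sketch for filtering such containers out of a longer listing follows. The function name `recently_started` is hypothetical, and the status patterns are an assumption based on the strings shown above plus Docker's `Restarting` state.

```shell
# Print the names of containers whose STATUS suggests a recent (re)start.
# Input: lines of "<name><TAB><status>", for example produced by
#   docker ps --format '{{.Names}}\t{{.Status}}'
# The patterns below are assumptions based on the statuses in this report.
recently_started() {
  awk -F '\t' '
    $2 ~ /^Restarting/ ||
    $2 ~ /^Up (Less than a second|[0-9]+ seconds?|About a minute)/ { print $1 }
  '
}
```

It could be used as `docker ps --format '{{.Names}}\t{{.Status}}' | recently_started`. Running it a few times in a row helps distinguish a restart loop from a stack that has simply just been started.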

🌍 Your Environment

Deployment Environment:

Linux 5.4.0-148-generic #165-Ubuntu SMP Tue Apr 18 08:53:12 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

EdgeX Version [REQUIRED]: minnesota

Anything else relevant?

LavenderQAQ commented 1 year ago

Most components report error:

msg="configuration provider is not available"

cloudxxx8 commented 1 year ago

Everything works fine on my computer. It looks like you are missing the container edgex-core-common-config-bootstrapper. What are your steps? Here are mine:

  1. git clone a clean edgex-compose
  2. git checkout minnesota branch
  3. make run no-secty

LavenderQAQ commented 1 year ago

@cloudxxx8 My steps are exactly the same as yours, and I haven't changed any code. I just checked it out and found that edgex-core-common-config-bootstrapper unexpectedly quit:

0714c01e6b4d   edgexfoundry/core-common-config-bootstrapper:3.0.0   "entrypoint.sh /core…"   About a minute ago   Exited (1) 44 seconds ago   edgex-core-common-config-bootstrapper

Here is its last log:

level=ERROR ts=2023-06-08T00:52:55.640858938Z app=core-common-config-bootstrapper source=main.go:116 msg="failed to determine if common configuration exists in the provider: checking configuration existence from Consul failed: Unexpected response code: 503 (<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">\n<html><head>\n<meta type=\"copyright\" content=\"Copyright (C) 1996-2017 The Squid Software Foundation and contributors\">\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n<title>ERROR: The requested URL could not be retrieved</title>\n<style type=\"text/css\"><!-- \n /*\n * Copyright (C) 1996-2017 The Squid Software Foundation and contributors\n *\n * Squid software is distributed under GPLv2+ license and includes\n * contributions from numerous individuals and organizations.\n * Please see the COPYING and CONTRIBUTORS files for details.\n */\n\n/*\n Stylesheet for Squid Error pages\n Adapted from design by Free CSS Templates\n http://www.freecsstemplates.org\n Released for free under a Creative Commons Attribution 2.5 License\n*/\n\n/* Page basics */\n* {\n\tfont-family: verdana, sans-serif;\n}\n\nhtml body {\n\tmargin: 0;\n\tpadding: 0;\n\tbackground: #efefef;\n\tfont-size: 12px;\n\tcolor: #1e1e1e;\n}\n\n/* Page displayed title area */\n#titles {\n\tmargin-left: 15px;\n\tpadding: 10px;\n\tpadding-left: 100px;\n\tbackground: url('/squid-internal-static/icons/SN.png') no-repeat left;\n}\n\n/* initial title */\n#titles h1 {\n\tcolor: #000000;\n}\n#titles h2 {\n\tcolor: #000000;\n}\n\n/* special event: FTP success page titles */\n#titles ftpsuccess {\n\tbackground-color:#00ff00;\n\twidth:100%;\n}\n\n/* Page displayed body content area */\n#content {\n\tpadding: 10px;\n\tbackground: #ffffff;\n}\n\n/* General text */\np {\n}\n\n/* error brief description */\n#error p {\n}\n\n/* some data which may have caused the problem */\n#data {\n}\n\n/* the error message received from the system or other 
software */\n#sysmsg {\n}\n\npre {\n    font-family:sans-serif;\n}\n\n/* special event: FTP / Gopher directory listing */\n#dirmsg {\n    font-family: courier;\n    color: black;\n    font-size: 10pt;\n}\n#dirlisting {\n    margin-left: 2%;\n    margin-right: 2%;\n}\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\n    border-bottom: groove;\n}\n#dirlisting td.size {\n    width: 50px;\n    text-align: right;\n    padding-right: 5px;\n}\n\n/* horizontal lines */\nhr {\n\tmargin: 0;\n}\n\n/* page displayed footer area */\n#footer {\n\tfont-size: 9px;\n\tpadding-left: 10px;\n}\n\n\nbody\n:lang(fa) { direction: rtl; font-size: 100%; font-family: Tahoma, Roya, sans-serif; float: right; }\n:lang(he) { direction: rtl; }\n --></style>\n</head><body id=ERR_DNS_FAIL>\n<div id=\"titles\">\n<h1>ERROR</h1>\n<h2>The requested URL could not be retrieved</h2>\n</div>\n<hr>\n\n<div id=\"content\">\n<p>The following error was encountered while trying to retrieve the URL: <a href=\"http://edgex-core-consul:8500/v1/kv/edgex/v3/core-common-config-bootstrapper/?\">http://edgex-core-consul:8500/v1/kv/edgex/v3/core-common-config-bootstrapper/?</a></p>\n\n<blockquote id=\"error\">\n<p><b>Unable to determine IP address from host name <q>edgex-core-consul</q></b></p>\n</blockquote>\n\n<p>The DNS server returned:</p>\n<blockquote id=\"data\">\n<pre>Name Error: The domain name does not exist.</pre>\n</blockquote>\n\n<p>This means that the cache was not able to resolve the hostname presented in the URL. 
Check if the address is correct.</p>\n\n<p>Your cache administrator is <a href=\"mailto:webmaster?subject=CacheErrorInfo%20-%20ERR_DNS_FAIL&amp;body=CacheHost%3A%20oversea-squid1.jp.txyun%0D%0AErrPage%3A%20ERR_DNS_FAIL%0D%0AErr%3A%20%5Bnone%5D%0D%0ADNS%20ErrMsg%3A%20Name%20Error%3A%20The%20domain%20name%20does%20not%20exist.%0D%0ATimeStamp%3A%20Thu,%2008%20Jun%202023%2000%3A52%3A49%20GMT%0D%0A%0D%0AClientIP%3A%20172.28.205.128%0D%0A%0D%0AHTTP%20Request%3A%0D%0AGET%20%2Fv1%2Fkv%2Fedgex%2Fv3%2Fcore-common-config-bootstrapper%2F%3Fkeys%3D%20HTTP%2F1.1%0AUser-Agent%3A%20Go-http-client%2F1.1%0D%0AAccept-Encoding%3A%20gzip%0D%0AHost%3A%20edgex-core-consul%3A8500%0D%0A%0D%0A%0D%0A\">webmaster</a>.</p>\n<br>\n</div>\n\n<hr>\n<div id=\"footer\">\n<p>Generated Thu, 08 Jun 2023 00:52:49 GMT by oversea-squid1.jp.txyun (squid/3.5.27)</p>\n<!-- ERR_DNS_FAIL -->\n</div>\n</body></html>)"
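
The 503 body in the log above is an HTML error page generated by a Squid proxy (`ERR_DNS_FAIL`, "Generated ... by oversea-squid1.jp.txyun (squid/3.5.27)"), so the response appears to have come from a proxy that could not resolve `edgex-core-consul` rather than from Consul itself. A minimal sketch for pulling those two facts out of such a long log line follows. The function name `squid_error_summary` is hypothetical, and the patterns are assumptions based only on the page shown here.

```shell
# Summarize a Squid error page embedded in a log message.
# Reads log text on stdin and prints the error id and the generating host.
squid_error_summary() {
  log=$(cat)
  # First Squid error identifier, e.g. ERR_DNS_FAIL
  err_id=$(printf '%s\n' "$log" | grep -o 'ERR_[A-Z_]*' | head -n 1 || true)
  # Footer fragment naming the proxy, e.g. "by <host> (squid/3.5.27)"
  origin=$(printf '%s\n' "$log" | grep -o 'by [^ ]* (squid/[0-9.]*)' | head -n 1 || true)
  if [ -n "$err_id" ] && [ -n "$origin" ]; then
    printf 'proxy error %s generated %s\n' "$err_id" "$origin"
  else
    printf 'no squid error page found\n'
  fi
}
```

It could be fed from the exited container's log, for example `docker logs edgex-core-common-config-bootstrapper 2>&1 | squid_error_summary`. If it reports a proxy, looking at the container's environment with `docker inspect --format '{{json .Config.Env}}' edgex-core-common-config-bootstrapper` may show whether proxy variables were set; whether that was the case here is not confirmed in this thread.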

cloudxxx8 commented 1 year ago

According to the error log, the problem is related to Consul, so you should check the Consul logs.

LavenderQAQ commented 1 year ago

@cloudxxx8 It's weird. Consul's logs look fine:

==> Starting Consul agent...
              Version: '1.15.2'
           Build Date: '2023-03-30 17:51:19 +0000 UTC'
              Node ID: '1afb88bb-2185-6dbb-0995-6d92aefa455e'
            Node name: 'edgex-core-consul'
           Datacenter: 'dc1' (Segment: '<all>')
               Server: true (Bootstrap: true)
          Client Addr: [0.0.0.0] (HTTP: 8500, HTTPS: -1, gRPC: -1, gRPC-TLS: 8503, DNS: 8600)
         Cluster Addr: 172.31.4.2 (LAN: 8301, WAN: 8302)
    Gossip Encryption: false
     Auto-Encrypt-TLS: false
            HTTPS TLS: Verify Incoming: false, Verify Outgoing: false, Min Version: TLSv1_2
             gRPC TLS: Verify Incoming: false, Min Version: TLSv1_2
     Internal RPC TLS: Verify Incoming: false, Verify Outgoing: false (Verify Hostname: false), Min Version: TLSv1_2

==> Log data will now stream in as it occurs:

2023-06-08T05:39:58.803Z [WARN]  agent: bootstrap = true: do not enable unless necessary
2023-06-08T05:39:58.809Z [WARN]  agent.auto_config: bootstrap = true: do not enable unless necessary
2023-06-08T05:39:59.041Z [INFO]  agent.server.raft: initial configuration: index=8273 servers="[{Suffrage:Voter ID:1afb88bb-2185-6dbb-0995-6d92aefa455e Address:172.31.3.4:8300}]"
2023-06-08T05:39:59.041Z [INFO]  agent.server.raft: entering follower state: follower="Node at 172.31.4.2:8300 [Follower]" leader-address= leader-id=
2023-06-08T05:39:59.042Z [INFO]  agent.server.serf.wan: serf: EventMemberJoin: edgex-core-consul.dc1 172.31.4.2
2023-06-08T05:39:59.042Z [WARN]  agent.server.serf.wan: serf: Failed to re-join any previously known node
2023-06-08T05:39:59.043Z [INFO]  agent.server.serf.lan: serf: EventMemberJoin: edgex-core-consul 172.31.4.2
2023-06-08T05:39:59.043Z [INFO]  agent.router: Initializing LAN area manager
2023-06-08T05:39:59.043Z [WARN]  agent.server.serf.lan: serf: Failed to re-join any previously known node
2023-06-08T05:39:59.043Z [INFO]  agent.server: Adding LAN server: server="edgex-core-consul (Addr: tcp/172.31.4.2:8300) (DC: dc1)"
2023-06-08T05:39:59.043Z [INFO]  agent.server.autopilot: reconciliation now disabled
2023-06-08T05:39:59.044Z [INFO]  agent.server: Handled event for server in area: event=member-join server=edgex-core-consul.dc1 area=wan
2023-06-08T05:39:59.046Z [INFO]  agent.server.cert-manager: initialized server certificate management
2023-06-08T05:39:59.047Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=tcp
2023-06-08T05:39:59.047Z [INFO]  agent: Started DNS server: address=0.0.0.0:8600 network=udp
2023-06-08T05:39:59.047Z [INFO]  agent: Starting server: address=[::]:8500 network=tcp protocol=http
2023-06-08T05:39:59.048Z [INFO]  agent: Started gRPC listeners: port_name=grpc_tls address=[::]:8503 network=tcp
2023-06-08T05:39:59.048Z [INFO]  agent: started state syncer
2023-06-08T05:39:59.048Z [INFO]  agent: Consul agent running!
2023-06-08T05:40:05.239Z [INFO]  agent: Newer Consul version available: new_version=1.15.3 current_version=1.15.2
2023-06-08T05:40:05.940Z [WARN]  agent.server.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2023-06-08T05:40:05.940Z [INFO]  agent.server.raft: entering candidate state: node="Node at 172.31.4.2:8300 [Candidate]" term=5
2023-06-08T05:40:05.948Z [INFO]  agent.server.raft: election won: term=5 tally=1
2023-06-08T05:40:05.948Z [INFO]  agent.server.raft: entering leader state: leader="Node at 172.31.4.2:8300 [Leader]"
2023-06-08T05:40:05.948Z [INFO]  agent.server: cluster leadership acquired
2023-06-08T05:40:05.948Z [INFO]  agent.server: New leader elected: payload=edgex-core-consul
2023-06-08T05:40:06.230Z [INFO]  agent.server.autopilot: reconciliation now enabled
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="federation state anti-entropy"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="federation state pruning"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="streaming peering resources"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="metrics for streaming peering resources"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="peering deferred deletion"
2023-06-08T05:40:06.231Z [INFO]  connect.ca: initialized primary datacenter CA from existing CARoot with provider: provider=consul
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="intermediate cert renew watch"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="CA root pruning"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="CA root expiration metric"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="CA signing expiration metric"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="virtual IP version check"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: started routine: routine="config entry controllers"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: stopping routine: routine="virtual IP version check"
2023-06-08T05:40:06.231Z [INFO]  agent.leader: stopped routine: routine="virtual IP version check"
2023-06-08T05:40:06.231Z [INFO]  agent.server.raft: updating configuration: command=AddVoter server-id=1afb88bb-2185-6dbb-0995-6d92aefa455e server-addr=172.31.4.2:8300 servers="[{Suffrage:Voter ID:1afb88bb-2185-6dbb-0995-6d92aefa455e Address:172.31.4.2:8300}]"
2023-06-08T05:40:06.235Z [INFO]  agent.server: member joined, marking health alive: member=edgex-core-consul partition=default
2023-06-08T05:40:17.606Z [INFO]  agent: Synced node info

LavenderQAQ commented 1 year ago

I redeployed after running make clean, and the problem did not recur. It may have been caused by leftover configuration from previous versions.
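
The Consul log earlier in the thread is consistent with leftover state: Raft starts from `index=8273` with a stored server address of 172.31.3.4 while the node is at 172.31.4.2, and the CA is initialized "from existing CARoot". A minimal sketch for checking which volumes from an earlier deployment are still present before cleaning follows. The function name `project_volumes` is hypothetical, and the default prefix `edgex_` is an assumption about the Compose project name.

```shell
# List volume names that start with a given Compose project prefix.
# Input: one volume name per line, for example produced by
#   docker volume ls --format '{{.Name}}'
# The default prefix "edgex_" is an assumption, not taken from this thread.
project_volumes() {
  prefix=${1:-edgex_}
  grep "^${prefix}" || true
}
```

It could be used as `docker volume ls --format '{{.Name}}' | project_volumes`. If volumes from an older release show up, running `make clean` before `make run no-secty`, as described above, is the fix that worked in this case.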