alibaba / SREWorks

Cloud Native DataOps & AIOps Platform | 云原生数智运维平台
https://sreworks.cn
Apache License 2.0
1.79k stars 395 forks

Quick install of the full version fails #235

Open lkeai2007 opened 1 year ago

lkeai2007 commented 1 year ago

I ran the following command on CentOS 8.6:
helm install sreworks ./ --create-namespace --namespace sreworks --set global.accessMode="nodePort" --set appmanager.home.url="http://127.0.0.1:30767" --set appmanager.server.jwtSecretKey="123321"
[screenshot]

lkeai2007 commented 1 year ago

minikube start --image-mirror-country=cn --cpus=4 --memory=15gb
This command also completed successfully. [screenshot]

lkeai2007 commented 1 year ago

Error log from kubectl logs sreworks-appmanager-cluster-initjob-xf829 -n sreworks:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 496, in _make_request
    conn.request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/usr/local/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.9/http/client.py", line 980, in send
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 243, in connect
    self.sock = self._new_conn()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 218, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f803e342730>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 844, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='sreworks-appmanager', port=80): Max retries exceeded with url: /oauth/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f803e342730>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/sbin/cluster_init.py", line 165, in <module>
    init_cluster(AppManagerClient(ENDPOINT, CLIENT_ID, CLIENT_SECRET, USERNAME, PASSWORD).client)
  File "/app/sbin/cluster_init.py", line 75, in __init__
    self._token = self._fetch_token()
  File "/app/sbin/cluster_init.py", line 86, in _fetch_token
    return oauth.fetch_token(
  File "/usr/local/lib/python3.9/site-packages/requests_oauthlib/oauth2_session.py", line 341, in fetch_token
    r = self.request(
  File "/usr/local/lib/python3.9/site-packages/requests_oauthlib/oauth2_session.py", line 521, in request
    return super(OAuth2Session, self).request(
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='sreworks-appmanager', port=80): Max retries exceeded with url: /oauth/token (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f803e342730>: Failed to establish a new connection: [Errno 111] Connection refused'))
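The last error in the chain is a refused TCP connection to sreworks-appmanager on port 80, which usually means nothing is listening behind that service yet. As a rough illustration only, here is a Python sketch that polls the port before attempting the token request. The function name `wait_for_port` and its parameters are hypothetical and are not part of cluster_init.py.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Hypothetical helper for illustration; not taken from the SREWorks code base.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (for example ConnectionRefusedError)
            # when nothing accepts the connection.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("sreworks-appmanager", 80)` from inside the cluster would return False for as long as the appmanager server is not accepting connections, which separates "service not up yet" from an authentication problem on /oauth/token.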

Twwy commented 1 year ago

The key issue is that sreworks-mysql-0 and sreworks-redis-master-0 did not start properly.

Twwy commented 1 year ago

That prevents the other pods from running properly.

lkeai2007 commented 1 year ago

It still doesn't work. [screenshot] [screenshot]

Twwy commented 1 year ago

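Judging from the current screenshot, mysql has already restarted twice. Could the available resources (memory, for example) be insufficient, so that the container is OOM-killed and restarts repeatedly? If so, none of the services that depend on mysql can work properly.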

lkeai2007 commented 1 year ago

I gave the virtual machine 26 GB of memory.

lkeai2007 commented 1 year ago

Do I need to set up a private Docker registry for this to come up?

Twwy commented 1 year ago

A private Docker registry is not required. The quick-install flow already publishes all images to publicly accessible Docker image paths.

Twwy commented 1 year ago

Use kubectl describe to check the reasons for the most recent mysql restarts.
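The same information can also be read programmatically. A minimal Python sketch, assuming the Pod object has been obtained as JSON (for example from `kubectl get pod sreworks-mysql-0 -n sreworks -o json`) and parsed into a dict. The function name is hypothetical; the field names follow the Kubernetes Pod status schema.

```python
def last_termination_reasons(pod: dict) -> list:
    """Return (container, restart_count, reason, exit_code) for every container
    whose previous instance terminated.

    Hypothetical helper for illustration. A reason of "OOMKilled" would point to
    a memory kill; other reasons (such as "Error") point elsewhere.
    """
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            results.append((
                status["name"],
                status.get("restartCount", 0),
                terminated.get("reason"),
                terminated.get("exitCode"),
            ))
    return results
```

Containers that have never restarted have an empty `lastState` and are skipped, so an empty list means there is no previous termination to explain.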

Twwy commented 1 year ago

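https://www.yuque.com/sreworks-doc/docs/rr5g10
Single-node full (data-intelligence edition) deployment: at least 8 cores / 32 GB memory / 300 GB disk recommended. The memory may indeed be somewhat short.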
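For comparison, the minikube start command earlier in the thread passes --cpus=4 --memory=15gb, so the cluster presumably sees less than the 26 GB given to the virtual machine. A small Python sketch of the gap against the recommendation quoted above; the class and function names are hypothetical, and disk size is left out because the thread does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: int
    memory_gb: int


# Recommendation as quoted in the thread (8 cores / 32 GB memory).
RECOMMENDED = Resources(cpus=8, memory_gb=32)
# Values from `minikube start --cpus=4 --memory=15gb` posted earlier.
MINIKUBE = Resources(cpus=4, memory_gb=15)


def shortfalls(actual: Resources, recommended: Resources) -> dict:
    """Return how far `actual` falls below `recommended`, per resource."""
    gaps = {}
    if actual.cpus < recommended.cpus:
        gaps["cpus"] = recommended.cpus - actual.cpus
    if actual.memory_gb < recommended.memory_gb:
        gaps["memory_gb"] = recommended.memory_gb - actual.memory_gb
    return gaps


print(shortfalls(MINIKUBE, RECOMMENDED))  # {'cpus': 4, 'memory_gb': 17}
```

Under these assumptions the cluster is 4 cores and 17 GB short of the recommendation, independent of how much memory the host virtual machine has.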

lkeai2007 commented 1 year ago

[root@smu1 sreworks-chart]# kubectl describe pod sreworks-mysql-0 -n sreworks
Name:             sreworks-mysql-0
Namespace:        sreworks
Priority:         0
Service Account:  sreworks-mysql
Node:             minikube/192.168.58.2
Start Time:       Thu, 31 Aug 2023 16:18:00 +0800
Labels:           app.kubernetes.io/component=primary
                  app.kubernetes.io/instance=sreworks
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=mysql
                  controller-revision-hash=sreworks-mysql-7cc549768b
                  helm.sh/chart=mysql-8.2.3
                  statefulset.kubernetes.io/pod-name=sreworks-mysql-0
Annotations:      checksum/configuration: 7c4d261d8711bdefbd47de7a4939ed26a18abae82038d582783afe9c2b6cb39d
Status:           Running
IP:               10.244.1.176
IPs:
  IP:           10.244.1.176
Controlled By:  StatefulSet/sreworks-mysql
Containers:
  mysql:
    Container ID:   docker://94f5868b20d3a4ab5b13e9356733b1c2ff5187247318e6881198a1946e918a89
    Image:          sreworks-registry.cn-beijing.cr.aliyuncs.com/hub/mysql:v1.0
    Image ID:       docker-pullable://sreworks-registry.cn-beijing.cr.aliyuncs.com/hub/mysql@sha256:3ffd066da0331310857607aef3f025f688648fc36c3a0b46df0bda3081666dc8
    Port:           3306/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Fri, 01 Sep 2023 15:19:55 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    255
      Started:      Thu, 31 Aug 2023 16:25:00 +0800
      Finished:     Fri, 01 Sep 2023 15:18:08 +0800
    Ready:          True
    Restart Count:  3
    Liveness:       exec [/bin/bash -ec password_aux="${MYSQL_ROOT_PASSWORD:-}"
if [[ -f "${MYSQL_ROOT_PASSWORD_FILE:-}" ]]; then
    password_aux=$(cat "$MYSQL_ROOT_PASSWORD_FILE")
fi
mysqladmin status -uroot -p"${password_aux}"
] delay=120s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/bin/bash -ec password_aux="${MYSQL_ROOT_PASSWORD:-}"
if [[ -f "${MYSQL_ROOT_PASSWORD_FILE:-}" ]]; then
    password_aux=$(cat "$MYSQL_ROOT_PASSWORD_FILE")
fi
mysqladmin status -uroot -p"${password_aux}"
] delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      BITNAMI_DEBUG:        false
      MYSQL_ROOT_PASSWORD:  <set to the key 'mysql-root-password' in secret 'sreworks-mysql'>  Optional: false
      MYSQL_DATABASE:       my_database
      MYSQL_EXTRA_FLAGS:    --max-connect-errors=1000 --max_connections=10000
    Mounts:
      /bitnami/mysql from data (rw)
      /opt/bitnami/mysql/conf/my.cnf from config (rw,path="my.cnf")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-splgn (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-sreworks-mysql-0
    ReadOnly:   false
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      sreworks-mysql
    Optional:  false
  kube-api-access-splgn:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason          Age                   From               Message
  ----     ------          ----                  ----               -------
  Normal   Scheduled       23h                   default-scheduler  Successfully assigned sreworks/sreworks-mysql-0 to minikube
  Normal   Created         23h                   kubelet            Created container mysql
  Normal   Started         23h                   kubelet            Started container mysql
  Warning  Unhealthy       23h                   kubelet            Readiness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Access denied for user 'root'@'localhost' (using password: YES)'
  Normal   Killing         23h                   kubelet            Container mysql failed liveness probe, will be restarted
  Normal   Pulled          23h (x2 over 23h)     kubelet            Container image "sreworks-registry.cn-beijing.cr.aliyuncs.com/hub/mysql:v1.0" already present on machine
  Warning  Unhealthy       22h (x40 over 23h)    kubelet            Liveness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
  Warning  Unhealthy       22h (x68 over 23h)    kubelet            Readiness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
  Normal   SandboxChanged  23m                   kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          22m                   kubelet            Container image "sreworks-registry.cn-beijing.cr.aliyuncs.com/hub/mysql:v1.0" already present on machine
  Normal   Created         22m                   kubelet            Created container mysql
  Normal   Started         22m                   kubelet            Started container mysql
  Warning  Unhealthy       21m (x3 over 22m)     kubelet            Readiness probe failed: mysqladmin: [Warning] Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/opt/bitnami/mysql/tmp/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/opt/bitnami/mysql/tmp/mysql.sock' exists!
  Warning  Unhealthy       12m (x10 over 20m)    kubelet            Liveness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
  Warning  Unhealthy       3m11s (x17 over 22m)  kubelet            Readiness probe failed: command "/bin/bash -ec password_aux=\"${MYSQL_ROOT_PASSWORD:-}\"\nif [[ -f \"${MYSQL_ROOT_PASSWORD_FILE:-}\" ]]; then\n password_aux=$(cat \"$MYSQL_ROOT_PASSWORD_FILE\")\nfi\nmysqladmin status -uroot -p\"${password_aux}\"\n" timed out
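One way to read the events above is to group the probe failures by cause. The Python sketch below does this with plain substring matching; the category names and the abbreviated sample messages are assumptions made for illustration, and the counts are the "xN" repeat counts shown in the events.

```python
from collections import Counter


def classify_probe_failure(message: str) -> str:
    """Map a probe-failure event message to a coarse, hypothetical category."""
    if "timed out" in message:
        return "timeout"
    if "Access denied" in message:
        return "access-denied"
    if "Can't connect to local MySQL server" in message:
        return "server-not-listening"
    return "other"


def summarize(events) -> dict:
    """Sum repeat counts per category for (message, count) pairs."""
    totals = Counter()
    for message, count in events:
        totals[classify_probe_failure(message)] += count
    return dict(totals)


# Messages abbreviated from the Events section; counts taken from the "xN" markers.
SAMPLE_EVENTS = [
    ("Readiness probe failed: Access denied for user 'root'@'localhost'", 1),
    ("Liveness probe failed: command timed out", 40),
    ("Readiness probe failed: command timed out", 68),
    ("Readiness probe failed: Can't connect to local MySQL server through socket", 3),
    ("Liveness probe failed: command timed out", 10),
    ("Readiness probe failed: command timed out", 17),
]
```

With this grouping, nearly all recorded failures are probe commands exceeding the 1s timeout rather than authentication or socket errors. That pattern would be consistent with a slow or overloaded mysqld, although the events alone do not prove the cause.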

stdnt-xiao commented 1 year ago

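The storage volume mounted by your cluster may have a compatibility problem. As a test, you could try running mysql without the data disk mount. That is not a final fix, though; in principle the program should be compatible with different storage systems.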

sixinshuier commented 9 months ago

[screenshot] [screenshot] Same problem here.

Twwy commented 9 months ago

Your problem is somewhat different: on your side mysql is running normally and has not restarted. Check the logs of the sreworks-appmanager-postrun-8nnnv pod to see which part of postrun failed.

sixinshuier commented 9 months ago

[screenshot] Same result: the connection to sreworks-appmanager still times out. sreworks-appmanager-server has a problem, and database initialization is failing.