badtuxx / giropops-monitoring

Full stack tools for monitoring containers and other stuff. ;)
https://youtube.com/linuxtips
Apache License 2.0
1.33k stars, 267 forks

Dude, my alertmanager container comes up, stays up for 10 seconds, and dies... #12

Closed danielprietsch closed 5 years ago

danielprietsch commented 5 years ago

I think something went wrong, because I brought up the swarm with the command

docker swarm init --advertise-addr 192.168.15.9

since it said there were too many interfaces defined on my wlp6s0... probably because of the pile of bridges created by Vagrant... I'll try restarting the interface with no Vagrant or VirtualBox leftovers and do everything again...

I'll try again here: fix the interface so that docker swarm init comes up without needing any parameters, and I'll report back...

While I'm at it, let me say that your videos and teaching style are freaking awesome, you've got a fan here... #Vaiiiiiii

This crap keeps showing up nonstop in syslog:

Nov 5 02:52:39 daniel-dell kernel: [62035.773260] docker_gwbridge: port 9(vethc06ff52) entered blocking state
Nov 5 02:52:39 daniel-dell kernel: [62035.773271] docker_gwbridge: port 9(vethc06ff52) entered forwarding state
Nov 5 02:52:41 daniel-dell dockerd[28369]: time="2018-11-05T02:52:41-02:00" level=info msg="shim reaped" id=ee980ffb213df16e6c5fac0482d39e543cacd0bd299326084f7ffbd838bf3542
Nov 5 02:52:41 daniel-dell dockerd[28369]: time="2018-11-05T02:52:41.721364798-02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="events.TaskDelete"
Nov 5 02:52:43 daniel-dell kernel: [62040.181280] docker_gwbridge: port 9(vethc06ff52) entered disabled state
Nov 5 02:52:43 daniel-dell kernel: [62040.366178] docker_gwbridge: port 9(vethc06ff52) entered disabled state
Nov 5 02:52:43 daniel-dell kernel: [62040.378525] docker_gwbridge: port 9(vethc06ff52) entered disabled state
Nov 5 02:52:43 daniel-dell NetworkManager[909]: [1541393563.7706] device (vethc06ff52): released from master device docker_gwbridge
Nov 5 02:52:46 daniel-dell dockerd[28369]: time="2018-11-05T02:52:46.061814663-02:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
Nov 5 02:52:46 daniel-dell dockerd[28369]: time="2018-11-05T02:52:46.061936619-02:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
Nov 5 02:52:46 daniel-dell dockerd[28369]: time="2018-11-05T02:52:46.166695506-02:00" level=error msg="fatal task error" error="task: non-zero exit (1)" module=node/agent/taskmanager node.id=xgxtzyhszhdrrwld0o9u6ui64 service.id=27gabhyf7ib95nji6bfb5uxon task.id=eocv1jyhygjk8n7ucvxpptr7e
Nov 5 02:52:46 daniel-dell kernel: [62043.068849] docker_gwbridge: port 5(veth3a1c8a0) entered blocking state
Nov 5 02:52:46 daniel-dell kernel: [62043.068861] docker_gwbridge: port 5(veth3a1c8a0) entered disabled state
Nov 5 02:52:46 daniel-dell kernel: [62043.127848] docker_gwbridge: port 5(veth3a1c8a0) entered blocking state
Nov 5 02:52:46 daniel-dell kernel: [62043.127857] docker_gwbridge: port 5(veth3a1c8a0) entered forwarding state
Nov 5 02:52:47 daniel-dell kernel: [62043.825848] docker_gwbridge: port 5(veth3a1c8a0) entered disabled state
Nov 5 02:52:47 daniel-dell dockerd[28369]: time="2018-11-05T02:52:47-02:00" level=info msg="shim docker-containerd-shim started" address="/containerd-shim/moby/50955bcf4be2c0beccc2fc64f04efcab747b0f2c107c0b1e44ec1b26ba4262c1/shim.sock" debug=false pid=31833
Nov 5 02:52:49 daniel-dell kernel: [62046.443497] docker_gwbridge: port 5(veth3a1c8a0) entered blocking state
Nov 5 02:52:49 daniel-dell kernel: [62046.443528] docker_gwbridge: port 5(veth3a1c8a0) entered forwarding state
Nov 5 02:52:51 daniel-dell dockerd[28369]: time="2018-11-05T02:52:51-02:00" level=info msg="shim reaped" id=50955bcf4be2c0beccc2fc64f04efcab747b0f2c107c0b1e44ec1b26ba4262c1
Nov 5 02:52:51 daniel-dell dockerd[28369]: time="2018-11-05T02:52:51.963512686-02:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="events.TaskDelete"
Nov 5 02:52:52 daniel-dell dockerd[28369]: time="2018-11-05T02:52:52.307563941-02:00" level=info msg="No non-localhost DNS nameservers are left in resolv.conf. Using default external servers: [nameserver 8.8.8.8 nameserver 8.8.4.4]"
Nov 5 02:52:52 daniel-dell dockerd[28369]: time="2018-11-05T02:52:52.307679876-02:00" level=info msg="IPv6 enabled; Adding default IPv6 external servers: [nameserver 2001:4860:4860::8888 nameserver 2001:4860:4860::8844]"
Nov 5 02:52:52 daniel-dell kernel: [62049.215240] docker_gwbridge: port 5(veth3a1c8a0) entered disabled state
Nov 5 02:52:52 daniel-dell kernel: [62049.270165] docker_gwbridge: port 9(vethe5b0bd1) entered blocking state
Nov 5 02:52:52 daniel-dell kernel: [62049.270177] docker_gwbridge: port 9(vethe5b0bd1) entered disabled state
Nov 5 02:52:52 daniel-dell kernel: [62049.279975] docker_gwbridge: port 9(vethe5b0bd1) entered blocking state
Nov 5 02:52:52 daniel-dell kernel: [62049.279984] docker_gwbridge: port 9(vethe5b0bd1) entered forwarding state
Nov 5 02:52:52 daniel-dell kernel: [62049.348635] docker_gwbridge: port 5(veth3a1c8a0) entered disabled state
Nov 5 02:52:52 daniel-dell kernel: [62049.419633] docker_gwbridge: port 5(veth3a1c8a0) entered disabled state
Nov 5 02:52:53 daniel-dell NetworkManager[909]: [1541393573.0265] device (veth3a1c8a0): released from master device docker_gwbridge
Nov 5 02:52:53 daniel-dell kernel: [62049.990925] docker_gwbridge: port 9(vethe5b0bd1) entered disabled state

danielprietsch commented 5 years ago

Damn... I used Slack, and I hadn't seen those comments about uncommenting the lines in the Dockerfile... you pest! You should have mentioned that! It even gave my HD bad blocks! Send me one of those beers while fsck runs... Seriously though, I'll comment out the lines in the Dockerfile, and if I keep getting beaten up I'll reopen this... big hug... #vaiiiiiiii

danielprietsch commented 5 years ago

Man... I commented out the lines in the Dockerfile... and even so... it didn't work... the alertmanager container comes up and goes down... every 10 seconds. My alertmanager configuration file is correct: I commented out the Rocket.Chat part and filled in the Slack information properly.

Nov 5 03:46:45 daniel-dell dockerd[2043]: time="2018-11-05T03:46:45.140010454-02:00" level=error msg="fatal task error" error="task: non-zero exit (1)" module=node/agent/taskmanager node.id=fy2l2nh3yi0nugk5phsnnnw4a service.id=3wehiic099nk33csb4zw2oedq task.id=muoym8asdbx692x2a67nbp0r2
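The `fatal task error` entry above only says that the task exited with status 1; the reason is in the container's own output. As a rough sketch, assuming the dockerd fields are plain `key=value` pairs with optional double quotes, Python's standard library can pull out the `service.id`, which could then be handed to `docker service logs` to read what Alertmanager printed before dying. The name `parse_logfmt` is made up for this illustration.

```python
import shlex

# The dockerd line from the comment above, copied verbatim.
LINE = (
    'Nov 5 03:46:45 daniel-dell dockerd[2043]: '
    'time="2018-11-05T03:46:45.140010454-02:00" level=error '
    'msg="fatal task error" error="task: non-zero exit (1)" '
    'module=node/agent/taskmanager node.id=fy2l2nh3yi0nugk5phsnnnw4a '
    'service.id=3wehiic099nk33csb4zw2oedq task.id=muoym8asdbx692x2a67nbp0r2'
)


def parse_logfmt(line):
    """Collect key=value tokens from a log line into a dict (hypothetical helper)."""
    fields = {}
    # shlex.split keeps quoted values such as msg="fatal task error" together
    # and strips the surrounding quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip the syslog prefix tokens, which contain no '='
            fields[key] = value
    return fields


fields = parse_logfmt(LINE)
print(fields["error"])       # the exit reason reported by the swarm agent
print(fields["service.id"])  # candidate argument for `docker service logs`
```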

danielprietsch commented 5 years ago

The problem seems to be inside giropops-monitoring/dockerfiles/alertmanager/conf/config.yml, since it only contains references to Rocket.Chat...

danielprietsch commented 5 years ago

Brother... here's the error... over in the Slack configuration... when you uncomment the line, there's a ":" missing after receivers... it's ridiculous, but it took me a while to notice... cheers #vaiiii

  receiver: 'slack'
#
receivers
- name: 'slack'
  slack_configs:
  - send_resolved: true
    username: 'YOUR USERNAME'
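A bare `receivers` with no trailing `:` is not a mapping key, so the file stops being valid YAML and Alertmanager exits on startup. Alertmanager's own `amtool check-config` is the proper validator; the snippet below is only a minimal stdlib sketch of the idea, flagging a lone word that is directly followed by a list item or a more-indented line. The names `find_bare_keys`, `BROKEN`, and `FIXED` are invented for this example, and the indentation of the fragment is an assumption.

```python
import re

# A lone identifier with no ':' anywhere on the line.
BARE_KEY = re.compile(r"^[A-Za-z_][\w.-]*$")

# Fragment as posted in the comment above (indentation assumed).
BROKEN = """\
  receiver: 'slack'
#
receivers
- name: 'slack'
  slack_configs:
  - send_resolved: true
    username: 'YOUR USERNAME'
"""

# Same fragment with the missing colon restored.
FIXED = BROKEN.replace("\nreceivers\n", "\nreceivers:\n")


def find_bare_keys(text):
    """Return 1-based line numbers that look like a key missing its ':'."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if not BARE_KEY.match(line.strip()):
            continue
        # Next line that is neither blank nor a comment.
        following = next(
            (l for l in lines[i + 1:]
             if l.strip() and not l.lstrip().startswith("#")),
            "",
        )
        indent = len(line) - len(line.lstrip())
        next_indent = len(following) - len(following.lstrip())
        starts_list = following.lstrip().startswith("- ")
        if following and (starts_list or next_indent > indent):
            hits.append(i + 1)
    return hits


print(find_bare_keys(BROKEN))
print(find_bare_keys(FIXED))
```

This is a heuristic, not a YAML parser: it will miss other syntax errors and is only meant to show why that one missing character matters.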

danielprietsch commented 5 years ago

Brother... everything is working perfectly.. I'm going to bed happy, and tomorrow I'll watch the other videos in the series to better understand the whole solution... I ran stress and it sent the alert to Slack... you can close this issue, but I'd advise you to fix the syntax there.. cheers