flant / loghouse

Ready to use log management solution for Kubernetes storing data in ClickHouse and providing web UI.
Apache License 2.0

Clickhouse server crash #13

Open valentin2105 opened 6 years ago

valentin2105 commented 6 years ago

Hi,

I am trying to deploy ClickHouse in my Kubernetes cluster, but the ClickHouse pod crashes at startup.

Here are the pod's logs:

╰─# k -n loghouse logs -f clickhouse-server-79dc494dfc-sz268
Include not found: clickhouse_remote_servers
Include not found: clickhouse_compression
2017.11.02 03:01:08.091964 [ 1 ] <Warning> Application: Logging to console
2017.11.02 03:01:08.103679 [ 1 ] <Warning> ConfigProcessor: Include not found: networks
2017.11.02 03:01:10.103799 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_remote_servers
2017.11.02 03:01:10.103848 [ 2 ] <Warning> ConfigProcessor: Include not found: clickhouse_compression
2017.11.02 03:01:10.107723 [ 3 ] <Warning> ConfigProcessor: Include not found: networks

I used the Helm chart:

name: loghouse
version: 0.0.1

I did not make any changes to templates/clickhouse/clickhouse-configmap.yml.

Environment: Kubernetes v1.8.2, Calico CNI, IPv6 only (I made the changes needed to make the Service ClusterIP work), Debian 9.2.

qw1mb0 commented 6 years ago

Thank you for your feedback!

However, these logs contain no information about the ClickHouse crash. Could you please provide the "describe" output for your pod? If a PVC is also used, please provide the "describe" output for the PVC as well. It will help us investigate this issue.
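
For anyone following along, the requested information could be gathered roughly as in the sketch below. It assumes kubectl is already configured for the cluster; the helper name collect_describe and its arguments are hypothetical and not part of loghouse.

```shell
# Hypothetical helper: print "describe" output for a pod and, optionally, a PVC.
# Arguments: $1 = namespace, $2 = pod name, $3 = PVC name (optional).
collect_describe() {
  ns="$1"; pod="$2"; pvc="${3:-}"
  kubectl -n "$ns" describe pod "$pod"
  # Only describe the PVC when a name was actually passed in.
  if [ -n "$pvc" ]; then
    kubectl -n "$ns" describe pvc "$pvc"
  fi
}

# Example, using the namespace and pod name from the report above:
# collect_describe loghouse clickhouse-server-79dc494dfc-sz268
```

The Events section at the end of the pod's describe output is usually where failed probes and restarts show up.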

valentin2105 commented 6 years ago

So, after a little investigation: the container crashes because the liveness probe fails:

Started container
pulling image "flant/loghouse-clickhouse:latest"
Liveness probe failed: dial tcp [1404:f210:d:83::8404:d35f]:9000: getsockopt: connection refused

So I made this change: <listen_host>::</listen_host>

Now the ClickHouse server seems to be running fine, but the backend pod crashes with "failed to load command: puma".

Puma starting in single mode...
* Version 3.10.0 (ruby 2.3.4-p301), codename: Russell's Teapot
* Min threads: 0, max threads: 16
* Environment: production
* Listening on tcp://0.0.0.0:9292
Use Ctrl-C to stop
- Gracefully stopping, waiting for requests to finish
=== puma shutdown: 2017-11-02 23:38:54 +0000 ===
- Goodbye!
bundler: failed to load command: puma (/usr/local/bundle/bin/puma)
SignalException: SIGTERM
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/launcher.rb:397:in `block in setup_signals'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/single.rb:106:in `join'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/single.rb:106:in `run'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/launcher.rb:183:in `run'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/cli.rb:77:in `run'
  /usr/local/bundle/gems/puma-3.10.0/bin/puma:10:in `<top (required)>'
  /usr/local/bundle/bin/puma:17:in `load'
  /usr/local/bundle/bin/puma:17:in `<top (required)>'

qw1mb0 commented 6 years ago

Sorry for the long wait. As the errors show, the backend received SIGTERM, which is sent when the liveness probe fails.

Can you provide the "describe" output for the pod?

diafour commented 6 years ago

I reproduced this behavior on a local cluster with the loghouse-dashboard image built from the master branch :(

Puma starting in single mode...
* Version 3.10.0 (ruby 2.3.4-p301), codename: Russell's Teapot
* Min threads: 0, max threads: 16
* Environment: production
! Unable to load application: Clickhouse::QueryError: Got status 404 (expected 200): Code: 81, e.displayText() = DB::Exception: Database logs doesn't exist, e.what() = DB::Exception
bundler: failed to load command: puma (/usr/local/bundle/bin/puma)
Clickhouse::QueryError: Got status 404 (expected 200): Code: 81, e.displayText() = DB::Exception: Database logs doesn't exist, e.what() = DB::Exception

  /usr/local/bundle/bundler/gems/clickhouse-6b00d0459d06/lib/clickhouse/connection/client.rb:70:in `request'
  /usr/local/bundle/bundler/gems/clickhouse-6b00d0459d06/lib/clickhouse/connection/client.rb:30:in `post'
  /usr/local/bundle/bundler/gems/clickhouse-6b00d0459d06/lib/clickhouse/connection/query.rb:10:in `execute'
  /usr/local/bundle/bundler/gems/clickhouse-6b00d0459d06/lib/clickhouse/connection/query.rb:48:in `exists_table'
  /app/lib/loghouse_query/table.rb:46:in `table_exists?'
  /app/lib/loghouse_query/table.rb:50:in `create_table_with_migration!'
  /app/config/clickhouse.rb:9:in `<top (required)>'
  /app/config/boot.rb:21:in `require_relative'
  /app/config/boot.rb:21:in `<top (required)>'
  /app/application.rb:1:in `require_relative'
  /app/application.rb:1:in `<top (required)>'
  config.ru:1:in `require_relative'
  config.ru:1:in `block in <main>'
  /usr/local/bundle/gems/rack-2.0.4/lib/rack/builder.rb:55:in `instance_eval'
  /usr/local/bundle/gems/rack-2.0.4/lib/rack/builder.rb:55:in `initialize'
  config.ru:in `new'
  config.ru:in `<main>'
  /usr/local/bundle/gems/rack-2.0.4/lib/rack/builder.rb:49:in `eval'
  /usr/local/bundle/gems/rack-2.0.4/lib/rack/builder.rb:49:in `new_from_string'
  /usr/local/bundle/gems/rack-2.0.4/lib/rack/builder.rb:40:in `parse_file'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/configuration.rb:314:in `load_rackup'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/configuration.rb:243:in `app'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/runner.rb:138:in `load_and_bind'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/single.rb:87:in `run'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/launcher.rb:183:in `run'
  /usr/local/bundle/gems/puma-3.10.0/lib/puma/cli.rb:77:in `run'
  /usr/local/bundle/gems/puma-3.10.0/bin/puma:10:in `<top (required)>'
  /usr/local/bundle/bin/puma:17:in `load'
  /usr/local/bundle/bin/puma:17:in `<top (required)>'

diafour commented 6 years ago

Oops, that is a different story. The ClickHouse pod started slowly and the loghouse-init-db job failed. After I ran kubectl replace --force on this job and deleted the loghouse pod, loghouse reached the Running state.
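
The recovery described above could look roughly like the sketch below. It assumes the init-db job's manifest is available as a local file and that a controller recreates the loghouse pod after deletion; the function name and all three arguments are hypothetical placeholders, not names taken from the chart.

```shell
# Hypothetical helper sketching the recovery steps from this comment.
# Arguments (all supplied by the caller):
#   $1 = namespace, $2 = path to the init-db job manifest, $3 = loghouse pod name
recover_loghouse() {
  ns="$1"; job_manifest="$2"; pod="$3"
  # Delete and recreate the failed job so it runs again now that ClickHouse is up.
  kubectl -n "$ns" replace --force -f "$job_manifest" || return 1
  # Delete the pod; assuming a Deployment manages it, a fresh pod is started.
  kubectl -n "$ns" delete pod "$pod"
}

# Example (manifest path and pod name are placeholders):
# recover_loghouse loghouse ./init-db-job.yaml <loghouse-pod-name>
```

Running the job again only helps once ClickHouse is actually accepting connections, so it is worth checking the ClickHouse pod's status first.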