ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0

Add a way to configure ClickHouse through YAML #3607

Closed Felixoid closed 3 years ago

Felixoid commented 5 years ago

As discussed at the meetup on 2018.11.15, it would be great to manage configs not only with XML but also with YAML (and JSON, as a proper subset of YAML).

The problem is that none of Salt, Ansible, or Puppet can serialize data objects to XML, which forces you to write strange templates like this (or see the internal SaltStack Jinja2 templates). With JSON or YAML, there are out-of-the-box ways to generate configs.

So, basically, for SaltStack with this pillar structure:

clickhouse:
  shards:
    cluster_configs:
    - cluster: cluster1
      shard_user: user1
      nodes:
        1:
          - example01
          - example02
        2:
          - example03
          - example04
        3:
          - example05
          - example06
    - cluster: cluster2
      shard_user: user2
      nodes:
        1:
          - example01
          - example02
          - example03
        2:
          - example04
          - example05
          - example06

I have to create a Jinja template:

<yandex>
{% if pillar['clickhouse'] is defined -%}
{% if pillar['clickhouse']['shards'] is defined -%}
    <remote_servers>
{%- for cluster_config in pillar['clickhouse']['shards']['cluster_configs'] %}
        <{{cluster_config['cluster']}}>
{%- for shard in cluster_config['nodes'] %}
            <shard>
                <internal_replication>true</internal_replication>
{%- for replica in cluster_config['nodes'][shard] %}
                <replica>
                    <host>{{ replica }}</host>
                    <port>9000</port>
                    <user>{{cluster_config['shard_user']}}</user>
                </replica>
{%- endfor %}
            </shard>
{%- endfor %}
        </{{cluster_config['cluster']}}>
{%- endfor %}
    </remote_servers>
{%- endif %}
{%- endif %}
</yandex>
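
For comparison, here is a rough sketch of the same mapping written in plain Python with the standard `xml.etree.ElementTree` module. It is not taken from the issue: the `build_remote_servers` name and the trimmed `pillar` dict are made up for illustration. Every element still has to be spelled out by hand, which is the complaint being made about the template.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed copy of the pillar structure shown above.
pillar = {
    "clickhouse": {
        "shards": {
            "cluster_configs": [
                {
                    "cluster": "cluster1",
                    "shard_user": "user1",
                    "nodes": {
                        1: ["example01", "example02"],
                        2: ["example03", "example04"],
                    },
                }
            ]
        }
    }
}


def build_remote_servers(pillar: dict) -> ET.Element:
    """Build the <yandex><remote_servers> tree that the template renders."""
    root = ET.Element("yandex")
    shards = pillar.get("clickhouse", {}).get("shards")
    if shards is None:
        # Mirrors the "is defined" guards in the template.
        return root
    remote = ET.SubElement(root, "remote_servers")
    for cfg in shards["cluster_configs"]:
        cluster = ET.SubElement(remote, cfg["cluster"])
        # Sorting the shard keys is a choice made here for stable output.
        for shard_id in sorted(cfg["nodes"]):
            shard = ET.SubElement(cluster, "shard")
            ET.SubElement(shard, "internal_replication").text = "true"
            for host in cfg["nodes"][shard_id]:
                replica = ET.SubElement(shard, "replica")
                ET.SubElement(replica, "host").text = host
                ET.SubElement(replica, "port").text = "9000"
                ET.SubElement(replica, "user").text = cfg["shard_user"]
    return root


if __name__ == "__main__":
    tree = build_remote_servers(pillar)
    ET.indent(tree)  # pretty-printing helper, available since Python 3.9
    print(ET.tostring(tree, encoding="unicode"))
```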

Instead, it could be this pillar (first iteration):

clickhouse:
  clusters:
    yandex:
      remote_servers:
        cluster1:
          shard:
          - internal_replication: true
            replica:
            - host: example01
              port: 9000
              user: user1
            - host: example02
              port: 9000
              user: user1
          - internal_replication: true
            replica:
            - host: example03
              port: 9000
              user: user1
            - host: example04
              port: 9000
              user: user1
          - internal_replication: true
            replica:
            - host: example05
              port: 9000
              user: user1
            - host: example06
              port: 9000
              user: user1
        cluster2:
          shard:
          - internal_replication: true
            replica:
            - host: example01
              port: 9000
              user: user2
            - host: example02
              port: 9000
              user: user2
            - host: example03
              port: 9000
              user: user2
          - internal_replication: true
            replica:
            - host: example04
              port: 9000
              user: user2
            - host: example05
              port: 9000
              user: user2
            - host: example06
              port: 9000
              user: user2

and in shards.yaml just:

{% if pillar['clickhouse'] is defined -%}
{% if pillar['clickhouse']['clusters'] is defined -%}
{{ pillar['clickhouse']['clusters']|yaml }}
{%- endif %}
{%- endif %}
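
To make the relationship between the two pillar shapes concrete, here is a minimal Python sketch that reshapes the compact `cluster_configs` form into the nested form that mirrors the config layout, and then serializes it without any per-element template. The `to_native` name is hypothetical, and JSON is used only because it ships with the standard library; a YAML dumper would take its place in a real setup.

```python
import json


def to_native(cluster_configs: list, port: int = 9000) -> dict:
    """Reshape the compact pillar into the structure that mirrors the config."""
    remote_servers = {}
    for cfg in cluster_configs:
        shards = []
        for shard_id in sorted(cfg["nodes"]):
            shards.append(
                {
                    "internal_replication": True,
                    "replica": [
                        {"host": host, "port": port, "user": cfg["shard_user"]}
                        for host in cfg["nodes"][shard_id]
                    ],
                }
            )
        remote_servers[cfg["cluster"]] = {"shard": shards}
    return {"yandex": {"remote_servers": remote_servers}}


if __name__ == "__main__":
    compact = [
        {
            "cluster": "cluster2",
            "shard_user": "user2",
            "nodes": {
                1: ["example01", "example02", "example03"],
                2: ["example04", "example05", "example06"],
            },
        }
    ]
    # One generic call replaces the hand-written template.
    print(json.dumps(to_native(compact), indent=2))
```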

filimonov commented 5 years ago

YAML would be nice. But configuration management in ClickHouse is actually quite flexible and mature. It's not always obvious, and sometimes you need to find the proper 'ClickHouse way' to solve a problem, but it's doable with XML.

ClickHouse has quite a lot of options, and it's quite hard (and usually unnecessary) to map all of them into objects. It's better to keep all servers in a cluster pretty much the same; the only things you need to adjust per server are the macros and maybe the interserver host name.

My own summary of 'best practices' for ClickHouse configs:

  1. Don't edit or overwrite the default configuration files. Newer versions sometimes introduce new settings or change the defaults, and you can miss that if you use a fixed config.xml and users.xml.

  2. Instead, it's quite easy to make any configuration change via extra files in the conf.d directory. For example, to override the listen interface, put a file conf.d/listen.xml like this:

    <?xml version="1.0"?>
    <yandex>
    <listen_host replace="replace">::</listen_host>
    </yandex>

  3. The same goes for users: for example, you can change the default profile by putting a file users.d/profile_default.xml:

    <?xml version="1.0"?>
    <yandex>
    <profiles>
        <default replace="replace">
            <max_memory_usage>15000000000</max_memory_usage>
            <max_bytes_before_external_group_by>12000000000</max_bytes_before_external_group_by>
            <max_bytes_before_external_sort>12000000000</max_bytes_before_external_sort>
            <distributed_aggregation_memory_efficient>1</distributed_aggregation_memory_efficient>
            <use_uncompressed_cache>0</use_uncompressed_cache>
            <load_balancing>random</load_balancing>
            <log_queries>1</log_queries>
            <max_execution_time>600</max_execution_time>
        </default>
    </profiles>
    </yandex>

  4. Or you can create a user by putting a file users.d/user_xxx.xml:

    <?xml version="1.0"?>
    <yandex>
    <users>
        <xxx>
            <!-- PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-' -->
            <password_sha256_hex>...</password_sha256_hex>
            <networks incl="networks" />
            <profile>readonly</profile>
            <quota>default</quota>
            <allow_databases incl="allowed_databases" />
        </xxx>
    </users>
    </yandex>

  5. Some parts of the configuration will contain repeated elements (like allowed IPs for all users). To avoid repeating them, use a substitutions file. By default it's /etc/metrika.xml, but you can change it, for example to /etc/clickhouse-server/substitutions.xml (the <include_from> section of the main config). Put the repeated parts into the substitutions file, like this:

    <?xml version="1.0"?>
    <yandex>
    <networks>
        <ip>::1</ip>
        <ip>127.0.0.1</ip>
        <ip>10.42.0.0/16</ip>
        <ip>192.168.0.0/24</ip>
    </networks>
    
    <clickhouse_remote_servers>
    <!-- cluster definition -->
    </clickhouse_remote_servers>
    
    <zookeeper-servers>
        <node>
            <host>zookeeper1</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper2</host>
            <port>2181</port>
        </node>
        <node>
            <host>zookeeper3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    
    <clickhouse_compression></clickhouse_compression>
    </yandex>

    That file can be common to all servers in the cluster / datacenter, or individual per server. Ideally, it is the only part that should be generated/controlled by Puppet / Chef. If you use one substitutions file per cluster (not per node), you will also need to generate a file with macros (if you use macros).
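
The shell one-liner in the comment inside users.d/user_xxx.xml (item 4) can also be written in Python, which may be handier inside a provisioning script. This is only a sketch: `make_password` is a made-up helper, and it draws from the URL-safe base64 alphabet rather than the standard one used by the shell version.

```python
import hashlib
import secrets
from typing import Tuple


def make_password(nbytes: int = 6) -> Tuple[str, str]:
    """Return (plaintext, SHA-256 hex digest) for a new random password."""
    # 6 random bytes encode to exactly 8 URL-safe base64 characters.
    password = secrets.token_urlsafe(nbytes)
    digest = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return password, digest


if __name__ == "__main__":
    plaintext, sha256_hex = make_password()
    print(plaintext)   # hand this to the user
    print(sha256_hex)  # put this into <password_sha256_hex>
```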

That way you have full flexibility (you're not limited to the settings described in a template): you can change any setting per server / datacenter just by assigning a file with those settings to that server / server group. It's really easy to navigate through / edit / assign those files.
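
As a small illustration of the "one file per setting" approach, the sketch below writes a drop-in override like the conf.d/listen.xml example. The helper names (`render_override`, `write_override`) are invented for this example, and the target directory is a temporary one rather than a real ClickHouse config path.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def render_override(setting: str, value: str) -> str:
    """Render a single-setting override that replaces the default value."""
    root = ET.Element("yandex")
    element = ET.SubElement(root, setting, {"replace": "replace"})
    element.text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_override(conf_dir: Path, name: str, setting: str, value: str) -> Path:
    """Write the override as <conf_dir>/<name>.xml and return its path."""
    conf_dir.mkdir(parents=True, exist_ok=True)
    target = conf_dir / f"{name}.xml"
    target.write_text(render_override(setting, value), encoding="utf-8")
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_override(Path(tmp) / "conf.d", "listen", "listen_host", "::")
        print(path.read_text(encoding="utf-8"))
```

Assigning a setting to a server group then comes down to deciding which of these small files gets shipped to which hosts.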

Felixoid commented 5 years ago

I know how to do it properly, and I have used everything you suggested. This would be good advice, BTW, but the problem is definitely XML itself. As described, you have to write ugly templates to generate it instead of just serializing objects, like https://docs.saltstack.com/en/latest/ref/renderers/all/salt.renderers.jinja.html or https://gist.github.com/WhatsARanjit/12bdcce78ef07641a2a7c6d37d4b369d

filimonov commented 5 years ago

> I know how to do it properly, and I have used everything you suggested. This would be good advice, BTW, but the problem is definitely XML itself.

I wasn't aware of your background. Quite often newbies want to change something just because they don't yet know how to use what already exists. So @Felixoid, sorry, and you can read only the first sentence of my previous answer :)

filimonov commented 5 years ago

https://github.com/pocoproject/poco/issues/1308

sergiustheblack commented 3 years ago

We do this with a SaltStack formula; check it out. It's definitely not what you are asking for, but... it just generates the XML config from a YAML pillar.

Felixoid commented 3 years ago

Thank you, but it still looks more or less the same to me.

We use Puppet though, and it looks more native. Here's an example: https://github.com/innogames/puppet-clickhouse/blob/master/manifests/server/config.pp#L18

BoloniniD commented 3 years ago

At this point I've already written the tests for YAML configs and fixed most of the bugs in the code. I think that once all the problems found during the last code review are solved, the new feature will be ready to use.

alexey-milovidov commented 3 years ago

This is going to be released in 21.7.

alexey-milovidov commented 3 years ago

Examples here: https://github.com/ClickHouse/ClickHouse/pull/21858/files#diff-faa8d785fa54f3dab72b68b8d000c3af5c9800adbee9ebad48a270b0c14dd8b8R1

alexey-milovidov commented 3 years ago

TIL: YAML does not support multiline comments :cry: Neither does TOML. Of course, JSON does not have comments at all.

In YAML it looks almost OK but not very clean: https://github.com/ClickHouse/ClickHouse/pull/24409/files#diff-c7da515f549bcf16dcf29e4f14d7a610e5640071a43c9beb217664dbe75cc52aR302

Looks like every config language is fundamentally worse than XML :rofl: