Altinity / clickhouse-backup

Tool for easy backup and restore for ClickHouse® using object storage for backup files.

Need better detection when clickhouse-backup doesn't share the same disk as clickhouse-server #1037

Open gabrielheck opened 3 weeks ago

gabrielheck commented 3 weeks ago

Hello everyone,

I'm facing issues with restoring backups using Altinity/clickhouse-backup. The first issue involves a full backup from May/24. This backup contains both the shadow and metadata directories with data, but when I run the restore command, the following warning appears in the logs:

warn 'shard{shard}-full-20240425171537' doesn't contain tables for restore backup=shard{shard}-full-20240425171537 operation=restore

I noticed that this backup lacks a metadata.json file in the root directory, which differentiates it from recent backups. Could this missing file be causing the restore issue?

For a more recent backup, listed as 700 GB in size, the restore operation only retrieves metadata, excluding the actual data. Here is the sequence of commands executed, the output of the download command, and the custom config.yml.

./clickhouse-backup -c ./config.yml list
./clickhouse-backup -c ./config.yml download shard{shard}-full-20241030033305
./clickhouse-backup -c ./config.yml restore --tables=my_database.* shard{shard}-full-20241030033305

15:31:17.918391 info done backup=shard{shard}-full-20241030033305 duration=612ms logger=backuper operation=download size=737.47GiB

config.yml configuration:

general:
  remote_storage: azblob

clickhouse:
  username: ****
  password: ****
  host: clickhouse
  port: 9000

azblob:
  account_name: ****
  account_key: ****
  container: ****

Any ideas on why the restoration for both backups is incomplete—one failing to locate tables and the other only retrieving metadata? Could this be related to configuration, storage, or version differences in Altinity/clickhouse-backup?

I'm using clickhouse-backup version 2.6.2 with ClickHouse version 24.9.2.42.

Slach commented 3 weeks ago
./clickhouse-backup -c ./config.yml restore --tables=my_database.* shard{shard}-full-20241030033305

Could you share ls -la /var/lib/clickhouse/backup/shard{shard}-full-20241030033305/metadata/my_database/ ?

Slach commented 3 weeks ago

warn 'shard{shard}-full-20240425171537' doesn't contain tables for restore backup=shard{shard}-full-20240425171537 operation=restore'

Could you also check ls -la /var/lib/clickhouse/backup/shard{shard}-full-20240425171537/metadata/my_database/ ?

gabrielheck commented 3 weeks ago

warn 'shard{shard}-full-20240425171537' doesn't contain tables for restore backup=shard{shard}-full-20240425171537 operation=restore'

Could you also check ls -la /var/lib/clickhouse/backup/shard{shard}-full-20240425171537/metadata/my_database/ ?

@Slach, for the second command, there are only two files at /var/lib/clickhouse/backup/shard{shard}-full-20240425171537. Those are:

download.state metadata.json

But at the remote backup I can see both the shadow and metadata folders.

gabrielheck commented 3 weeks ago
./clickhouse-backup -c ./config.yml restore --tables=my_database.* shard{shard}-full-20241030033305

Could you share ls -la /var/lib/clickhouse/backup/shard{shard}-full-20241030033305/metadata/my_database/ ?

Hi @Slach

The output for the first command is:

ls -la /var/lib/clickhouse/backup/shard{shard}-full-20241030033305/metadata/vector_storage

total 108
drwxr-xr-x  2 root root 4096 Oct 31 15:09 .
drwxr-xr-x 26 root root 4096 Oct 31 15:09 ..
-rw-r----- 1 root root  537 Oct 31 15:09 articles.json
-rw-r----- 1 root root  555 Oct 31 15:09 available_wallets.json
-rw-r----- 1 root root  505 Oct 31 15:09 wallets.json
-rw-r----- 1 root root  823 Oct 31 15:09 backup_contents.json
-rw-r----- 1 root root  468 Oct 31 15:09 contents_chunks_900t_groups3.json
-rw-r----- 1 root root  468 Oct 31 15:09 contents_chunks_900t_groups5.json
-rw-r----- 1 root root 1137 Oct 31 15:09 contents_chunks_900t.json
-rw-r----- 1 root root  592 Oct 31 15:09 contents_chunks_s1.json
-rw-r----- 1 root root  592 Oct 31 15:09 contents_chunks_s2.json
-rw-r----- 1 root root  591 Oct 31 15:09 contents_chunks_s3.json
-rw-r----- 1 root root 1074 Oct 31 15:09 contents_chunks_t1.json
-rw-r----- 1 root root  499 Oct 31 15:09 ignored_contents.json
-rw-r----- 1 root root  776 Oct 31 15:09 contents.json
-rw-r----- 1 root root  471 Oct 31 15:09 contents_relevance.json
-rw-r----- 1 root root  433 Oct 31 15:09 contents_summary_v1.json
-rw-r----- 1 root root  587 Oct 31 15:09 public_offer_document_search.json
-rw-r----- 1 root root  443 Oct 31 15:09 document_search_chunks.json
-rw-r----- 1 root root  405 Oct 31 15:09 document_search.json
-rw-r----- 1 root root  387 Oct 31 15:09 milvus_ids_indexed.json
-rw-r----- 1 root root  954 Oct 31 15:09 _old_20240819_2_contents.json
-rw-r----- 1 root root  653 Oct 31 15:09 sentiment_contents.json
-rw-r----- 1 root root  441 Oct 31 15:09 processed_sentiment_contents.json
-rw-r----- 1 root root  665 Oct 31 15:09 tags_tickers_contents.json
-rw-r----- 1 root root  473 Oct 31 15:09 processed_tags_tickers_contents.json
-rw-r----- 1 root root  457 Oct 31 15:09 top_news.json

gabrielheck commented 3 weeks ago

@Slach

We have detected that our most recent backup, despite the list indicating a download size of 737.47 GiB, contains only the metadata folder, with no shadow folder present. Is there any known reason for this issue?
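
For reference, my understanding of what a complete backup should look like after download (just a sketch; exact file and part names depend on the tables and disks involved):

/var/lib/clickhouse/backup/<backup_name>/
  metadata.json                          <- overall backup manifest
  metadata/<database>/<table>.json       <- schema and parts list per table
  shadow/<database>/<table>/<disk>/...   <- the frozen data parts themselves

In our case the shadow/ part is missing entirely.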

Thank you in advance for your assistance.

Slach commented 3 weeks ago

the restore operation only retrieves metadata

could you share

LOG_LEVEL=debug ./clickhouse-backup -c ./config.yml restore --tables=vector_storage.* shard{shard}-full-20241030033305

Slach commented 3 weeks ago

'shard{shard}-full-20240425171537' doesn't contain tables for restore backup=shard{shard}-full-20240425171537 operation=restore'

could you share

./clickhouse-backup -c ./config.yml tables --remote-backup shard{shard}-full-20240425171537
./clickhouse-backup -c ./config.yml list remote
gabrielheck commented 3 weeks ago

@Slach

could you share

LOG_LEVEL=debug ./clickhouse-backup -c ./config.yml restore --tables=vector_storage.* shard{shard}-full-20241030033305

LOG_LEVEL=debug ./clickhouse-backup -c ./config.yml restore --tables=vector_storage.* shard{shard}-full-20241030033305
2024/11/01 08:46:20.633499  info clickhouse connection prepared: tcp://clickhouse:9000 run ping logger=clickhouse
2024/11/01 08:46:20.636689  info clickhouse connection success: tcp://clickhouse:9000 logger=clickhouse
2024/11/01 08:46:20.636712  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/11/01 08:46:20.642011  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/11/01 08:46:20.651293  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/11/01 08:46:20.661668  info CREATE DATABASE IF NOT EXISTS `vector_storage` ENGINE = Atomic with args [[]] logger=clickhouse
2024/11/01 08:46:20.677118  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.680897  info DROP TABLE IF EXISTS `vector_storage`.`_old_20240819_2_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.684114  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.687543  info DROP TABLE IF EXISTS `vector_storage`.`articles` NO DELAY logger=clickhouse
2024/11/01 08:46:20.690446  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.693976  info DROP TABLE IF EXISTS `vector_storage`.`available_wallets` NO DELAY logger=clickhouse
2024/11/01 08:46:20.696782  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.700144  info DROP TABLE IF EXISTS `vector_storage`.`wallets` NO DELAY logger=clickhouse
2024/11/01 08:46:20.702946  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.706404  info DROP TABLE IF EXISTS `vector_storage`.`contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.709157  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.712627  info DROP TABLE IF EXISTS `vector_storage`.`contents_backup` NO DELAY logger=clickhouse
2024/11/01 08:46:20.715420  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.719217  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_900t` NO DELAY logger=clickhouse
2024/11/01 08:46:20.721918  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.725329  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_900t_groups3` NO DELAY logger=clickhouse
2024/11/01 08:46:20.728040  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.731357  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_900t_groups5` NO DELAY logger=clickhouse
2024/11/01 08:46:20.734241  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.737755  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_s1` NO DELAY logger=clickhouse
2024/11/01 08:46:20.740762  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.743954  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_s2` NO DELAY logger=clickhouse
2024/11/01 08:46:20.746620  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.750031  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_s3` NO DELAY logger=clickhouse
2024/11/01 08:46:20.752728  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.756009  info DROP TABLE IF EXISTS `vector_storage`.`contents_chunks_t1` NO DELAY logger=clickhouse
2024/11/01 08:46:20.758671  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.762194  info DROP TABLE IF EXISTS `vector_storage`.`ignored_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.765018  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.768976  info DROP TABLE IF EXISTS `vector_storage`.`contents_relevance` NO DELAY logger=clickhouse
2024/11/01 08:46:20.771621  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.774981  info DROP TABLE IF EXISTS `vector_storage`.`contents_summary_v1` NO DELAY logger=clickhouse
2024/11/01 08:46:20.777785  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.781063  info DROP TABLE IF EXISTS `vector_storage`.`public_offer_document_search` NO DELAY logger=clickhouse
2024/11/01 08:46:20.783725  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.787101  info DROP TABLE IF EXISTS `vector_storage`.`documents_search` NO DELAY logger=clickhouse
2024/11/01 08:46:20.789760  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.793258  info DROP TABLE IF EXISTS `vector_storage`.`documents_search_chunks` NO DELAY logger=clickhouse
2024/11/01 08:46:20.796086  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.799214  info DROP TABLE IF EXISTS `vector_storage`.`milvus_ids_indexed` NO DELAY logger=clickhouse
2024/11/01 08:46:20.801975  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.805270  info DROP TABLE IF EXISTS `vector_storage`.`sentiment_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.808166  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.811658  info DROP TABLE IF EXISTS `vector_storage`.`processed_sentiment_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.814738  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.818042  info DROP TABLE IF EXISTS `vector_storage`.`tags_tickers_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.821279  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.824848  info DROP TABLE IF EXISTS `vector_storage`.`processed_tags_tickers_contents` NO DELAY logger=clickhouse
2024/11/01 08:46:20.827644  info SELECT engine FROM system.databases WHERE name = 'vector_storage' logger=clickhouse
2024/11/01 08:46:20.831071  info DROP TABLE IF EXISTS `vector_storage`.`top_news` NO DELAY logger=clickhouse
2024/11/01 08:46:20.833763  info CREATE DATABASE IF NOT EXISTS `vector_storage` logger=clickhouse
2024/11/01 08:46:20.835249  info CREATE TABLE vector_storage._old_20240819_2_contents UUID '4fc61404-116f-4ecc-bec7-8debb9445019' (`uuid_conteudo` UUID DEFAULT generateUUIDv4(), `origem` String, `tipo` String, `url` String, `html` String, `data` DateTime, `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `texto` String, `file_paths` Array(String) DEFAULT [], `conteudo_pdfs` Array(String) DEFAULT [], INDEX inv_idx texto TYPE inverted GRANULARITY 1, INDEX texto_lower_idx texto TYPE inverted GRANULARITY 1, INDEX contents_data_index data TYPE minmax GRANULARITY 1, INDEX contents_uuid_conteudo_ix uuid_conteudo TYPE minmax GRANULARITY 1) ENGINE = ReplacingMergeTree PRIMARY KEY (tipo, origem, toStartOfDay(data)) ORDER BY (tipo, origem, toStartOfDay(data), uuid_conteudo) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.853290  info CREATE TABLE vector_storage.articles UUID '68fa2670-22d2-4c3f-84f0-ef8b20aa865a' (`uuid_artigo` UUID DEFAULT generateUUIDv4(), `chunked` Bool DEFAULT 0, `url` String, `html` String, `data` DateTime, `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `texto` String) ENGINE = MergeTree PRIMARY KEY tuple(uuid_artigo) ORDER BY (uuid_artigo, data) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.860645  info CREATE TABLE vector_storage.available_wallets UUID 'd75dbde2-153d-493c-a541-22b4a8a5c603' (`uuid_carteira_disponivel` UUID DEFAULT generateUUIDv4(), `corretora` String, `nome_carteira` String, `tipo_carteira` String, `titulo` Nullable(String)) ENGINE = ReplacingMergeTree PRIMARY KEY (corretora, tipo_carteira) ORDER BY (corretora, tipo_carteira, nome_carteira) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.867594  info CREATE TABLE vector_storage.wallets UUID 'b43d8242-b37c-45eb-b63f-c86337c6b433' (`uuid_carteira` UUID DEFAULT generateUUIDv4(), `data` Date, `nome_carteira` String, `uuid_conteudo` UUID, `json_carteira` String) ENGINE = ReplacingMergeTree PRIMARY KEY (nome_carteira, data) ORDER BY (nome_carteira, data, uuid_conteudo) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.874534  info CREATE TABLE vector_storage.contents UUID '28dccac9-2765-40fe-8282-361132f9eb50' (`uuid_conteudo` UUID DEFAULT generateUUIDv4(), `origem` String, `tipo` String, `url` String, `html` String, `data` DateTime, `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `texto` String, `file_paths` Array(String) DEFAULT [], `conteudo_pdfs` Array(String) DEFAULT [], INDEX contents_data_index data TYPE minmax GRANULARITY 1, INDEX idx_uuid_conteudo_bloom uuid_conteudo TYPE bloom_filter(0.01) GRANULARITY 1) ENGINE = ReplacingMergeTree PARTITION BY toYYYYMM(data) PRIMARY KEY url ORDER BY url SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.882731  info CREATE TABLE vector_storage.contents_backup UUID '0a58ba5e-d58b-4755-959c-90ab2e732481' (`uuid_conteudo` UUID DEFAULT generateUUIDv4(), `origem` String, `tipo` String, `url` String, `html` String, `data` DateTime, `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `texto` String, `file_paths` Array(String) DEFAULT [], `conteudo_pdfs` Array(String) DEFAULT [], INDEX contents_data_index data TYPE minmax GRANULARITY 1, INDEX idx_uuid_conteudo_bloom uuid_conteudo TYPE bloom_filter(0.01) GRANULARITY 1) ENGINE = ReplacingMergeTree PARTITION BY toYYYYMM(data) PRIMARY KEY url ORDER BY (url, toYYYYMM(data), uuid_conteudo) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.891106  info CREATE TABLE vector_storage.contents_chunks_900t UUID 'fd9fb19e-8185-4f88-8b26-37f298398628' (`uuid` UUID DEFAULT generateUUIDv4(), `id` Int64, `id_anterior` Nullable(Int64), `id_posterior` Nullable(Int64), `document` String DEFAULT '', `embedding` Array(Float32), `tipo` String DEFAULT '', `origem` String DEFAULT '', `data` DateTime, `url` String DEFAULT '', `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `headings` String DEFAULT '', `metadata` Object('json'), `uuid_conteudo` UUID, INDEX data_index data TYPE minmax GRANULARITY 4, INDEX idx_uuid_conteudo_chunk_900t uuid TYPE bloom_filter GRANULARITY 4, INDEX ix_conteudo_chunks_900t_id id TYPE set(0) GRANULARITY 1, INDEX idx_uuid_conteudo_set uuid_conteudo TYPE set(100) GRANULARITY 1, CONSTRAINT cons_vec_len CHECK length(embedding) = 1536) ENGINE = MergeTree PRIMARY KEY (tipo, origem, data, titulo, url, document) ORDER BY (tipo, origem, data, titulo, url, document) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.901252  info CREATE TABLE vector_storage.contents_chunks_900t_groups3 UUID '5b434f39-ed27-4dc2-a14d-91d7cf616430' (`uuid` UUID DEFAULT generateUUIDv4(), `id_reference_chunk` Int64, `id_chunk` Int64, `chunk_order` Int32) ENGINE = MergeTree PRIMARY KEY id_chunk ORDER BY id_chunk SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.907967  info CREATE TABLE vector_storage.contents_chunks_900t_groups5 UUID '1790f8d3-f5d1-4120-83fe-63cbb9e6bfec' (`uuid` UUID DEFAULT generateUUIDv4(), `id_reference_chunk` Int64, `id_chunk` Int64, `chunk_order` Int32) ENGINE = MergeTree PRIMARY KEY id_chunk ORDER BY id_chunk SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.915065  info CREATE TABLE vector_storage.contents_chunks_s1 UUID 'bbd25078-03de-4e41-9d98-52c73d30a7fa' (`id` Nullable(String), `document` Nullable(String), `embedding` Array(Float32), `metadata` Object('json'), `uuid` UUID DEFAULT generateUUIDv4(), INDEX vec_idx embedding TYPE annoy('L2Distance', 100) GRANULARITY 1000, CONSTRAINT cons_vec_len CHECK length(embedding) = 1536) ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.923350  info CREATE TABLE vector_storage.contents_chunks_s2 UUID '937214cc-1a31-4da4-b69d-c7e7bbcf3459' (`id` Nullable(String), `document` Nullable(String), `embedding` Array(Float32), `metadata` Object('json'), `uuid` UUID DEFAULT generateUUIDv4(), INDEX vec_idx embedding TYPE annoy('L2Distance', 100) GRANULARITY 1000, CONSTRAINT cons_vec_len CHECK length(embedding) = 1536) ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.931247  info CREATE TABLE vector_storage.contents_chunks_s3 UUID '9f07358f-81af-4682-8da1-3f4662a00831' (`id` Nullable(String), `document` Nullable(String), `embedding` Array(Float32), `metadata` Object('json'), `uuid` UUID DEFAULT generateUUIDv4(), INDEX vec_idx embedding TYPE annoy('L2Distance', 100) GRANULARITY 1000, CONSTRAINT cons_vec_len CHECK length(embedding) = 1536) ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.939293  info CREATE TABLE vector_storage.contents_chunks_t1 UUID '3aca0b21-46c1-4013-bad3-a18ab1406bd9' (`id` Nullable(String), `uuid` UUID DEFAULT generateUUIDv4(), `document` String DEFAULT '', `embedding` Array(Float32), `tipo` String DEFAULT '', `origem` String DEFAULT '', `data` DateTime, `url` String DEFAULT '', `titulo` String DEFAULT '', `subtitulo` String DEFAULT '', `headings` String DEFAULT '', `metadata` Object('json'), `uuid_conteudo` UUID, INDEX vec_idx embedding TYPE annoy('L2Distance', 100) GRANULARITY 1000, INDEX data_index data TYPE minmax GRANULARITY 4, INDEX idx_uuid_conteudo_chunk uuid TYPE bloom_filter GRANULARITY 4, INDEX ix_conteudo_chunks_t1_id id TYPE set(0) GRANULARITY 1, CONSTRAINT cons_vec_len CHECK length(embedding) = 1536) ENGINE = MergeTree PRIMARY KEY (tipo, origem, data, titulo, url, document) ORDER BY (tipo, origem, data, titulo, url, document) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.950312  info CREATE TABLE vector_storage.ignored_contents UUID '54d2bf83-fd14-4a83-918f-4f49bb49e550' (`uuid_conteudo_ignorado` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `url` String, `data_inclusao` DateTime) ENGINE = MergeTree PRIMARY KEY (uuid_conteudo, url) ORDER BY (uuid_conteudo, url, data_inclusao) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.959078  info CREATE TABLE vector_storage.contents_relevance UUID '5935a600-8d68-4d42-8ec3-00952c850131' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `relevancia` Int32, `inedita` Int32, `nota_final` Int32) ENGINE = MergeTree PRIMARY KEY uuid_conteudo ORDER BY uuid_conteudo SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.966870  info CREATE TABLE vector_storage.contents_summary_v1 UUID 'dbee0efa-bfa4-4467-95b9-0f4e2cea3a7e' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `summary` String) ENGINE = MergeTree PRIMARY KEY uuid_conteudo ORDER BY uuid_conteudo SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.973669  info CREATE TABLE vector_storage.public_offer_document_search UUID '18546638-0ecf-4fd3-936d-ff4dea2da0a1' (`numero_requerimento_oferta_publica` String, `id_documento_oferta_publica` String, `id_documento_search` Int64) ENGINE = MergeTree PRIMARY KEY (numero_requerimento_oferta_publica, id_documento_oferta_publica) ORDER BY (numero_requerimento_oferta_publica, id_documento_oferta_publica) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.981089  info CREATE TABLE vector_storage.documents_search UUID 'c4a16e6c-3ef6-40d5-8461-077d1a9443b1' (`id` Int64 DEFAULT abs(cityHash64(randomPrintableASCII(10))), `conteudo` String) ENGINE = MergeTree PRIMARY KEY id ORDER BY id SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.988036  info CREATE TABLE vector_storage.documents_search_chunks UUID 'e8d0c92c-268d-4cda-a410-48a71aa4df5c' (`id_documento` Int64, `id_chunk` Int64, `chunk_text` String) ENGINE = MergeTree PRIMARY KEY (id_documento, id_chunk) ORDER BY (id_documento, id_chunk) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:20.995294  info CREATE TABLE vector_storage.milvus_ids_indexed UUID '7cc775f1-adcc-4e0b-88d4-03e0b3e61745' (`uuid` UUID DEFAULT generateUUIDv4(), `clickhouse_id` Nullable(String)) ENGINE = MergeTree ORDER BY uuid SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.001948  info CREATE TABLE vector_storage.sentiment_contents UUID '8c11578d-4687-4e2c-9ce7-f01387e4a3ae' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `ticker` String, `sentimento_1_dia` Int32, `sentimento_1_semana` Int32, `sentimento_1_mes` Int32, `rationale` String, INDEX idx_ticker_upper upper(ticker) TYPE minmax GRANULARITY 1, INDEX idx_uuid_conteudo uuid_conteudo TYPE minmax GRANULARITY 1) ENGINE = MergeTree PRIMARY KEY (ticker, uuid_conteudo) ORDER BY (ticker, uuid_conteudo) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.010244  info CREATE TABLE vector_storage.processed_sentiment_contents UUID 'e3c61fee-679b-46ec-8cd1-a72e71350144' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `data_processamento` DateTime) ENGINE = MergeTree PRIMARY KEY uuid_conteudo ORDER BY uuid_conteudo SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.019830  info CREATE TABLE vector_storage.tags_tickers_contents UUID '57040196-c678-4bcc-8093-b9032dad2949' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `ticker` String, `referencia` String, INDEX idx_ticker_upper upper(ticker) TYPE minmax GRANULARITY 1, INDEX idx_referencia referencia TYPE set(100) GRANULARITY 1, INDEX idx_uuid_conteudo uuid_conteudo TYPE minmax GRANULARITY 1) ENGINE = MergeTree PRIMARY KEY (ticker, uuid_conteudo) ORDER BY (ticker, uuid_conteudo) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.028270  info CREATE TABLE vector_storage.processed_tags_tickers_contents UUID '30cd4fb6-5d9a-4121-9f90-5d92d265a5ce' (`uuid` UUID DEFAULT generateUUIDv4(), `uuid_conteudo` UUID, `data_processamento` DateTime) ENGINE = MergeTree PRIMARY KEY uuid_conteudo ORDER BY uuid_conteudo SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.034989  info CREATE TABLE vector_storage.top_news UUID '7b7e3783-c705-4733-b388-68de0c324381' (`uuid_topnews` UUID DEFAULT generateUUIDv4(), `hours` Int32, `url` String, `title` String, `text` String, `data` DateTime) ENGINE = ReplacingMergeTree PRIMARY KEY (hours, url) ORDER BY (hours, url) SETTINGS index_granularity = 8192 logger=clickhouse
2024/11/01 08:46:21.042639  info done                      backup=shard{shard}-full-20241030033305 duration=366ms operation=restore_schema
2024/11/01 08:46:21.059274 debug found 25 tables with data in backup backup=shard{shard}-full-20241030033305 operation=restore_data
2024/11/01 08:46:21.059332  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/11/01 08:46:21.067390  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/11/01 08:46:21.070736  info SELECT name, count(*) as is_present FROM system.settings WHERE name IN (?, ?) GROUP BY name with args [display_secrets_in_show_and_select show_table_uuid_in_table_create_query_if_not_nil] logger=clickhouse
2024/11/01 08:46:21.080927  info SELECT name FROM system.databases WHERE engine IN ('MySQL','PostgreSQL','MaterializedPostgreSQL') logger=clickhouse
2024/11/01 08:46:21.085178  info    SELECT     countIf(name='data_path') is_data_path_present,     countIf(name='data_paths') is_data_paths_present,     countIf(name='uuid') is_uuid_present,     countIf(name='create_table_query') is_create_table_query_present,     countIf(name='total_bytes') is_total_bytes_present    FROM system.columns WHERE database='system' AND table='tables'   logger=clickhouse
2024/11/01 08:46:21.097302  info SELECT database, name, engine , data_paths , uuid , create_table_query , coalesce(total_bytes, 0) AS total_bytes   FROM system.tables WHERE is_temporary = 0 AND match(concat(database,'.',name),'^vector_storage\..*$')  ORDER BY total_bytes DESC SETTINGS show_table_uuid_in_table_create_query_if_not_nil=1 logger=clickhouse
2024/11/01 08:46:21.343480  info SELECT metadata_path FROM system.tables WHERE database = 'system' AND metadata_path!='' LIMIT 1; logger=clickhouse
2024/11/01 08:46:21.348970  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='_old_20240819_2_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.356200  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='milvus_ids_indexed' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.363799  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='available_wallets' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.371295  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='wallets' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.378640  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.386086  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_backup' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.393424  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_900t' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.401358  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='articles' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.408931  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_900t_groups5' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.416589  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_s1' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.424303  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_s2' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.431906  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_s3' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.439340  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_t1' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.447391  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='ignored_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.455051  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_relevance' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.463376  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_summary_v1' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.470665  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='public_offer_document_search' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.478184  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='documents_search' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.485246  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='documents_search_chunks' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.492529  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='top_news' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.498363  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='sentiment_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.505343  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='processed_sentiment_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.511193  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='tags_tickers_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.518381  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='processed_tags_tickers_contents' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.524134  info SELECT sum(bytes_on_disk) as size FROM system.parts WHERE active AND database='vector_storage' AND table='contents_chunks_900t_groups3' GROUP BY database, table logger=clickhouse
2024/11/01 08:46:21.531030 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531076 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.wallets
2024/11/01 08:46:21.531096 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531149 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.top_news
2024/11/01 08:46:21.531202 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531275 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_s2
2024/11/01 08:46:21.531312 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531341 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531371 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531434 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531491 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531528 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_backup
2024/11/01 08:46:21.531581  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.contents_backup
2024/11/01 08:46:21.531239 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531654 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage._old_20240819_2_contents
2024/11/01 08:46:21.531735  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage._old_20240819_2_contents
2024/11/01 08:46:21.531505 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents
2024/11/01 08:46:21.531847  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.contents
2024/11/01 08:46:21.531436 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_900t
2024/11/01 08:46:21.531921  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.contents_chunks_900t
2024/11/01 08:46:21.531345 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.processed_sentiment_contents
2024/11/01 08:46:21.532015  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.processed_sentiment_contents
2024/11/01 08:46:21.531278 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532074 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_s1
2024/11/01 08:46:21.532115  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.contents_chunks_s1
2024/11/01 08:46:21.531320 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532154 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.available_wallets
2024/11/01 08:46:21.532198  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.available_wallets
2024/11/01 08:46:21.531172 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532263 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.articles
2024/11/01 08:46:21.531250 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532340 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_900t_groups5
2024/11/01 08:46:21.531254  info done                      backup=shard{shard}-full-20241030033305 duration=0s operation=restore_data table=vector_storage.top_news
2024/11/01 08:46:21.531158  info done                      backup=shard{shard}-full-20241030033305 duration=0s operation=restore_data table=vector_storage.wallets
2024/11/01 08:46:21.531301 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532499 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.documents_search
2024/11/01 08:46:21.531302 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531333  info done                      backup=shard{shard}-full-20241030033305 duration=0s operation=restore_data table=vector_storage.contents_chunks_s2
2024/11/01 08:46:21.531350 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531360 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531378 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_relevance
2024/11/01 08:46:21.531382 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531388 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531391 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531455 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531458 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531510 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531512 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.531543 debug done                      duration=0s operation=HardlinkBackupPartsToStorage
2024/11/01 08:46:21.532797 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.public_offer_document_search
2024/11/01 08:46:21.532889  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.public_offer_document_search
2024/11/01 08:46:21.532432  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.contents_chunks_900t_groups5
2024/11/01 08:46:21.532560 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.documents_search_chunks
2024/11/01 08:46:21.533079  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.documents_search_chunks
2024/11/01 08:46:21.532586  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.documents_search
2024/11/01 08:46:21.532600 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.tags_tickers_contents
2024/11/01 08:46:21.533269  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.tags_tickers_contents
2024/11/01 08:46:21.532634 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_s3
2024/11/01 08:46:21.533353  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.contents_chunks_s3
2024/11/01 08:46:21.532676  info done                      backup=shard{shard}-full-20241030033305 duration=2ms operation=restore_data table=vector_storage.contents_relevance
2024/11/01 08:46:21.532676 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_900t_groups3
2024/11/01 08:46:21.533460  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.contents_chunks_900t_groups3
2024/11/01 08:46:21.532688 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_chunks_t1
2024/11/01 08:46:21.533563  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.contents_chunks_t1
2024/11/01 08:46:21.532748 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.contents_summary_v1
2024/11/01 08:46:21.533680  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.contents_summary_v1
2024/11/01 08:46:21.532754 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.processed_tags_tickers_contents
2024/11/01 08:46:21.533770  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.processed_tags_tickers_contents
2024/11/01 08:46:21.532761 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.milvus_ids_indexed
2024/11/01 08:46:21.533857  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.milvus_ids_indexed
2024/11/01 08:46:21.532764 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.sentiment_contents
2024/11/01 08:46:21.533961  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.sentiment_contents
2024/11/01 08:46:21.532801 debug data to 'detached' copied backup=shard{shard}-full-20241030033305 operation=restore_data table=vector_storage.ignored_contents
2024/11/01 08:46:21.534042  info done                      backup=shard{shard}-full-20241030033305 duration=3ms operation=restore_data table=vector_storage.ignored_contents
2024/11/01 08:46:21.532307  info done                      backup=shard{shard}-full-20241030033305 duration=1ms operation=restore_data table=vector_storage.articles
2024/11/01 08:46:21.534121  info done                      backup=shard{shard}-full-20241030033305 duration=491ms operation=restore_data
2024/11/01 08:46:21.534158  info DROP FUNCTION IF EXISTS `rot47` logger=clickhouse
2024/11/01 08:46:21.535289  info CREATE OR REPLACE FUNCTION rot47 AS s -> if(s IS NULL, NULL, arrayStringConcat(arrayMap(c -> if((c >= '!') AND (c <= '~'), char((((ascii(c) - 33) + 47) % 94) + 33), c), splitByString('', s)))) logger=clickhouse
2024/11/01 08:46:21.551926  info done                      backup=shard{shard}-full-20241030033305 duration=918ms operation=restore
2024/11/01 08:46:21.552078  info clickhouse connection closed logger=clickhouse
gabrielheck commented 3 weeks ago

@Slach

could you share

./clickhouse-backup -c ./config.yml tables --remote-backup shard{shard}-full-20240425171537
./clickhouse-backup -c ./config.yml list remote
root@190bbd9f0c5d:/tmp/build/linux/amd64# ./clickhouse-backup -c ./config.yml tables --remote-backup shard{shard}-full-20240425171537
2024/11/01 08:59:27.499161  info clickhouse connection prepared: tcp://clickhouse:9000 run ping logger=clickhouse
2024/11/01 08:59:27.502203  info clickhouse connection success: tcp://clickhouse:9000 logger=clickhouse
2024/11/01 08:59:27.502255  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/11/01 08:59:27.509765  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/11/01 08:59:27.513140  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/11/01 08:59:27.519817  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/11/01 08:59:27.737783  info clickhouse connection closed logger=clickhouse
root@190bbd9f0c5d:/tmp/build/linux/amd64# ./clickhouse-backup -c ./config.yml list remote
2024/11/01 09:04:36.384232  info clickhouse connection prepared: tcp://clickhouse:9000 run ping logger=clickhouse
2024/11/01 09:04:36.386842  info clickhouse connection success: tcp://clickhouse:9000 logger=clickhouse
2024/11/01 09:04:36.386882  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/11/01 09:04:36.395819  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/11/01 09:04:36.399065  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/11/01 09:04:36.405682  info SELECT macro, substitution FROM system.macros logger=clickhouse
shard{shard}-full-20240425171537        ???         01/01/0001 00:00:00   remote                                            broken (can't stat metadata.json)
shard{shard}-full-20240521170116        ???         01/01/0001 00:00:00   remote                                            broken (can't stat metadata.json)
shard{shard}-increment-20240507161923   ???         01/01/0001 00:00:00   remote                                            broken (can't stat metadata.json)
shard{shard}-increment-20240515072526   ???         01/01/0001 00:00:00   remote                                            broken (can't stat metadata.json)
shard{shard}-increment-20240515072533   ???         01/01/0001 00:00:00   remote                                            broken (can't stat metadata.json)
shard{shard}-full-20241021033304        727.60KiB   21/10/2024 03:33:10   remote                                            tar, regular
shard{shard}-full-20241022033305        727.59KiB   22/10/2024 03:33:11   remote                                            tar, regular
shard{shard}-full-20241028033305        738.84KiB   28/10/2024 03:33:12   remote                                            tar, regular
shard{shard}-increment-20241028093305   738.84KiB   28/10/2024 09:33:25   remote   +shard{shard}-full-20241028033305        tar, regular
shard{shard}-increment-20241028153305   738.84KiB   28/10/2024 15:33:27   remote   +shard{shard}-increment-20241028093305   tar, regular
shard{shard}-increment-20241028213305   738.84KiB   28/10/2024 21:33:25   remote   +shard{shard}-increment-20241028153305   tar, regular
shard{shard}-full-20241029033305        738.84KiB   29/10/2024 03:33:11   remote                                            tar, regular
shard{shard}-increment-20241029093305   738.84KiB   29/10/2024 09:33:26   remote   +shard{shard}-full-20241029033305        tar, regular
shard{shard}-increment-20241029153305   738.84KiB   29/10/2024 15:33:25   remote   +shard{shard}-increment-20241029093305   tar, regular
shard{shard}-increment-20241029213305   738.84KiB   29/10/2024 21:33:25   remote   +shard{shard}-increment-20241029153305   tar, regular
shard{shard}-full-20241030033305        738.84KiB   30/10/2024 03:33:12   remote                                            tar, regular
shard{shard}-increment-20241030093305   738.84KiB   30/10/2024 09:33:26   remote   +shard{shard}-full-20241030033305        tar, regular
shard{shard}-increment-20241030153305   738.84KiB   30/10/2024 15:33:26   remote   +shard{shard}-increment-20241030093305   tar, regular
shard{shard}-increment-20241030213305   742.20KiB   30/10/2024 21:33:26   remote   +shard{shard}-increment-20241030153305   tar, regular
shard{shard}-full-20241031033305        742.22KiB   31/10/2024 03:33:12   remote                                            tar, regular
shard{shard}-increment-20241031093305   742.22KiB   31/10/2024 09:33:27   remote   +shard{shard}-full-20241031033305        tar, regular
shard{shard}-increment-20241031153305   741.63KiB   31/10/2024 15:33:25   remote   +shard{shard}-increment-20241031093305   tar, regular
shard{shard}-full-20240521171144        742.22KiB   31/10/2024 21:55:37   remote                                            tar, regular
backup20241031                          21.36GiB    31/10/2024 22:53:57   remote                                            tar, regular
2024/11/01 09:04:36.827568  info clickhouse connection closed logger=clickhouse
Slach commented 3 weeks ago

shard{shard}-full-20240425171537 ??? 01/01/0001 00:00:00 remote broken (can't stat metadata.json)

It means the upload was started but never completed successfully. I don't know how you managed to download this backup successfully.

shard{shard}-full-20241030033305 738.84KiB 30/10/2024 03:33:12 remote

This contradicts the following information:

15:31:17.918391 info done backup=shard{shard}-full-20241030033305 duration=612ms logger=backuper operation=download size=737.47GiB

You can't download 737 GiB in 612 ms.

Check your source cluster logs; it looks like you backed up nothing.

How did you run clickhouse-backup create?

Did you read https://github.com/Altinity/clickhouse-backup/#dont-run-clickhouse-backup-remotely ?

gabrielheck commented 3 weeks ago

shard{shard}-full-20240425171537 ??? 01/01/0001 00:00:00 remote broken (can't stat metadata.json)

It means the upload was started but never completed successfully. I don't know how you managed to download this backup successfully.

shard{shard}-full-20241030033305 738.84KiB 30/10/2024 03:33:12 remote

This contradicts the following information:

15:31:17.918391 info done backup=shard{shard}-full-20241030033305 duration=612ms logger=backuper operation=download size=737.47GiB

You can't download 737 GiB in 612 ms.

The list command displays the backup size, but in my AZBLOB container, there's no shadow folder, only the metadata. This led me to believe that the backups were intact, but to my surprise, the backup actually failed. I'm now inclined to disregard the results from the list command and instead always verify my backup by restoring it on a separate machine to ensure it's complete. It’s more work, but necessary.
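
Something along these lines is what I have in mind for the verification step (just a sketch; it assumes a scratch ClickHouse instance that can reach the same remote storage with the same config.yml):

./clickhouse-backup -c ./config.yml download shard{shard}-full-20241030033305
./clickhouse-backup -c ./config.yml restore --tables=my_database.* shard{shard}-full-20241030033305
# spot-check that data, not just schema, actually came back
clickhouse-client --query "SELECT table, sum(rows) AS rows FROM system.parts WHERE database='my_database' AND active GROUP BY table"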

Check your source cluster logs; it looks like you backed up nothing.

How did you run clickhouse-backup create?

Did you read https://github.com/Altinity/clickhouse-backup/#dont-run-clickhouse-backup-remotely ?

Yes, I did. To run the backup, I launch a container from the altinity/clickhouse-backup:2.6.2 image, which accesses the ClickHouse server data through a shared volume mounted at /var/lib/clickhouse/. Are there any concerns with this setup?

Slach commented 3 weeks ago

Could you share the upload logs for shard{shard}-full-20240425171537 and shard{shard}-full-20241030033305?

gabrielheck commented 2 weeks ago

Hi @Slach

I have identified the cause of the error.

The ClickHouse container was creating a dynamic volume named clickhouse_clickhouse_data, while the clickhouse-backup container expected a volume named clickhouse_data. This discrepancy in volume names resulted in a configuration error within the clickhouse-backup stack, preventing access to the necessary data directory and leading to backups that contained only metadata.
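
For anyone running into the same thing, the fix boils down to making both containers mount the exact same named volume. A minimal docker-compose sketch of the corrected setup (image tags and volume names are from my environment and may differ in yours):

services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.9
    volumes:
      - clickhouse_data:/var/lib/clickhouse

  clickhouse-backup:
    image: altinity/clickhouse-backup:2.6.2
    volumes:
      - clickhouse_data:/var/lib/clickhouse   # must be the exact same volume the server writes to

volumes:
  clickhouse_data:

If the backup container lives in a separate compose project, the volume has to be declared external there under its real, project-prefixed name (clickhouse_clickhouse_data in our case); otherwise compose quietly creates a fresh, empty volume and the backup only ever sees schema metadata.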

To prevent similar issues in the future, I recommend modifying the clickhouse-backup tool to halt the backup operation if it cannot access the data directory. That would eliminate the kind of false positive we experienced here, where backups appeared to succeed while containing no data.

Thank you for your assistance and support throughout this process!

Slach commented 2 weeks ago

clickhouse-backup tool to halt the backup operation if it cannot access the data directory.

Maybe we need some detection that distinguishes two different cases: the table was genuinely empty after FREEZE, versus we simply don't have access to the disk (/var/lib/clickhouse in your case).
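
In the meantime, a manual sanity check along these lines would expose the mismatch (a sketch; the container names are assumptions and depend on your compose setup):

docker exec clickhouse touch /var/lib/clickhouse/.same_disk_probe
docker exec clickhouse-backup ls -la /var/lib/clickhouse/.same_disk_probe   # must exist; if not, the containers see different volumes
docker exec clickhouse rm /var/lib/clickhouse/.same_disk_probe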