FE keeps complaining wait catalog to be ready after restarted

Fullstop000 commented 4 years ago

Using doris version 0.12.0

I set up a cluster with 1 FE and 2 BEs and loaded a test table with a single partition. After I restarted the FE, it hung and kept producing these logs:

2020-08-11 15:21:00,324 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:02,324 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:04,325 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:06,325 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:08,326 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:10,326 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:12,327 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:14,327 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:16,328 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
2020-08-11 15:21:18,328 INFO 1 [Catalog.waitForReady():751] wait catalog to be ready. FE type: UNKNOWN. is ready: false
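
For anyone checking whether their FE is stuck in the same loop, here is a minimal sketch that counts these lines and measures how long the wait has lasted. The regular expression is derived only from the log excerpt above, and the function name `summarize_wait` is made up for illustration; it is not part of Doris.

```python
import re
from datetime import datetime

# Layout inferred from the log excerpt above (an assumption, not a Doris spec).
WAIT_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO .*"
    r"wait catalog to be ready\. FE type: (?P<fe_type>\w+)\. "
    r"is ready: (?P<ready>true|false)"
)


def summarize_wait(lines):
    """Return (count, seconds_waited, last_fe_type) for not-ready wait lines."""
    stamps = []
    fe_type = None
    for line in lines:
        m = WAIT_LINE.match(line)
        if m and m.group("ready") == "false":
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"))
            fe_type = m.group("fe_type")
    if not stamps:
        return 0, 0.0, None
    # Time between the first and the last matching line, in seconds.
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds(), fe_type
```

Feeding it the lines of the FE log (for example via `open(path)`) gives a quick signal: a growing count with FE type still UNKNOWN means the catalog never became ready.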

morningman commented 4 years ago

You can add metadata_failure_recovery=true in fe.conf and restart FE again.

Once the FE has started normally, remove this config from fe.conf.
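
The add-then-remove step can be scripted. Below is a rough sketch that edits the text of fe.conf, assuming the file uses `key = value` lines with `#` comments. The helper name `set_fe_conf_key` is hypothetical, and the sketch only rewrites text; restarting the FE is still a manual step.

```python
def set_fe_conf_key(text, key, value=None):
    """Return conf text with `key` set to `value`, or removed if value is None."""
    out = []
    found = False
    for line in text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip() if "=" in stripped else ""
        if not stripped.startswith("#") and name == key:
            # Replace the first occurrence; drop duplicates and removals.
            if value is not None and not found:
                out.append(f"{key} = {value}")
            found = True
            continue
        out.append(line)
    if value is not None and not found:
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


# Before the recovery restart: add the key.
conf = "enable_fqdn_mode = true\n"
recovering = set_fe_conf_key(conf, "metadata_failure_recovery", "true")

# After the FE is healthy again: remove it.
restored = set_fe_conf_key(recovering, "metadata_failure_recovery")
```

Here `restored` equals the original `conf`, so the recovery flag does not linger across later restarts.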

chess3cake commented 4 months ago

@morningman This is still present in version 2.1.3.

I deployed Doris (1 FE, 1 BE) on k8s using doris-operator. The first time, everything was OK. But when I restarted the FE pod, the "wait catalog to be ready" messages started. I just deleted the pod and let k8s reschedule it automatically, without any changes.

My FE config looks like this:

  enable_deploy_manager = k8s
  enable_fqdn_mode = true
  enable_batch_delete_by_default=true
  streaming_label_keep_max_second=21600
  priority_networks=10.214.0.192/26;10.214.0.0/26;10.214.0.64/26;10.214.0.128/26;10.214.1.64/26
  metadata_failure_recovery = true

javidHao commented 4 months ago

In version 2.0.11, with 1 FE node and 1 BE node (a test cluster), the FE was killed by the system due to insufficient memory. A startup exception occurred after restarting the FE. After adding the metadata_failure_recovery = true configuration, it restarted normally. The abnormal exit probably damaged the metadata file.

apache / doris

FE keeps complaining wait catalog to be ready after restarted #4322