FrankChen021 / bithon

An observability platform mainly for Java
Apache License 2.0
14 stars, 4 forks

Pipeline starts before all schemas are loaded #761

Closed: FrankChen021 closed this issue 3 months ago

FrankChen021 commented 3 months ago

The following log was observed:

03-26 21:34:42.814 | pipeline-dcd887cb8-vr5kb | ERROR | o.b.s.storage.datasource.SchemaManager | main | [bTxId: , bSpanId: , bMode: ] !!!!!!Timeout to wait for the first loading of schemas!!!

The reason is that the SchemaManager, which is initialized before the Pipeline, waits only about 3 seconds (30 polls at 100 ms intervals) for the first load to complete. However, loading each schema may also involve table initialization, which can take longer than that.

    @Override
    public void start() {
        log.info("Starting schema incremental loader...");
        loaderScheduler = ScheduledExecutorServiceFactor.newSingleThreadScheduledExecutor(NamedThreadFactory.of("schema-loader"));
        loaderScheduler.scheduleWithFixedDelay(this::incrementalLoadSchemas,
                                               // no delay to execute the first task
                                               0,
                                               1,
                                               TimeUnit.MINUTES);

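        // Wait until the first load completes.
        // The Future returned by scheduleWithFixedDelay above cannot be used here: as its javadoc says,
        // 'get' on a periodic task never returns normally (it only throws on cancellation or failure).
        // Poll every 100 ms, at most 30 times, i.e. about 3 seconds in total.
        int count = 0;
        while (this.lastLoadAt == 0 && count++ < 30) {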
            try {
                Thread.sleep(100);
            } catch (InterruptedException ignored) {
            }
        }
        if (this.lastLoadAt == 0) {
            log.error("!!!!!!Timeout to wait for the first loading of schemas!!!");
        }
    }
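The comment in the snippet about not using the Future can be checked with plain JDK executors. The javadoc of `scheduleWithFixedDelay` states that the returned future's `get()` never returns normally; it only throws when the task is cancelled or an execution fails. The following is a minimal sketch, assuming only the standard `java.util.concurrent` API (not bithon's `ScheduledExecutorServiceFactor`); the class `PeriodicFutureDemo` is hypothetical. It shows `get` timing out even after the first run has finished, which is why the snippet falls back to polling `lastLoadAt`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; uses only JDK executors.
class PeriodicFutureDemo {

    // Returns true when get() on the future of a periodic task times out
    // even though the task has already completed its first run.
    static boolean getBlocksAfterFirstRun() {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        try {
            // Same shape as the schema loader: run immediately, then every minute
            ScheduledFuture<?> future = scheduler.scheduleWithFixedDelay(
                    runs::incrementAndGet, 0, 1, TimeUnit.MINUTES);

            // Wait until the first (zero-delay) execution has happened
            while (runs.get() == 0) {
                Thread.sleep(10);
            }
            try {
                // A periodic task's future completes only on cancellation or when an
                // execution throws, so it cannot signal "first load done"
                future.get(200, TimeUnit.MILLISECONDS);
                return false;
            } catch (TimeoutException expected) {
                return true;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("get() timed out after the first run: " + getBlocksAfterFirstRun());
    }
}
```

Running `main` prints `get() timed out after the first run: true`.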

As a result, the pipeline can finish initializing before all schemas are loaded. Once the pipeline is initialized, messages start coming in, but the schemas needed to process them may not be ready yet, so those messages are lost.
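The race can be illustrated with a toy processor. This is a minimal sketch under stated assumptions: `RacyProcessor`, its methods, and the schema name `jvm-metrics` are hypothetical, and it assumes a message whose schema is not yet known is neither buffered nor retried, which matches the reported message loss but is not bithon's actual processing code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a processor that resolves each message's schema by name.
class RacyProcessor {
    // Filled in by a background loader, like the SchemaManager's scheduled task
    private final Map<String, String> schemas = new ConcurrentHashMap<>();
    private final AtomicLong processed = new AtomicLong();
    private final AtomicLong dropped = new AtomicLong();

    void onSchemaLoaded(String name, String definition) {
        schemas.put(name, definition);
    }

    // Assumption: nothing buffers or retries the message, so one that arrives
    // before its schema is loaded is simply lost.
    void process(String schemaName, String payload) {
        if (schemas.get(schemaName) == null) {
            dropped.incrementAndGet();
            return;
        }
        processed.incrementAndGet();
    }

    long processed() { return processed.get(); }

    long dropped() { return dropped.get(); }

    public static void main(String[] args) {
        RacyProcessor processor = new RacyProcessor();
        processor.process("jvm-metrics", "m1");           // pipeline is already receiving
        processor.onSchemaLoaded("jvm-metrics", "{...}"); // schema loading finishes later
        processor.process("jvm-metrics", "m2");
        System.out.println("processed=" + processor.processed() + ", dropped=" + processor.dropped());
    }
}
```

Running `main` prints `processed=1, dropped=1`: the first message arrived before its schema and was lost.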

We have to wait for the schema manager to complete its initialization.
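One way to wait for completion is to replace the fixed 30 × 100 ms poll with a latch that the first successful load releases. This is a minimal sketch, not the project's fix: `SchemaLoaderSketch`, the `loadAllSchemas` callback, and the `maxWait` parameter are hypothetical, and plain JDK executors stand in for `ScheduledExecutorServiceFactor`. The key design choice is that a timeout fails startup with an exception instead of logging an error and letting the pipeline start without schemas.

```java
import java.time.Duration;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a schema loader whose start() blocks until the first load.
class SchemaLoaderSketch {
    // Released exactly once, by the first successful load
    private final CountDownLatch firstLoad = new CountDownLatch(1);
    // Hypothetical: loads all schemas, including any table initialization
    private final Runnable loadAllSchemas;
    private ScheduledExecutorService scheduler;

    SchemaLoaderSketch(Runnable loadAllSchemas) {
        this.loadAllSchemas = loadAllSchemas;
    }

    private void incrementalLoad() {
        try {
            loadAllSchemas.run();
            firstLoad.countDown();
        } catch (RuntimeException e) {
            // Swallow so the periodic task keeps running; a thrown exception
            // would suppress all later executions
            System.err.println("Failed to load schemas: " + e);
        }
    }

    // Blocks until the first load completes; fails startup on timeout instead of
    // letting the pipeline start without schemas.
    void start(Duration maxWait) {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(this::incrementalLoad, 0, 1, TimeUnit.MINUTES);
        try {
            if (!firstLoad.await(maxWait.toMillis(), TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Schemas not loaded within " + maxWait);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for schemas", e);
        }
    }

    boolean isLoaded() {
        return firstLoad.getCount() == 0;
    }

    void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        SchemaLoaderSketch loader = new SchemaLoaderSketch(() -> System.out.println("Loading schemas..."));
        loader.start(Duration.ofSeconds(30));
        System.out.println("Schemas ready: " + loader.isLoaded());
        loader.stop();
    }
}
```

Whether the wait should be bounded at all, and how long, is a policy decision; the sketch only makes it explicit rather than hard-coding 3 seconds.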

FrankChen021 commented 3 months ago

This reflects a legacy dependency problem, and the dependency should be reversed: for each pipeline, when its processor is initialized, it has to load all of its input sources itself. That is to say, the input source manager has to be split into the pipeline.
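The reversed dependency could look roughly like the following. This is a minimal sketch under assumptions: `PipelineSketch`, the `sourceLoader` function, and the source names `spans` and `metrics` are hypothetical and do not reflect bithon's actual classes. The pipeline owns the list of input sources it needs, loads them synchronously in `start()`, and opens its receiver only afterwards, so no message can arrive before its schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical pipeline that owns its input sources instead of depending on a
// globally initialized schema manager.
class PipelineSketch {
    private final List<String> inputSourceNames;
    // Hypothetical loader for one input source (schema); may initialize tables
    private final Function<String, String> sourceLoader;
    private final Map<String, String> inputSources = new HashMap<>();
    private final List<String> events = new ArrayList<>();
    private volatile boolean receiving;

    PipelineSketch(List<String> inputSourceNames, Function<String, String> sourceLoader) {
        this.inputSourceNames = inputSourceNames;
        this.sourceLoader = sourceLoader;
    }

    void start() {
        // 1. Load every input source this pipeline's processor needs, synchronously.
        //    If a load throws, start() fails and the receiver is never opened.
        for (String name : inputSourceNames) {
            inputSources.put(name, sourceLoader.apply(name));
            events.add("loaded " + name);
        }
        // 2. Only now accept messages, so none can arrive before its schema
        receiving = true;
        events.add("receiver started");
    }

    // Returns whether the message's input source is known to this pipeline
    boolean process(String sourceName, String payload) {
        if (!receiving) {
            throw new IllegalStateException("Pipeline not started");
        }
        return inputSources.containsKey(sourceName);
    }

    List<String> events() {
        return events;
    }

    public static void main(String[] args) {
        PipelineSketch pipeline = new PipelineSketch(List.of("spans", "metrics"), name -> "schema:" + name);
        pipeline.start();
        System.out.println(pipeline.events());
        System.out.println(pipeline.process("spans", "msg"));
    }
}
```

Running `main` prints `[loaded spans, loaded metrics, receiver started]` followed by `true`. Because each pipeline loads only the sources it uses, a slow table initialization delays that pipeline's own startup rather than racing against it.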