BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.92k stars 181 forks source link

[BUG] Logs being admitted despite settings causing HA failures #1575

Open lmeyerov opened 3 years ago

lmeyerov commented 3 years ago

Describe the bug

Despite setting bc(enable_logs=False, LOGGING_LEVEL='warn', ...), we are seeing debug logs being emitted to algebra.log and blazing_log/pyblazing.log. In a separate setup (dask-cuda workers), we're seeing blazing_log/{RAL.0.log,pyblazing.0.log}.

This is a problem, for example, in HA settings that run periodic healthchecks. In our case, this ended up causing a Denial of Service / Out-of-Memory on a node as we didn't have log rotations setup outside of standard locations.

As an interim workaround, we're setting a periodic watcher to echo "" > *.log

Steps/Code to reproduce bug

Context options:

allocator='existing',
enable_logging=False,
config_options={'protocol': 'tcp'},
LOGGING_LEVEL='warn'

Healthcheck: create table, add some numbers, drop

Expected behavior

Only see warning-or-higher logs. In our case, that would have been no log output.

Environment overview (please complete the following information)

RAPIDS 0.19 in docker

Environment details

Additional context Add any other context about the problem here.

----For BlazingSQL Developers---- Suspected source of the issue Where and what are potential sources of the issue

Other design considerations What components of the engine could be affected by this?

lmeyerov commented 3 years ago

A bit more:

On load, from a clean system, so would expect 0 logs:

Every 1.0s: ./dc exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt                                                                                  leo-UX331UN: Mon Jun 28 10:09:38 2021
+ docker-compose -f compose/production.yml exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt
DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(x=[$0])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, df]], aliases=[[x]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(EXPR$0=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)], agg#1=[COUNT($0)])
    LogicalTableScan(table=[[main, df]])

2021-06-28 17:08:50,016||INFO|||"create_table start for df"||||||
cat: rmm_log.txt: No such file or directory 

After health checks:

+ docker-compose -f compose/production.yml exec forge-etl-python cat algebra.log blazing_log/pyblazing.log rmm_log.txt
DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(x=[$0])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, df]], aliases=[[x]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], EXPR$0=[SUM($0)])
  LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(EXPR$0=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)], agg#1=[COUNT($0)])
    LogicalTableScan(table=[[main, df]])

DEBUG: com.blazingdb.calcite.schema.BlazingSchema - getting table names
DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalProject(a=[$0], b=[$1])
  LogicalTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
BindableTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]], aliases=[[a, b]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - non optimized
LogicalAggregate(group=[{}], the_sum=[SUM($0)])
  LogicalProject(b=[$1])
    LogicalTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]])

DEBUG: com.blazingdb.calcite.application.RelationalAlgebraGenerator - optimized
LogicalProject(the_sum=[CASE(=($1, 0), null:BIGINT, $0)])
  LogicalAggregate(group=[{}], the_sum=[$SUM0($0)], agg#1=[COUNT($0)])
    BindableTableScan(table=[[main, uecf1715992f344648df7558f2fa24b71]], projects=[[1]], aliases=[[b]])

2021-06-28 17:08:50,016||INFO|||"create_table start for df"||||||
2021-06-28 17:09:58,488||INFO|||"create_table start for uecf1715992f344648df7558f2fa24b71"||||||
2021-06-28 17:09:58,488||INFO|||"create_table start for uecf1715992f344648df7558f2fa24b71"||||||
cat: rmm_log.txt: No such file or directory
wmalpica commented 3 years ago

@lmeyerov From this issue i think we need to implement a feature that will purge old logs so that they dont over accumulate and be able to turn off all logs. There are some logs that right now you cant turn off, which are the pyblazing logs and algebra logs. But the other logs you should be able to turn off as follows:

config_options = { 'LOGGING_LEVEL':'err', 'ENABLE_GENERAL_ENGINE_LOGS':False, 'ENABLE_COMMS_LOGS':False, 'ENABLE_TASK_LOGS':False, 'ENABLE_OTHER_ENGINE_LOGS':False}
bc = BlazingContext(enable_logging=False,config_options=config_options)

Note that the enable_logging parameter you pass directly to the BlazingContext is for the allocator logs, which are off by default. The other parameters are set via config_options. The ones relevant to logs are these:

LOGGING_LEVEL: string
                Set the level (as string) to register into the logs
                for the current tool of logging. Log levels have order of priority:
                ``{trace, debug, info, warn, err, critical, off}``. Using ``'trace'`` will
                registers all info.
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``'trace'``
            LOGGING_FLUSH_LEVEL: string
                Set the level (as string) of the flush for
                the current tool of logging. Log levels have order of priority:
                ``{trace, debug, info, warn, err, critical, off}``
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``'warn'``
            ENABLE_GENERAL_ENGINE_LOGS: boolean
                Enables ``'batch_logger'`` logger
                **Default:** ``True``
            ENABLE_COMMS_LOGS: boolean
                Enables ``'output_comms'`` and ``'input_comms'`` logger
                **Default:** ``False``
            ENABLE_TASK_LOGS: boolean
                Enables ``'task_logger'`` logger
                **Default:** ``False``
            ENABLE_OTHER_ENGINE_LOGS: boolean
                Enables ``'queries_logger'``, ``'kernels_logger'``,
                ``'kernels_edges_logger'``, ``'cache_events_logger'`` loggers
                **Default:** ``False``
            LOGGING_MAX_SIZE_PER_FILE: string
                Set the max size in bytes for the log files.
                **NOTE:** This parameter only works when used in the
                BlazingContext
                **Default:** ``1GB``
lmeyerov commented 3 years ago

afaict:

So in this case, seems more about controlling file names & fidelity / disabling, and disabling can be the same as sending to /dev/null :)