Tencent / TBase

TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse services, a set of integrated enterprise-level solutions is formed.
Other
1.38k stars 262 forks source link

High Memory usage of TBase. (was:Coordinators consistently OOM/crash under the load.) #80

Closed yazun closed 3 years ago

yazun commented 3 years ago

And unfortunately do not leave much trace. I.E. no cores dumpded.

We have few hundred processes connecting to 12 co-located coordinators with datanodes.

XL works quite well with 8 nodes and 1k+ processes connecting, with the same settings (more inter-connections for TBase are defined of course).

This is OLTP workload.

We see that memory growth and eventually some coordinators fail, usually 1-3 on the cluster, roughly the same time.

Where coordinators fail we see

[Sat Oct 10 22:55:45 2020] [23955] 10256 23955 17249140     1160     149        0             0 postgres
[Sat Oct 10 22:55:45 2020] [23956] 10256 23956 17249140     1160     149        0             0 postgres
[Sat Oct 10 22:55:45 2020] [23982] 10256 23982  3745808     1566     134        0             0 postgres
[Sat Oct 10 22:55:45 2020] [23983] 10256 23983  3745896     2158     147        0             0 postgres
[Sat Oct 10 22:55:45 2020] [23984] 10256 23984  3746002     2442     153        0             0 postgres
[Sat Oct 10 22:55:45 2020] [23985] 10256 23985  3746130     2314     147        0             0 postgres
[Sat Oct 10 22:55:45 2020] [24021]     0 24021    28347      109      11        0             0 byobu-status
[Sat Oct 10 22:55:45 2020] [24058]     0 24058     5001       27      14        0             0 tmux
[Sat Oct 10 22:55:45 2020] Out of memory: Kill process 25944 (postgres) score 70 or sacrifice child
[Sat Oct 10 22:55:45 2020] Killed process 21368 (postgres), UID 10256, total-vm:69259984kB, anon-rss:265684kB, file-rss:40kB, shmem-rss:291024kB

As you can see on the attached graphs we see quick RAM exhaustion where on XL on the 8 nodes the memory stays around 140GB per node for days for much higher number of same workers.

Before the crash we can see flood of msgs like

Oct 10 22:53:16 gaiadb04 dn[14646]: [11] 2020-10-10 22:53:16 CEST [14646,coord(17698,5)]:xid[772989-519/0] [4-1] user=dr3_ops_cs36,db=surveys,client=192.168.168.145,query=PostgreSQL JDBC Driver,WARNING:  01000: prepared statement leak: Statement tt_dn_up_p_1_4522_b_4c93fc3 still referenced#0122020-10-10 22:53:16 CEST [14646,coord(17698,5)]:xid[772989-519/0] [5-1] user=dr3_ops_cs36,db=surveys,client=192.168.168.145,query=PostgreSQL JDBC Driver,LOCATION:  PrintPreparedStmtLeakWarning, resowner.c:1433#0122020-10-10 22:53:16 CEST [14646,coord(17698,5)]:xid[772989-519/0] [6-1] user=dr3_ops_cs36,db=surveys,client=192.168.168.145,query=PostgreSQL JDBC Driver,STATEMENT:  PREPARE TRANSACTION '_$XC$822258:coord1:F:12:0'
...

Oct 10 22:53:32 gaiadb03 coord[14364]: [73] 2020-10-10 22:53:31 CEST [14364,coord(14364,0)]:xid[0-] [129-1] user=,db=,client=,query=,LOG:  00000: PoolMgr: pooler_async_task_pick_thread use thread index:0 nodeindex:6 in busy status, duration:4#0122020-10-10 22:53:31 CEST [14364,coord(14364,0)]:xid[0-] [130-1] user=,db=,client=,query=,LOCATION:  pooler_async_task_pick_thread, poolmgr.c:8887
...

gaiadb10 coord[16358]: [260] 2020-10-10 22:53:38 CEST [16358,coord(16358,0)]:xid[0-] [503-1] user=,db=,client=,query=,LOG:  00000: PoolMgr: pooler_async_task_pick_thread stuck at remote_ip:gaiadb01i, remote_port:55436, remote_nodeoid:16395, remote remote_backend_pid:17953#0122020-10-10 22:53:38 CEST [16358,coord(16358,0)]:xid[0-] [504-1] user=,db=,client=,query=,LOCATION:  pooler_async_task_pick_thread, poolmgr.c:8879
...

gaiadb12 coord[40400]: [621] 2020-10-10 22:54:01 CEST [40400,coord(40400,0)]:xid[0-] [1225-1] user=,db=,client=,query=,LOG:  00000: PoolMgr: no pipeline avaliable, pooler_async_build_connection node:16396 nodeidx:1#0122020-10-10 22:54:01 CEST [40400,coord(40400,0)]:xid[0-] [1226-1] user=,db=,client=,query=,LOCATION:  pooler_async_build_connection, poolmgr.c:6657
...

10 22:54:01 gaiadb12 coord[40400]: [622] 2020-10-10 22:54:01 CEST [40400,coord(40400,0)]:xid[0-] [1227-1] user=,db=,client=,query=,LOG:  00000: PoolMgr: fail to pick a thread for operation nodeindex:2#0122020-10-10 22:54:01 CEST [40400,coord(40400,0)]:xid[0-] [1228-1] user=,db=,client=,query=,LOCATION:  pooler_async_task_pick_thread, poolmgr.c:8905

We can share the full log if it helps.

We tweaked few pooler related GUCs (max mem, timeout to see if it helps) and will report back.

Let us know if you have ideas what could be the culprit.

Screenshot 2020-10-11 at 00 12 03
yazun commented 3 years ago

Correction: datanodes fail as well, but restart.

yazun commented 3 years ago

W can confirm that Tbase is consuming much more memory than XL and we cannot pinpoint to any GUC (we lowered some values i.e work_mem='2MB' (from 4MB), shared_queue_size=256 (from 1024), even shared buffers are 5GB smaller than XL and we see double memory needs for 460 connections distributed over 12 coords. We can easily handle 1100 connections distributed over 8 nodes in XL with smaller memory footprint. This is a blocker for us so would be grateful for hints where the huge consumption could be coming from.

yazun commented 3 years ago

The reality is usually more complicated: it turned out that increased memory usage is related to not using huge pages by Tbase as XL is already using big portion. After finally squeezing max_xonnections etc for test and enabling huge_pages='on' the memory behaviour is much better - it's stable and we do not see any trend.

Btw. shared_queues/shared_queue_size have huge impact memory-wise - could you comment what would be your suggested number? - max_connections on coord or max_connections on a datanode?

JennyJennyChen commented 3 years ago

Thank you for your issue. We would like to know when the situation you mentioned happened: 1、 OLTP or OLAP workload? shared_queues/shared_queue_size is more OLAP related 2、was it when mass data COPY/INSERT entry? or was it when normal high concurrency and small data access DML?

mikewhb commented 3 years ago

The reality is usually more complicated: it turned out that increased memory usage is related to not using huge pages by Tbase as XL is already using big portion. After finally squeezing max_xonnections etc for test and enabling huge_pages='on' the memory behaviour is much better - it's stable and we do not see any trend.

Btw. shared_queues/shared_queue_size have huge impact memory-wise - could you comment what would be your suggested number? - max_connections on coord or max_connections on a datanode?

It seems enabling huge_pages='on' has no effect on the amount of memory usage. Do you have some data of memory usage about the two cases huge_pages='on' and huge_pages='off' but keep the max_connections constant. In theory when enabling huge_pages='on' it may cost even more memory in some cases.

yazun commented 3 years ago
  1. 1000 connections distributed over 12 nodes, relatively small queries that finish in under 1 second, small percentile above 5 sec due to the small spikes in load.
  2. Same time it is rather intensive insert activity, but with tons of selects as well. We produce a lot of results for the input. Volume-wise probably 2:1. I'd say OLTP.

We can rerun the experiment later this week with both huge pages on and off and report here.

yazun commented 3 years ago

Does not look like hugePages change behaviour that much indeed. Around the crash we see a lot of

00000: validate_combiner request_type is REQUEST_TYPE_NOT_DEFINED

and

01000: prepared statement leak: Statement tt_dn_up_p_1_5a76_b_63689d39 still referenced#0122020-11-03 02:16:59 CET [37896,coord(23158,20)]:xid[1055568-688/0] [2-1] 

and fewer

00000: PoolMgr: pooler_async_task_pick_thread use thread index:1 nodeindex:5 in busy status, duration:1#0122020-11-03 02:16:24 CET [40057,coord(40057,0)]:xid[0-] [1630-1]

messages.

This is our config on datanodes, would you have any suggestions?

There are number of new GUCs related to the pooler - are any might change the memory allocation in comparison to XL?

Btw shared_queues on datanode is not taken from the conf but defaults to 512 whatever the value we set...


                  name                  |                           setting                           |                                                          description                                                          
----------------------------------------+-------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------
 account_lock_threshold                 | 10                                                          | Account will be locked after failed  account_lock_threshold times
 account_lock_time                      | 10min                                                       | The time which user account will be locked.
 account_lock_track_count               | 1000                                                        | The limit to store user account info. aka the max num of hash table entry.
 allow_dml_on_datanode                  | on                                                          | allow insert/update/delete directly on datanode
 allow_force_ddl                        | off                                                         | allow forced ddl of inconsistent metadata
 allow_system_table_mods                | off                                                         | Allows modifications of the structure of system tables.
 alog_common_cache_size                 | 64kB                                                        | Size of common audit log local buffer for each audit worker, kilobytes.
 alog_common_queue_size                 | 64kB                                                        | Size of share memory queue for each backend to store common audit log, kilobytes.
 alog_fga_cacae_size                    | 64kB                                                        | Size of fga audit log local buffer for each audit worker, kilobytes.
 alog_fga_queue_size                    | 64kB                                                        | Size of share memory queue for each backend to store fga audit log, kilobytes.
 alog_file_mode                         | 0600                                                        | Sets the file permissions for audit log files.
 alog_filename                          | audit-%A-%H.log                                             | Sets the file name pattern for audit log files.
 alog_max_worker_number                 | 16                                                          | Max number of worker thead to read audit log from shared memory queue.
 alog_rotation_age                      | 1d                                                          | Automatic audit log file rotation will occur after N minutes.
 alog_rotation_size                     | 10MB                                                        | Automatic audit log file rotation will occur after N kilobytes.
 alog_truncate_on_rotation              | off                                                         | Truncate existing log files of same name during log rotation.
 application_name                       | psql                                                        | Sets the application name to be reported in statistics and logs.
 archive_autowake_interval              | 30                                                          | how often to force a poll of the archive status directory in seconds.
 archive_command                        | (disabled)                                                  | Sets the shell command that will be called to archive a WAL file.
 archive_mode                           | off                                                         | Allows archiving of WAL files using archive_command.
 archive_status_control                 | continue                                                    | Control of break or continue archive xlog. 
 archive_timeout                        | 0                                                           | Forces a switch to the next WAL file if a new file has not been started within N seconds.
 array_nulls                            | on                                                          | Enable input of NULL elements in arrays.
 authentication_timeout                 | 1min                                                        | Sets the maximum allowed time to complete client authentication.
 autovacuum                             | on                                                          | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor        | 0.02                                                        | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold           | 2500000                                                     | Minimum number of tuple inserts, updates, or deletes prior to analyze.
 autovacuum_freeze_max_age              | 2000000000                                                  | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers                 | 5                                                           | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_multixact_freeze_max_age    | 400000000                                                   | Multixact age at which to autovacuum a table to prevent multixact wraparound.
 autovacuum_naptime                     | 30min                                                       | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay           | 20ms                                                        | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                                                          | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor         | 0.03                                                        | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold            | 3500000                                                     | Minimum number of tuple updates or deletes prior to vacuum.
 autovacuum_work_mem                    | -1                                                          | Sets the maximum memory to be used by each autovacuum worker process.
 backend_flush_after                    | 0                                                           | Number of pages after which previously performed writes are flushed to disk.
 backslash_quote                        | safe_encoding                                               | Sets whether "\'" is allowed in string literals.
 base_backup_limit                      | 50                                                          | basebackup speed limit.
 bgwriter_delay                         | 250ms                                                       | Background writer sleep time between rounds.
 bgwriter_flush_after                   | 1MB                                                         | Number of pages after which previously performed writes are flushed to disk.
 bgwriter_lru_maxpages                  | 1000                                                        | Background writer maximum number of LRU pages to flush per round.
 bgwriter_lru_multiplier                | 2                                                           | Multiple of the average buffer usage to free per round.
 block_size                             | 16384                                                       | Shows the size of a disk block.
 bonjour                                | off                                                         | Enables advertising the server via Bonjour.
 bonjour_name                           |                                                             | Sets the Bonjour service name.
 bytea_output                           | hex                                                         | Sets the output format for bytea.
 check_function_bodies                  | on                                                          | Check function bodies during CREATE FUNCTION.
 checkpoint_completion_target           | 0.9                                                         | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_crypt_queue_length          | 32                                                          | length of crypt queue between checkpoint main and crypt workers.
 checkpoint_crypt_worker                | 4                                                           | number of workers for parellel crypt.
 checkpoint_flush_after                 | 512kB                                                       | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout                     | 30min                                                       | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning                     | 30s                                                         | Enables warnings if checkpoint segments are filled more frequently than this.
 client_encoding                        | SQL_ASCII                                                   | Sets the client's character set encoding.
 client_min_messages                    | notice                                                      | Sets the message levels that are sent to the client.
 cluster_name                           |                                                             | Sets the name of the cluster, which is included in the process title.
 cold_hot_sepration_mode                | month                                                       | Set the cold_hot_sepration_mode, day/month/year.
 commit_delay                           | 100                                                         | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
 commit_siblings                        | 10                                                          | Sets the minimum concurrent open transactions before performing commit_delay.
 config_file                            | /ssd/xzDR3/dn_master/postgresql.conf                        | Sets the server's main configuration file.
 constraint_exclusion                   | partition                                                   | Enables the planner to use constraints to optimize queries.
 consumer_connect_timeout               | 60                                                          | timeout to comsumer connect to producer
 cpu_index_tuple_cost                   | 0.002                                                       | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0015                                                      | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.05                                                        | Sets the planner's estimate of the cost of processing each tuple (row).
 create_key_value_mode                  | 0                                                           | To control key value created on nodes range, 0:all, 1:cn only, 2:dn only
 cursor_tuple_fraction                  | 0.1                                                         | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
 data_checksums                         | off                                                         | Shows whether data checksums are turned on for this cluster.
 data_directory                         | /ssd/xzDR3/dn_master                                        | Sets the server's data directory.
 datarow_buffer_size                    | 32                                                          | buffer size for prefetch.
 DateStyle                              | ISO, MDY                                                    | Sets the display format for date and time values.
 db_user_namespace                      | off                                                         | Enables per-database user names.
 deadlock_timeout                       | 1s                                                          | Sets the time to wait on a lock before checking for deadlock.
 debug_assertions                       | off                                                         | Shows whether the running server has assertion checks enabled.
 debug_data_pump                        | off                                                         | enable debug to trace data pump.
 debug_pretty_print                     | on                                                          | Indents parse and plan tree displays.
 debug_print_parse                      | off                                                         | Logs each query's parse tree.
 debug_print_plan                       | off                                                         | Logs each query's execution plan.
 debug_print_rewritten                  | off                                                         | Logs each query's rewritten parse tree.
 default_hashagg_nbatches               | 32                                                          | number of batch files in hybrid-hash agg.
 default_locator_type                   | replication                                                 | set default table locator type, enum as shard/hash/replication, default is empty string
 default_statistics_target              | 10000                                                       | Sets the default statistics target.
 default_tablespace                     |                                                             | Sets the default tablespace to create tables and indexes in.
 default_text_search_config             | pg_catalog.english                                          | Sets default text search configuration.
 default_transaction_deferrable         | off                                                         | Sets the default deferrable status of new transactions.
 default_transaction_isolation          | repeatable read                                             | Sets the transaction isolation level of each new transaction.
 default_transaction_read_only          | off                                                         | Sets the default read-only status of new transactions.
 default_with_oids                      | off                                                         | Create new tables with OIDs by default.
 delay_after_acquire_committs           | 0                                                           | Delay transaction commit after acquiring commmitts (us).
 delay_before_acquire_committs          | 0                                                           | Delay transaction commit before acquiring commmitts (us).
 dis_consumer_timeout                   | 60                                                          | timeout to vacuum disconnected comsumer's entry
 distributed_query_analyze              | off                                                         | enable collecting distributed query info.
 dynamic_library_path                   | $libdir                                                     | Sets the path for dynamically loadable modules.
 dynamic_shared_memory_type             | posix                                                       | Selects the dynamic shared memory implementation used.
 effective_cache_size                   | 100GB                                                       | Sets the planner's assumption about the size of the disk cache.
 effective_io_concurrency               | 4                                                           | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
 enable_audit                           | off                                                         | Enable to audit user operations on the database objects.
 enable_audit_warning                   | off                                                         | Enable to print audit warning logs.
 enable_auditlogger_warning             | off                                                         | Enable to write audit logger process warnings.
 enable_bitmapscan                      | on                                                          | Enables the planner's use of bitmap-scan plans.
 enable_check_password                  | off                                                         | Enable password check.
 enable_cls                             | off                                                         | Enable to Cube Lable Security check.
 enable_cold_hot_visible                | off                                                         | Enable cold-hot visible no matter on cold/hot datanode.
 enable_cold_seperation                 | off                                                         | Enable cold storage seperation when make route strategy.
 enable_committs_print                  | off                                                         | enable commit ts debug print
 enable_concurrently_index              | on                                                          | enable create index concurrently.
 enable_copy_silence                    | off                                                         | Enable copy from silent when error happens.
 enable_crypt_check                     | off                                                         | Enable check crypt consistency, the crypted context could be decrypted, and the value is the same as original.
 enable_crypt_debug                     | off                                                         | Enable debug for crypt feature.
 enable_crypt_parellel_debug            | off                                                         | Enable trace buffer crypt procedure.
 enable_data_mask                       | on                                                          | Enable to datamask feature.
 enable_datanode_row_triggers           | on                                                          | Enables datanode-only ROW triggers
 enable_distri_debug                    | off                                                         | enable distributed transaction debug
 enable_distri_debug_print              | off                                                         | enable distributed transaction debug print
 enable_distri_visibility_print         | off                                                         | enable distributed transaction visibility print
 enable_distributed_unique_plan         | on                                                          | enable distributed unique plan.
 enable_fast_query_shipping             | on                                                          | Enables the planner's use of fast query shipping to ship query directly to datanode.
 enable_fga                             | off                                                         | Enable Fine-grained audit.
 enable_gathermerge                     | on                                                          | Enables the planner's use of gather merge plans.
 enable_group_across_query              | off                                                         | enable group-across queries.
 enable_gtm_debug_print                 | off                                                         | enable gtm debug infomation print
 enable_gtm_proxy                       | off                                                         | Enable gtm proxy a child process of Postmaster.
 enable_hashagg                         | on                                                          | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin                        | on                                                          | Enables the planner's use of hash join plans.
 enable_indexonlyscan                   | on                                                          | Enables the planner's use of index-only-scan plans.
 enable_indexscan                       | on                                                          | Enables the planner's use of index-scan plans.
 enable_key_value                       | off                                                         | Enable key value lookup when make route strategy.
 enable_lock_account                    | off                                                         | Enable lock account when login fail serval times.
 enable_material                        | on                                                          | Enables the planner's use of materialization.
 enable_mergejoin                       | on                                                          | Enables the planner's use of merge join plans.
 enable_multi_cluster                   | off                                                         | Enable multiple clusters.
 enable_multi_cluster_print             | off                                                         | Enable multiple cluster print.
 enable_nestloop                        | on                                                          | Enables the planner's use of nested-loop join plans.
 enable_null_string                     | off                                                         | enable nulls in string.
 enable_oracle_compatible               | off                                                         | Enable oracle compatibility.
 enable_pgbouncer                       | off                                                         | use pgbouncer as coordinator connection pool.
 enable_plpgsql_debug_print             | off                                                         | enable plpgsql debug infomation print
 enable_pooler_debug_print              | off                                                         | enable pooler manager debug infomation print
 enable_pooler_stuck_exit               | off                                                         | enable pooler exit when pick up sync network thread failed
 enable_pullup_subquery                 | off                                                         | pullup subquery to make execution more efficient.
 enable_replication_slot_debug          | off                                                         | enable_replication_slot_debug
 enable_sampling_analyze                | off                                                         | sampling rows from datanodes when doing analyze on coordinator.
 enable_seqscan                         | on                                                          | Enables the planner's use of sequential-scan plans.
 enable_shard_statistic                 | on                                                          | collect statistic information for shard.
 enable_sort                            | on                                                          | Enables the planner's use of explicit sort steps.
 enable_statistic                       | off                                                         | collect statistic information for debug.
 enable_tidscan                         | on                                                          | Enables the planner's use of TID scan plans.
 enable_transparent_crypt               | on                                                          | Enable to transparent crypt feature.
 enable_user_authority_force_check      | off                                                         | control users to get the list of tables and functions which can be accessed and executed by these user.
 escape_string_warning                  | on                                                          | Warn about backslash escapes in ordinary string literals.
 event_source                           | PostgreSQL                                                  | Sets the application name used to identify PostgreSQL messages in the event log.
 exit_on_error                          | off                                                         | Terminate session on any error.
 external_pid_file                      |                                                             | Writes the postmaster PID to the specified file.
 extra_float_digits                     | 3                                                           | Sets the number of digits displayed for floating-point values.
 force_parallel_mode                    | off                                                         | Forces use of parallel query facilities.
 from_collapse_limit                    | 8                                                           | Sets the FROM-list size beyond which subqueries are not collapsed.
 fsync                                  | on                                                          | Forces synchronization of updates to disk.
 full_page_writes                       | off                                                         | Writes full pages to WAL when first modified after a checkpoint.
 geqo                                   | on                                                          | Enables genetic query optimization.
 geqo_effort                            | 5                                                           | GEQO: effort is used to set the default for other GEQO parameters.
 geqo_generations                       | 0                                                           | GEQO: number of iterations of the algorithm.
 geqo_pool_size                         | 0                                                           | GEQO: number of individuals in the population.
 geqo_seed                              | 0                                                           | GEQO: seed for random path selection.
 geqo_selection_bias                    | 2                                                           | GEQO: selective pressure within the population.
 geqo_threshold                         | 12                                                          | Sets the threshold of FROM items beyond which GEQO is used.
 gin_fuzzy_search_limit                 | 0                                                           | Sets the maximum allowed result for exact search by GIN.
 gin_pending_list_limit                 | 4MB                                                         | Sets the maximum size of the pending list for GIN index.
 global_snapshot_source                 | gtm                                                         | Set preferred source of a snapshot.
 gtm_backup_barrier                     | off                                                         | Enables coordinator to report barrier id to GTM for backup.
 gts_acquire_gap                        | 30s                                                         | Time gap between global timestamp acquirement in the WAL writer.
 hba_file                               | /ssd/xzDR3/dn_master/pg_hba.conf                            | Sets the server's "hba" configuration file.
 hot_data_age                           | 3650000                                                     | age of hot data, data older than this age will be turned cold.
 hot_standby                            | on                                                          | Allows connections and queries during recovery.
 hot_standby_feedback                   | off                                                         | Allows feedback from a hot standby to the primary that will avoid query conflicts.
 huge_pages                             | try                                                         | Use of huge pages on Linux.
 hybrid_hash_agg                        | off                                                         | enable hybrid-hash agg.
 hybrid_hash_agg_debug                  | off                                                         | enable hybrid-hash agg debug.
 ident_file                             | /ssd/xzDR3/dn_master/pg_ident.conf                          | Sets the server's "ident" configuration file.
 idle_in_transaction_session_timeout    | 0                                                           | Sets the maximum allowed duration of any idling transaction.
 ignore_checksum_failure                | off                                                         | Continues processing after a checksum failure.
 ignore_system_indexes                  | off                                                         | Disables reading from system indexes.
 init_pool_size                         | 10                                                          | Initial pool size.
 integer_datetimes                      | on                                                          | Datetimes are integer based.
 interval_sample_rate                   | 20                                                          | sample rate of interval partition.
 interval_sample_threshold              | 100                                                         | sample threshold of interval partition.
 IntervalStyle                          | postgres                                                    | Sets the display format for interval values.
 join_collapse_limit                    | 8                                                           | Sets the FROM-list size beyond which JOIN constructs are not flattened.
 krb_caseins_users                      | off                                                         | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
 krb_server_keyfile                     |                                                             | Sets the location of the Kerberos server key file.
 lc_collate                             | C                                                           | Shows the collation order locale.
 lc_ctype                               | C                                                           | Shows the character classification and case conversion locale.
 lc_messages                            | C                                                           | Sets the language in which messages are displayed.
 lc_monetary                            | C                                                           | Sets the locale for formatting monetary amounts.
 lc_numeric                             | C                                                           | Sets the locale for formatting numbers.
 lc_time                                | C                                                           | Sets the locale for formatting date and time values.
 listen_addresses                       | *                                                           | Sets the host name or IP address(es) to listen to.
 lo_compat_privileges                   | off                                                         | Enables backward compatibility mode for privilege checks on large objects.
 local_preload_libraries                |                                                             | Lists unprivileged shared libraries to preload into each backend.
 lock_account_print                     | off                                                         | Enable print log in lock account procedure.
 lock_timeout                           | 0                                                           | Sets the maximum allowed duration of any wait for a lock.
 log_autovacuum_min_duration            | 10s                                                         | Sets the minimum execution time above which autovacuum actions will be logged.
 log_checkpoints                        | on                                                          | Logs each checkpoint.
 log_connections                        | off                                                         | Logs each successful connection.
 log_destination                        | syslog                                                      | Sets the destination for server log output.
 log_directory                          | /ssd/xzDR3/log                                              | Sets the destination directory for log files.
 log_disconnections                     | off                                                         | Logs end of a session, including duration.
 log_duration                           | off                                                         | Logs the duration of each completed SQL statement.
 log_error_verbosity                    | verbose                                                     | Sets the verbosity of logged messages.
 log_executor_stats                     | off                                                         | Writes executor performance statistics to the server log.
 log_file_mode                          | 0600                                                        | Sets the file permissions for log files.
 log_filename                           | dn-%a.log                                                   | Sets the file name pattern for log files.
 log_gtm_stats                          | off                                                         | Writes GTM performance statistics to the server log.
 log_hostname                           | off                                                         | Logs the host name in the connection logs.
 log_line_prefix                        | %t [%p]:xid[%x-%v] [%l-1] user=%u,db=%d,client=%h,query=%a, | Controls information prefixed to each log line.
 log_lock_waits                         | on                                                          | Logs long lock waits.
 log_min_duration_statement             | 5s                                                          | Sets the minimum execution time above which statements will be logged.
 log_min_error_statement                | debug5                                                      | Causes all statements generating error at or above this level to be logged.
 log_min_messages                       | info                                                        | Sets the message levels that are logged.
 log_parser_stats                       | off                                                         | Writes parser performance statistics to the server log.
 log_planner_stats                      | off                                                         | Writes planner performance statistics to the server log.
 log_remotesubplan_stats                | off                                                         | Writes remote subplan performance statistics to the server log.
 log_replication_commands               | off                                                         | Logs each replication command.
 log_rotation_age                       | 7d                                                          | Automatic log file rotation will occur after N minutes.
 log_rotation_size                      | 0                                                           | Automatic log file rotation will occur after N kilobytes.
 log_statement                          | none                                                        | Sets the type of statements logged.
 log_statement_stats                    | off                                                         | Writes cumulative performance statistics to the server log.
 log_temp_files                         | 1000000kB                                                   | Log the use of temporary files larger than this number of kilobytes.
 log_timezone                           | Europe/Vaduz                                                | Sets the time zone to use in log messages.
 log_truncate_on_rotation               | on                                                          | Truncate existing log files of same name during log rotation.
 logging_collector                      | on                                                          | Start a subprocess to capture stderr output and/or csvlogs into log files.
 loose_constraints                      | off                                                         | Relax enforcing of constraints
 loose_unique_index                     | on                                                          | Relax enforcing of unique index
 maintenance_work_mem                   | 4GB                                                         | Sets the maximum memory to be used for maintenance operations.
 manual_hot_date                        |                                                             | Set the manual_hot_date, this value will mask off hot_date_age.
 max_connections                        | 6000                                                        | Sets the maximum number of concurrent connections.
 max_files_per_process                  | 1000                                                        | Sets the maximum number of simultaneously open files for each server process.
 max_function_args                      | 100                                                         | Shows the maximum number of function arguments.
 max_identifier_length                  | 63                                                          | Shows the maximum identifier length.
 max_index_keys                         | 32                                                          | Shows the maximum number of index keys.
 max_locks_per_transaction              | 500                                                         | Sets the maximum number of locks per transaction.
 max_logical_replication_workers        | 64                                                          | Maximum number of logical replication worker processes.
 max_network_bandwidth_per_subscription | 50                                                          | Maximum network bandwidth per subscription.
 max_parallel_workers                   | 40                                                          | Sets the maximum number of parallel workers than can be active at one time.
 max_parallel_workers_per_gather        | 20                                                          | Sets the maximum number of parallel processes per executor node.
 max_pool_size                          | 13200                                                       | Max pool size.
 max_pred_locks_per_page                | 2                                                           | Sets the maximum number of predicate-locked tuples per page.
 max_pred_locks_per_relation            | -2                                                          | Sets the maximum number of predicate-locked pages and tuples per relation.
 max_pred_locks_per_transaction         | 64                                                          | Sets the maximum number of predicate locks per transaction.
 max_prepared_transactions              | 13200                                                       | Sets the maximum number of simultaneously prepared transactions.
 max_replication_slots                  | 64                                                          | Sets the maximum number of simultaneously defined replication slots.
 max_sessions_per_shardpool             | 100                                                         | max number of sessions for each shard pool.
 max_stack_depth                        | 2MB                                                         | Sets the maximum stack depth, in kilobytes.
 max_standby_archive_delay              | 30s                                                         | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
 max_standby_streaming_delay            | 30s                                                         | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
 max_sync_workers_per_subscription      | 4                                                           | Maximum number of table synchronization workers per subscription.
 max_wal_senders                        | 64                                                          | Sets the maximum number of simultaneously running WAL sender processes.
 max_wal_size                           | 35GB                                                        | Sets the WAL size that triggers a checkpoint.
 max_worker_processes                   | 128                                                         | Maximum number of concurrent worker processes.
 min_free_size                          | 5                                                           | minimal pool free connection number.
 min_parallel_index_scan_size           | 512kB                                                       | Sets the minimum amount of index data for a parallel scan.
 min_parallel_table_scan_size           | 8MB                                                         | Sets the minimum amount of table data for a parallel scan.
 min_pool_size                          | 5                                                           | Min pool size.
 min_wal_size                           | 320MB                                                       | Sets the minimum size to shrink the WAL to.
 network_byte_cost                      | 0.001                                                       | Sets the planner's estimate of the cost of sending data from remote node.
 nls_date_format                        | YYYY-MM-DD HH24:MI:SS                                       | Emulate oracle's date output behaviour.
 nls_sort_locale                        |                                                             | COLLATE set for nlssort.
 nls_timestamp_format                   | YYYY-MM-DD HH24:MI:SS.US                                    | Emulate oracle's timestamp without time zone output behaviour.
 nls_timestamp_tz_format                | YYYY-MM-DD HH24:MI:SS.US TZ                                 | Emulate oracle's timestamp with time zone output behaviour.
 node_is_extension                      | off                                                         | if this datanode is extension
 olap_optimizer                         | on                                                          | enable OLAP optimizer to make Query more efficient.
 old_snapshot_threshold                 | -1                                                          | Time before a snapshot is too old to read pages changed after the snapshot was taken.
 operator_precedence_warning            | off                                                         | Emit a warning for constructs that changed meaning since PostgreSQL 9.4.
 page_ts_need_xlog                      | 1                                                           | Determine whether the page timestamp should be logged.
 parallel_setup_cost                    | 1000                                                        | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
 parallel_tuple_cost                    | 0.1                                                         | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
 param_pass_down                        | off                                                         | enable exec param to pass down.
 parentnode                             |                                                             | Sets the name of the parent data node
 password_encryption                    | md5                                                         | Encrypt passwords.
 persistent_datanode_connections        | off                                                         | Session never releases acquired connections.
 pgbouncer_conf                         | pgbouncer.ini                                               | pgbouncer config file name, in PGDATA dir.
 pgxc_cluster_name                      | tbase_cluster                                               | The Cluster name.
 pgxc_main_cluster_name                 | tbase_cluster                                               | The Main Cluster name.
 pgxc_node_name                         | datanode1                                                   | The Coordinator or Datanode name.
 pgxl_remote_fetch_size                 | 10000                                                       | Number of maximum tuples to fetch in one remote iteration
 pool_check_slot_timeout                | 5s                                                          | Enable pooler check slot. When slot is using by agent, shouldn't exist in nodepool.
 pool_conn_keepalive                    | 1min                                                        | Close connections if they are idle in the pool for that time.
 pool_maintenance_timeout               | 10s                                                         | Launch maintenance routine if pooler idle for that time.
 pool_print_stat_timeout                | -1                                                          | Enable pooler print stat info.
 pool_session_context_check_gap         | 2min                                                        | Gap to check datanode session memory context.
 pool_session_max_lifetime              | 5min                                                        | Datanode session max lifetime.
 pool_session_memory_limit              | 20s                                                         | Datanode session max memory context size.
 pooler_connect_timeout                 | 10s                                                         | Pooler connection timeout.
 pooler_dn_set_timeout                  | 1min                                                        | Pooler datanode set query timeout.
 pooler_port                            | 50012                                                       | Port of the Pool Manager.
 pooler_scale_factor                    | 2                                                           | Pooler scale factor.
 pooler_unpooled_database               | template1                                                   | databases that pooler will not pool its connections.
 pooler_unpooled_user                   | mls_admin                                                   | users that pooler will not pool its connections.
 pooler_warm_db_user                    |                                                             | pooler warm datanode database and user.
 port                                   | 55436                                                       | Sets the TCP port the server listens on.
 post_auth_delay                        | 0                                                           | Waits N seconds on connection startup after authentication.
 pre_auth_delay                         | 0                                                           | Waits N seconds on connection startup before authentication.
 prefer_olap                            | on                                                          | Prefer to run OLAP.
 pub_stat_hash_size                     | 65536                                                       | size of hashtable used in publication statistics.
 pub_table_stat_hash_size               | 262144                                                      | size of hashtable used in publication's table statistics.
 quote_all_identifiers                  | off                                                         | When generating SQL fragments, quote all identifiers.
 random_collect_stats                   | on                                                          | Coordinator random collects the statistics from all data nodes.
 random_page_cost                       | 1.21                                                        | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 remote_query_cost                      | 100                                                         | Sets the planner's estimate of the cost of setting up remote subquery.
 remotetype                             | application                                                 | Sets the type of Postgres-XL remote connection
 replacement_sort_tuples                | 150000                                                      | Sets the maximum number of tuples to be sorted using replacement selection.
 replication_level                      | 1                                                           | replication level on join to make Query more efficient.
 restart_after_crash                    | on                                                          | Reinitialize server after backend crash.
 restrict_query                         | on                                                          | restrict query to involved node as possible
 row_security                           | on                                                          | Enable row security.
 search_path                            | dr3_ops_cs36, dr3_ops_cs36_part, public                     | Sets the schema search order for names that are not schema-qualified.
 segment_size                           | 8GB                                                         | Shows the number of pages per disk file.
 sender_thread_batch_size               | 16                                                          | batch size of senders in datapump
 sender_thread_buffer_size              | 32                                                          | buffer size of senders in datapump
 sender_thread_num                      | 8                                                           | Number of maximum senders in datapump
 seq_page_cost                          | 1.2                                                         | Sets the planner's estimate of the cost of a sequentially fetched disk page.
 sequence_range                         | 1000                                                        | The range of values to ask from GTM for sequences. If CACHE parameter is set then that overrides this.
 server_encoding                        | SQL_ASCII                                                   | Sets the server (database) character set encoding.
 server_version                         | 10.0 TBase V2                                               | Shows the server version.
 server_version_num                     | 100000                                                      | Shows the server version as an integer.
 session_memory_size                    | 2                                                           | Used to get the total memory size of the session, in M Bytes.
 session_preload_libraries              |                                                             | Lists shared libraries to preload into each backend.
 session_replication_role               | origin                                                      | Sets the session's behavior for triggers and rewrite rules.
 set_global_snapshot                    | off                                                         | always use global snapshot for query
 shard_statistic_flush_interval         | 300                                                         | interval of flushing shard statistic.
 shard_visible_mode                     | visible                                                     | set the mode of shard visibility.
 shared_buffers                         | 20GB                                                        | Sets the number of shared memory buffers used by the server.
 shared_preload_libraries               |                                                             | Lists shared libraries to preload into server.
 shared_queue_size                      | 256kB                                                       | Sets the amount of memory allocated for a shared memory queue per datanode.
 shared_queues                          | 512                                                         | Sets the number of shared memory queues used by the distributed executor, minimum 1/4 of max_connections.
 show_all_shard_stat                    | off                                                         | show statistic information for all shards.
 skip_gtm_catalog                       | off                                                         | used to skip gtm catalog, WARNING:only for emergency purpose and only avaliable on coordinators.
 ssl                                    | off                                                         | Enables SSL connections.
 ssl_ca_file                            |                                                             | Location of the SSL certificate authority file.
 ssl_cert_file                          | server.crt                                                  | Location of the SSL server certificate file.
 ssl_ciphers                            | HIGH:MEDIUM:+3DES:!aNULL                                    | Sets the list of allowed SSL ciphers.
 ssl_crl_file                           |                                                             | Location of the SSL certificate revocation list file.
 ssl_dh_params_file                     |                                                             | Location of the SSL DH params file.
 ssl_ecdh_curve                         | prime256v1                                                  | Sets the curve to use for ECDH.
 ssl_key_file                           | server.key                                                  | Location of the SSL server private key file.
 ssl_prefer_server_ciphers              | on                                                          | Give priority to server ciphersuite order.
 standard_conforming_strings            | on                                                          | Causes '...' strings to treat backslashes literally.
 standby_plane_query_delay              | 0                                                           | delay time of query in standby.
 statement_timeout                      | 0                                                           | Sets the maximum allowed duration of any statement.
 stats_temp_directory                   | /tmp/pg_stat_tmp/xzDR3                                      | Writes temporary statistics files to the specified directory.
 sub_stat_hash_size                     | 65536                                                       | size of hashtable used in subscription statistics.
 sub_table_stat_hash_size               | 262144                                                      | size of hashtable used in subscription's table statistics.
 superuser_reserved_connections         | 4                                                           | Sets the number of connection slots reserved for superusers.
 support_oracle_compatible              | on                                                          | Support oracle compatibility.
 synchronize_seqscans                   | on                                                          | Enable synchronized sequential scans.
 synchronous_commit                     | off                                                         | Sets the current transaction's synchronization level.
 synchronous_standby_names              |                                                             | Number of synchronous standbys and list of names of potential synchronous ones.
 syslog_facility                        | local4                                                      | Sets the syslog "facility" to be used when syslog enabled.
 syslog_ident                           | dn                                                          | Sets the program name used to identify PostgreSQL messages in syslog.
 syslog_sequence_numbers                | on                                                          | Add sequence number to syslog messages to avoid duplicate suppression.
 syslog_split_messages                  | off                                                         | Split messages sent to syslog by lines and to fit into 1024 bytes.
 tcp_keepalives_count                   | 40                                                          | Maximum number of TCP keepalive retransmits.
 tcp_keepalives_idle                    | 300                                                         | Time between issuing TCP keepalives.
 tcp_keepalives_interval                | 45                                                          | Time between TCP keepalive retransmits.
 temp_buffers                           | 64MB                                                        | Sets the maximum number of temporary buffers used by each session.
 temp_cold_date                         |                                                             | Set the temp cold date, so the data import soon will be trade as cold data.
 temp_file_limit                        | -1                                                          | Limits the total size of all temporary files used by each process.
 temp_hot_date                          |                                                             | Set the temp hot date, when check data consistency in code datanode.
 temp_key_value                         |                                                             | Set the temp key value, so the white list tuple will be routed to new group.
 temp_tablespaces                       | temp_tablespace,temp_tablespace,ssd_tablespace              | Sets the tablespace(s) to use for temporary tables and sort files.
 TimeZone                               | Europe/Zurich                                               | Sets the time zone for displaying and interpreting time stamps.
 timezone_abbreviations                 | Default                                                     | Selects a file of time zone abbreviations.
 trace_extent                           | off                                                         | Emits information about extent changing.
 trace_notify                           | off                                                         | Generates debugging output for LISTEN and NOTIFY.
 trace_recovery_messages                | log                                                         | Enables logging of recovery-related debugging information.
 trace_sort                             | off                                                         | Emit information about resource usage in sorting.
 track_activities                       | on                                                          | Collects information about executing commands.
 track_activity_query_size              | 2048                                                        | Sets the size reserved for pg_stat_activity.query, in bytes.
 track_commit_timestamp                 | on                                                          | Collects transaction commit time.
 track_counts                           | on                                                          | Collects statistics on database activity.
 track_functions                        | none                                                        | Collects function-level statistics on database activity.
 track_io_timing                        | off                                                         | Collects timing statistics for database I/O activity.
 transaction_deferrable                 | off                                                         | Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures.
 transaction_isolation                  | repeatable read                                             | Sets the current transaction's isolation level.
 transaction_read_only                  | off                                                         | Sets the current transaction's read-only status.
 transform_insert_to_copy               | on                                                          | try to transform insert into multi-values to copy from.
 transform_null_equals                  | off                                                         | Treats "expr=NULL" as "expr IS NULL".
 unix_socket_directories                | /tmp/pg_stat_tmp/xzDR3                                      | Sets the directories where Unix-domain sockets will be created.
 unix_socket_group                      |                                                             | Sets the owning group of the Unix-domain socket.
 unix_socket_permissions                | 0777                                                        | Sets the access permissions of the Unix-domain socket.
 update_process_title                   | on                                                          | Updates the process title to show the active SQL command.
 use_data_pump                          | on                                                          | use datapump to make data transfer more efficient.
 vacuum_cost_delay                      | 0                                                           | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                                                         | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                                                          | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                                                           | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                                                          | Vacuum cost for a page not found in the buffer cache.
 vacuum_debug_print                     | off                                                         | vacuum debug print.
 vacuum_defer_cleanup_age               | 0                                                           | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any.
 vacuum_defer_freeze_min_age            | 10000                                                       | Minimum age at which VACUUM should defer to freeze a table row to avoid failure due to too old timestamp.
 vacuum_delta                           | 10000                                                       | The time period before which VACUUM and HOT cleanup could clean the transactions, if any.
 vacuum_freeze_min_age                  | 150000000                                                   | Minimum age at which VACUUM should freeze a table row.
 vacuum_freeze_table_age                | 1800000000                                                  | Age at which VACUUM should scan whole table to freeze tuples.
 vacuum_multixact_freeze_min_age        | 5000000                                                     | Minimum age at which VACUUM should freeze a MultiXactId in a table row.
 vacuum_multixact_freeze_table_age      | 150000000                                                   | Multixact age at which VACUUM should scan whole table to freeze tuples.
 wal_block_size                         | 16384                                                       | Shows the block size in the write ahead log.
 wal_buffers                            | 64MB                                                        | Sets the number of disk-page buffers in shared memory for WAL.
 wal_check                              | off                                                         | check consistency of wal.
 wal_compression                        | on                                                          | Compresses full-page writes written in WAL file.
 wal_consistency_checking               |                                                             | Sets the WAL resource managers for which WAL consistency checks are done.
 wal_keep_segments                      | 0                                                           | Sets the number of WAL files held for standby servers.
 wal_level                              | logical                                                     | Set the level of information written to the WAL.
 wal_log_hints                          | off                                                         | Writes full pages to WAL when first modified after a checkpoint, even for a non-critical modifications.
 wal_receiver_status_interval           | 10s                                                         | Sets the maximum interval between WAL receiver status reports to the primary.
 wal_receiver_timeout                   | 1min                                                        | Sets the maximum wait time to receive data from the primary.
 wal_retrieve_retry_interval            | 5s                                                          | Sets the time to wait before retrying to retrieve WAL after a failed attempt.
 wal_segment_size                       | 64MB                                                        | Shows the number of pages per write ahead log segment.
 wal_sender_timeout                     | 1min                                                        | Sets the maximum time to wait for WAL replication.
 wal_stream_type                        | user_stream                                                 | set wal_stream_type type, enum as user_stream/cluster_stream/internal_stream, default is user_stream string
 wal_sync_method                        | fdatasync                                                   | Selects the method used for forcing WAL updates to disk.
 wal_track_entry_number                 | 256ms                                                       | Number of entries to track GTS of each xlog segment.
 wal_writer_delay                       | 500ms                                                       | Time between WAL flushes performed in the WAL writer.
 wal_writer_flush_after                 | 1MB                                                         | Amount of WAL written out by WAL writer that triggers a flush.
 warm_shared_buffer                     | off                                                         | Set connection to warm shared buffer.
 work_mem                               | 2MB                                                         | Sets the maximum memory to be used for query workspaces.
 xc_maintenance_mode                    | off                                                         | Turn on XC maintenance mode.
 xmlbinary                              | base64                                                      | Sets how binary values are to be encoded in XML.
 xmloption                              | content                                                     | Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments.
 zero_damaged_pages                     | off                                                         | Continues processing past damaged page headers.

and coords:

                  name                  |                           setting                           |                                                          description                                                          
----------------------------------------+-------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------
 account_lock_threshold                 | 10                                                          | Account will be locked after failed  account_lock_threshold times
 account_lock_time                      | 10min                                                       | The time which user account will be locked.
 account_lock_track_count               | 1000                                                        | The limit to store user account info. aka the max num of hash table entry.
 allow_dml_on_datanode                  | off                                                         | allow insert/update/delete directly on datanode
 allow_force_ddl                        | off                                                         | allow forced ddl of inconsistent metadata
 allow_system_table_mods                | off                                                         | Allows modifications of the structure of system tables.
 alog_common_cache_size                 | 64kB                                                        | Size of common audit log local buffer for each audit worker, kilobytes.
 alog_common_queue_size                 | 64kB                                                        | Size of share memory queue for each backend to store common audit log, kilobytes.
 alog_fga_cacae_size                    | 64kB                                                        | Size of fga audit log local buffer for each audit worker, kilobytes.
 alog_fga_queue_size                    | 64kB                                                        | Size of share memory queue for each backend to store fga audit log, kilobytes.
 alog_file_mode                         | 0600                                                        | Sets the file permissions for audit log files.
 alog_filename                          | audit-%A-%H.log                                             | Sets the file name pattern for audit log files.
 alog_max_worker_number                 | 16                                                          | Max number of worker thead to read audit log from shared memory queue.
 alog_rotation_age                      | 1d                                                          | Automatic audit log file rotation will occur after N minutes.
 alog_rotation_size                     | 10MB                                                        | Automatic audit log file rotation will occur after N kilobytes.
 alog_truncate_on_rotation              | off                                                         | Truncate existing log files of same name during log rotation.
 application_name                       | psql                                                        | Sets the application name to be reported in statistics and logs.
 archive_autowake_interval              | 30                                                          | how often to force a poll of the archive status directory in seconds.
 archive_command                        | (disabled)                                                  | Sets the shell command that will be called to archive a WAL file.
 archive_mode                           | off                                                         | Allows archiving of WAL files using archive_command.
 archive_status_control                 | continue                                                    | Control of break or continue archive xlog. 
 archive_timeout                        | 0                                                           | Forces a switch to the next WAL file if a new file has not been started within N seconds.
 array_nulls                            | on                                                          | Enable input of NULL elements in arrays.
 authentication_timeout                 | 1min                                                        | Sets the maximum allowed time to complete client authentication.
 autovacuum                             | on                                                          | Starts the autovacuum subprocess.
 autovacuum_analyze_scale_factor        | 0.02                                                        | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
 autovacuum_analyze_threshold           | 150000                                                      | Minimum number of tuple inserts, updates, or deletes prior to analyze.
 autovacuum_freeze_max_age              | 2000000000                                                  | Age at which to autovacuum a table to prevent transaction ID wraparound.
 autovacuum_max_workers                 | 5                                                           | Sets the maximum number of simultaneously running autovacuum worker processes.
 autovacuum_multixact_freeze_max_age    | 400000000                                                   | Multixact age at which to autovacuum a table to prevent multixact wraparound.
 autovacuum_naptime                     | 30min                                                       | Time to sleep between autovacuum runs.
 autovacuum_vacuum_cost_delay           | 20ms                                                        | Vacuum cost delay in milliseconds, for autovacuum.
 autovacuum_vacuum_cost_limit           | -1                                                          | Vacuum cost amount available before napping, for autovacuum.
 autovacuum_vacuum_scale_factor         | 0.03                                                        | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
 autovacuum_vacuum_threshold            | 150000                                                      | Minimum number of tuple updates or deletes prior to vacuum.
 autovacuum_work_mem                    | -1                                                          | Sets the maximum memory to be used by each autovacuum worker process.
 backend_flush_after                    | 0                                                           | Number of pages after which previously performed writes are flushed to disk.
 backslash_quote                        | safe_encoding                                               | Sets whether "\'" is allowed in string literals.
 base_backup_limit                      | 50                                                          | basebackup speed limit.
 bgwriter_delay                         | 250ms                                                       | Background writer sleep time between rounds.
 bgwriter_flush_after                   | 1MB                                                         | Number of pages after which previously performed writes are flushed to disk.
 bgwriter_lru_maxpages                  | 1000                                                        | Background writer maximum number of LRU pages to flush per round.
 bgwriter_lru_multiplier                | 2                                                           | Multiple of the average buffer usage to free per round.
 block_size                             | 16384                                                       | Shows the size of a disk block.
 bonjour                                | off                                                         | Enables advertising the server via Bonjour.
 bonjour_name                           |                                                             | Sets the Bonjour service name.
 bytea_output                           | hex                                                         | Sets the output format for bytea.
 check_function_bodies                  | on                                                          | Check function bodies during CREATE FUNCTION.
 checkpoint_completion_target           | 0.9                                                         | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
 checkpoint_crypt_queue_length          | 32                                                          | length of crypt queue between checkpoint main and crypt workers.
 checkpoint_crypt_worker                | 4                                                           | number of workers for parellel crypt.
 checkpoint_flush_after                 | 512kB                                                       | Number of pages after which previously performed writes are flushed to disk.
 checkpoint_timeout                     | 30min                                                       | Sets the maximum time between automatic WAL checkpoints.
 checkpoint_warning                     | 30s                                                         | Enables warnings if checkpoint segments are filled more frequently than this.
 client_encoding                        | SQL_ASCII                                                   | Sets the client's character set encoding.
 client_min_messages                    | notice                                                      | Sets the message levels that are sent to the client.
 cluster_name                           |                                                             | Sets the name of the cluster, which is included in the process title.
 cold_hot_sepration_mode                | month                                                       | Set the cold_hot_sepration_mode, day/month/year.
 commit_delay                           | 50                                                          | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
 commit_siblings                        | 5                                                           | Sets the minimum concurrent open transactions before performing commit_delay.
 config_file                            | /ssd/xzDR3/coord/postgresql.conf                            | Sets the server's main configuration file.
 constraint_exclusion                   | partition                                                   | Enables the planner to use constraints to optimize queries.
 consumer_connect_timeout               | 60                                                          | timeout to comsumer connect to producer
 cpu_index_tuple_cost                   | 0.002                                                       | Sets the planner's estimate of the cost of processing each index entry during an index scan.
 cpu_operator_cost                      | 0.0015                                                      | Sets the planner's estimate of the cost of processing each operator or function call.
 cpu_tuple_cost                         | 0.05                                                        | Sets the planner's estimate of the cost of processing each tuple (row).
 create_key_value_mode                  | 0                                                           | To control key value created on nodes range, 0:all, 1:cn only, 2:dn only
 cursor_tuple_fraction                  | 0.1                                                         | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
 data_checksums                         | off                                                         | Shows whether data checksums are turned on for this cluster.
 data_directory                         | /ssd/xzDR3/coord                                            | Sets the server's data directory.
 datarow_buffer_size                    | 32                                                          | buffer size for prefetch.
 DateStyle                              | ISO, MDY                                                    | Sets the display format for date and time values.
 db_user_namespace                      | off                                                         | Enables per-database user names.
 deadlock_timeout                       | 1s                                                          | Sets the time to wait on a lock before checking for deadlock.
 debug_assertions                       | off                                                         | Shows whether the running server has assertion checks enabled.
 debug_data_pump                        | off                                                         | enable debug to trace data pump.
 debug_pretty_print                     | on                                                          | Indents parse and plan tree displays.
 debug_print_parse                      | off                                                         | Logs each query's parse tree.
 debug_print_plan                       | off                                                         | Logs each query's execution plan.
 debug_print_rewritten                  | off                                                         | Logs each query's rewritten parse tree.
 default_hashagg_nbatches               | 32                                                          | number of batch files in hybrid-hash agg.
 default_locator_type                   | replication                                                 | set default table locator type, enum as shard/hash/replication, default is empty string
 default_statistics_target              | 10000                                                       | Sets the default statistics target.
 default_tablespace                     |                                                             | Sets the default tablespace to create tables and indexes in.
 default_text_search_config             | pg_catalog.english                                          | Sets default text search configuration.
 default_transaction_deferrable         | off                                                         | Sets the default deferrable status of new transactions.
 default_transaction_isolation          | read committed                                              | Sets the transaction isolation level of each new transaction.
 default_transaction_read_only          | off                                                         | Sets the default read-only status of new transactions.
 default_with_oids                      | off                                                         | Create new tables with OIDs by default.
 delay_after_acquire_committs           | 0                                                           | Delay transaction commit after acquiring commmitts (us).
 delay_before_acquire_committs          | 0                                                           | Delay transaction commit before acquiring commmitts (us).
 dis_consumer_timeout                   | 60                                                          | timeout to vacuum disconnected comsumer's entry
 distributed_query_analyze              | off                                                         | enable collecting distributed query info.
 dynamic_library_path                   | $libdir                                                     | Sets the path for dynamically loadable modules.
 dynamic_shared_memory_type             | posix                                                       | Selects the dynamic shared memory implementation used.
 effective_cache_size                   | 100GB                                                       | Sets the planner's assumption about the size of the disk cache.
 effective_io_concurrency               | 5                                                           | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
 enable_audit                           | off                                                         | Enable to audit user operations on the database objects.
 enable_audit_warning                   | off                                                         | Enable to print audit warning logs.
 enable_auditlogger_warning             | off                                                         | Enable to write audit logger process warnings.
 enable_bitmapscan                      | on                                                          | Enables the planner's use of bitmap-scan plans.
 enable_check_password                  | off                                                         | Enable password check.
 enable_cls                             | off                                                         | Enable to Cube Lable Security check.
 enable_cold_hot_visible                | off                                                         | Enable cold-hot visible no matter on cold/hot datanode.
 enable_cold_seperation                 | off                                                         | Enable cold storage seperation when make route strategy.
 enable_committs_print                  | off                                                         | enable commit ts debug print
 enable_concurrently_index              | on                                                          | enable create index concurrently.
 enable_copy_silence                    | off                                                         | Enable copy from silent when error happens.
 enable_crypt_check                     | off                                                         | Enable check crypt consistency, the crypted context could be decrypted, and the value is the same as original.
 enable_crypt_debug                     | off                                                         | Enable debug for crypt feature.
 enable_crypt_parellel_debug            | off                                                         | Enable trace buffer crypt procedure.
 enable_data_mask                       | on                                                          | Enable to datamask feature.
 enable_datanode_row_triggers           | on                                                          | Enables datanode-only ROW triggers
 enable_distri_debug                    | off                                                         | enable distributed transaction debug
 enable_distri_debug_print              | off                                                         | enable distributed transaction debug print
 enable_distri_visibility_print         | off                                                         | enable distributed transaction visibility print
 enable_distributed_unique_plan         | on                                                          | enable distributed unique plan.
 enable_fast_query_shipping             | on                                                          | Enables the planner's use of fast query shipping to ship query directly to datanode.
 enable_fga                             | off                                                         | Enable Fine-grained audit.
 enable_gathermerge                     | on                                                          | Enables the planner's use of gather merge plans.
 enable_group_across_query              | on                                                          | enable group-across queries.
 enable_gtm_debug_print                 | off                                                         | enable gtm debug infomation print
 enable_gtm_proxy                       | off                                                         | Enable gtm proxy a child process of Postmaster.
 enable_hashagg                         | on                                                          | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin                        | on                                                          | Enables the planner's use of hash join plans.
 enable_indexonlyscan                   | on                                                          | Enables the planner's use of index-only-scan plans.
 enable_indexscan                       | on                                                          | Enables the planner's use of index-scan plans.
 enable_key_value                       | off                                                         | Enable key value lookup when make route strategy.
 enable_lock_account                    | off                                                         | Enable lock account when login fail serval times.
 enable_material                        | on                                                          | Enables the planner's use of materialization.
 enable_mergejoin                       | on                                                          | Enables the planner's use of merge join plans.
 enable_multi_cluster                   | off                                                         | Enable multiple clusters.
 enable_multi_cluster_print             | off                                                         | Enable multiple cluster print.
 enable_nestloop                        | on                                                          | Enables the planner's use of nested-loop join plans.
 enable_null_string                     | off                                                         | enable nulls in string.
 enable_oracle_compatible               | off                                                         | Enable oracle compatibility.
 enable_pgbouncer                       | off                                                         | use pgbouncer as coordinator connection pool.
 enable_plpgsql_debug_print             | off                                                         | enable plpgsql debug infomation print
 enable_pooler_debug_print              | off                                                         | enable pooler manager debug infomation print
 enable_pooler_stuck_exit               | off                                                         | enable pooler exit when pick up sync network thread failed
 enable_pullup_subquery                 | off                                                         | pullup subquery to make execution more efficient.
 enable_replication_slot_debug          | off                                                         | enable_replication_slot_debug
 enable_sampling_analyze                | off                                                         | sampling rows from datanodes when doing analyze on coordinator.
 enable_seqscan                         | on                                                          | Enables the planner's use of sequential-scan plans.
 enable_shard_statistic                 | on                                                          | collect statistic information for shard.
 enable_sort                            | on                                                          | Enables the planner's use of explicit sort steps.
 enable_statistic                       | off                                                         | collect statistic information for debug.
 enable_tidscan                         | on                                                          | Enables the planner's use of TID scan plans.
 enable_transparent_crypt               | on                                                          | Enable to transparent crypt feature.
 enable_user_authority_force_check      | off                                                         | control users to get the list of tables and functions which can be accessed and executed by these user.
 escape_string_warning                  | on                                                          | Warn about backslash escapes in ordinary string literals.
 event_source                           | PostgreSQL                                                  | Sets the application name used to identify PostgreSQL messages in the event log.
 exit_on_error                          | off                                                         | Terminate session on any error.
 external_pid_file                      |                                                             | Writes the postmaster PID to the specified file.
 extra_float_digits                     | 3                                                           | Sets the number of digits displayed for floating-point values.
 force_parallel_mode                    | off                                                         | Forces use of parallel query facilities.
 from_collapse_limit                    | 8                                                           | Sets the FROM-list size beyond which subqueries are not collapsed.
 fsync                                  | on                                                          | Forces synchronization of updates to disk.
 full_page_writes                       | on                                                          | Writes full pages to WAL when first modified after a checkpoint.
 geqo                                   | on                                                          | Enables genetic query optimization.
 geqo_effort                            | 5                                                           | GEQO: effort is used to set the default for other GEQO parameters.
 geqo_generations                       | 0                                                           | GEQO: number of iterations of the algorithm.
 geqo_pool_size                         | 0                                                           | GEQO: number of individuals in the population.
 geqo_seed                              | 0                                                           | GEQO: seed for random path selection.
 geqo_selection_bias                    | 2                                                           | GEQO: selective pressure within the population.
 geqo_threshold                         | 12                                                          | Sets the threshold of FROM items beyond which GEQO is used.
 gin_fuzzy_search_limit                 | 0                                                           | Sets the maximum allowed result for exact search by GIN.
 gin_pending_list_limit                 | 4MB                                                         | Sets the maximum size of the pending list for GIN index.
 global_snapshot_source                 | gtm                                                         | Set preferred source of a snapshot.
 gtm_backup_barrier                     | off                                                         | Enables coordinator to report barrier id to GTM for backup.
 gts_acquire_gap                        | 30s                                                         | Time gap between global timestamp acquirement in the WAL writer.
 hba_file                               | /ssd/xzDR3/coord/pg_hba.conf                                | Sets the server's "hba" configuration file.
 hot_data_age                           | 3650000                                                     | age of hot data, data older than this age will be turned cold.
 hot_standby                            | on                                                          | Allows connections and queries during recovery.
 hot_standby_feedback                   | off                                                         | Allows feedback from a hot standby to the primary that will avoid query conflicts.
 huge_pages                             | off                                                         | Use of huge pages on Linux.
 hybrid_hash_agg                        | off                                                         | enable hybrid-hash agg.
 hybrid_hash_agg_debug                  | off                                                         | enable hybrid-hash agg debug.
 ident_file                             | /ssd/xzDR3/coord/pg_ident.conf                              | Sets the server's "ident" configuration file.
 idle_in_transaction_session_timeout    | 0                                                           | Sets the maximum allowed duration of any idling transaction.
 ignore_checksum_failure                | off                                                         | Continues processing after a checksum failure.
 ignore_system_indexes                  | off                                                         | Disables reading from system indexes.
 init_pool_size                         | 10                                                          | Initial pool size.
 integer_datetimes                      | on                                                          | Datetimes are integer based.
 interval_sample_rate                   | 20                                                          | sample rate of interval partition.
 interval_sample_threshold              | 100                                                         | sample threshold of interval partition.
 IntervalStyle                          | postgres                                                    | Sets the display format for interval values.
 join_collapse_limit                    | 8                                                           | Sets the FROM-list size beyond which JOIN constructs are not flattened.
 krb_caseins_users                      | off                                                         | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
 krb_server_keyfile                     |                                                             | Sets the location of the Kerberos server key file.
 lc_collate                             | C                                                           | Shows the collation order locale.
 lc_ctype                               | C                                                           | Shows the character classification and case conversion locale.
 lc_messages                            | C                                                           | Sets the language in which messages are displayed.
 lc_monetary                            | C                                                           | Sets the locale for formatting monetary amounts.
 lc_numeric                             | C                                                           | Sets the locale for formatting numbers.
 lc_time                                | C                                                           | Sets the locale for formatting date and time values.
 listen_addresses                       | *                                                           | Sets the host name or IP address(es) to listen to.
 lo_compat_privileges                   | off                                                         | Enables backward compatibility mode for privilege checks on large objects.
 local_preload_libraries                |                                                             | Lists unprivileged shared libraries to preload into each backend.
 lock_account_print                     | off                                                         | Enable print log in lock account procedure.
 lock_timeout                           | 0                                                           | Sets the maximum allowed duration of any wait for a lock.
 log_autovacuum_min_duration            | 10s                                                         | Sets the minimum execution time above which autovacuum actions will be logged.
 log_checkpoints                        | on                                                          | Logs each checkpoint.
 log_connections                        | off                                                         | Logs each successful connection.
 log_destination                        | syslog                                                      | Sets the destination for server log output.
 log_directory                          | /ssd/xzDR3/log                                              | Sets the destination directory for log files.
 log_disconnections                     | off                                                         | Logs end of a session, including duration.
 log_duration                           | off                                                         | Logs the duration of each completed SQL statement.
 log_error_verbosity                    | verbose                                                     | Sets the verbosity of logged messages.
 log_executor_stats                     | off                                                         | Writes executor performance statistics to the server log.
 log_file_mode                          | 0600                                                        | Sets the file permissions for log files.
 log_filename                           | coord-%a.log                                                | Sets the file name pattern for log files.
 log_gtm_stats                          | off                                                         | Writes GTM performance statistics to the server log.
 log_hostname                           | off                                                         | Logs the host name in the connection logs.
 log_line_prefix                        | %t [%p]:xid[%x-%v] [%l-1] user=%u,db=%d,client=%h,query=%a, | Controls information prefixed to each log line.
 log_lock_waits                         | on                                                          | Logs long lock waits.
 log_min_duration_statement             | 5s                                                          | Sets the minimum execution time above which statements will be logged.
 log_min_error_statement                | debug5                                                      | Causes all statements generating error at or above this level to be logged.
 log_min_messages                       | info                                                        | Sets the message levels that are logged.
 log_parser_stats                       | off                                                         | Writes parser performance statistics to the server log.
 log_planner_stats                      | off                                                         | Writes planner performance statistics to the server log.
 log_remotesubplan_stats                | off                                                         | Writes remote subplan performance statistics to the server log.
 log_replication_commands               | off                                                         | Logs each replication command.
 log_rotation_age                       | 7d                                                          | Automatic log file rotation will occur after N minutes.
 log_rotation_size                      | 0                                                           | Automatic log file rotation will occur after N kilobytes.
 log_statement                          | none                                                        | Sets the type of statements logged.
 log_statement_stats                    | off                                                         | Writes cumulative performance statistics to the server log.
 log_temp_files                         | 1000000kB                                                   | Log the use of temporary files larger than this number of kilobytes.
 log_timezone                           | Europe/Vaduz                                                | Sets the time zone to use in log messages.
 log_truncate_on_rotation               | on                                                          | Truncate existing log files of same name during log rotation.
 logging_collector                      | on                                                          | Start a subprocess to capture stderr output and/or csvlogs into log files.
 loose_constraints                      | off                                                         | Relax enforcing of constraints
 loose_unique_index                     | on                                                          | Relax enforcing of unique index
 maintenance_work_mem                   | 4GB                                                         | Sets the maximum memory to be used for maintenance operations.
 manual_hot_date                        |                                                             | Set the manual_hot_date, this value will mask off hot_date_age.
 max_connections                        | 800                                                         | Sets the maximum number of concurrent connections.
 max_files_per_process                  | 1000                                                        | Sets the maximum number of simultaneously open files for each server process.
 max_function_args                      | 100                                                         | Shows the maximum number of function arguments.
 max_identifier_length                  | 63                                                          | Shows the maximum identifier length.
 max_index_keys                         | 32                                                          | Shows the maximum number of index keys.
 max_locks_per_transaction              | 4000                                                        | Sets the maximum number of locks per transaction.
 max_logical_replication_workers        | 64                                                          | Maximum number of logical replication worker processes.
 max_network_bandwidth_per_subscription | 50                                                          | Maximum network bandwidth per subscription.
 max_parallel_workers                   | 40                                                          | Sets the maximum number of parallel workers than can be active at one time.
 max_parallel_workers_per_gather        | 20                                                          | Sets the maximum number of parallel processes per executor node.
 max_pool_size                          | 14000                                                       | Max pool size.
 max_pred_locks_per_page                | 2                                                           | Sets the maximum number of predicate-locked tuples per page.
 max_pred_locks_per_relation            | -2                                                          | Sets the maximum number of predicate-locked pages and tuples per relation.
 max_pred_locks_per_transaction         | 64                                                          | Sets the maximum number of predicate locks per transaction.
 max_prepared_transactions              | 4000                                                        | Sets the maximum number of simultaneously prepared transactions.
 max_replication_slots                  | 64                                                          | Sets the maximum number of simultaneously defined replication slots.
 max_sessions_per_shardpool             | 100                                                         | max number of sessions for each shard pool.
 max_stack_depth                        | 2MB                                                         | Sets the maximum stack depth, in kilobytes.
 max_standby_archive_delay              | 30s                                                         | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
 max_standby_streaming_delay            | 30s                                                         | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
 max_sync_workers_per_subscription      | 4                                                           | Maximum number of table synchronization workers per subscription.
 max_wal_senders                        | 64                                                          | Sets the maximum number of simultaneously running WAL sender processes.
 max_wal_size                           | 150GB                                                       | Sets the WAL size that triggers a checkpoint.
 max_worker_processes                   | 128                                                         | Maximum number of concurrent worker processes.
 min_free_size                          | 5                                                           | minimal pool free connection number.
 min_parallel_index_scan_size           | 512kB                                                       | Sets the minimum amount of index data for a parallel scan.
 min_parallel_table_scan_size           | 8MB                                                         | Sets the minimum amount of table data for a parallel scan.
 min_pool_size                          | 5                                                           | Min pool size.
 min_wal_size                           | 320MB                                                       | Sets the minimum size to shrink the WAL to.
 network_byte_cost                      | 0.001                                                       | Sets the planner's estimate of the cost of sending data from remote node.
 nls_date_format                        | YYYY-MM-DD HH24:MI:SS                                       | Emulate oracle's date output behaviour.
 nls_sort_locale                        |                                                             | COLLATE set for nlssort.
 nls_timestamp_format                   | YYYY-MM-DD HH24:MI:SS.US                                    | Emulate oracle's timestamp without time zone output behaviour.
 nls_timestamp_tz_format                | YYYY-MM-DD HH24:MI:SS.US TZ                                 | Emulate oracle's timestamp with time zone output behaviour.
 node_is_extension                      | off                                                         | if this datanode is extension
 olap_optimizer                         | on                                                          | enable OLAP optimizer to make Query more efficient.
 old_snapshot_threshold                 | -1                                                          | Time before a snapshot is too old to read pages changed after the snapshot was taken.
 operator_precedence_warning            | off                                                         | Emit a warning for constructs that changed meaning since PostgreSQL 9.4.
 page_ts_need_xlog                      | 1                                                           | Determine whether the page timestamp should be logged.
 parallel_setup_cost                    | 1000                                                        | Sets the planner's estimate of the cost of starting up worker processes for parallel query.
 parallel_tuple_cost                    | 0.1                                                         | Sets the planner's estimate of the cost of passing each tuple (row) from worker to master backend.
 param_pass_down                        | off                                                         | enable exec param to pass down.
 parentnode                             |                                                             | Sets the name of the parent data node
 password_encryption                    | md5                                                         | Encrypt passwords.
 persistent_datanode_connections        | off                                                         | Session never releases acquired connections.
 pgbouncer_conf                         | pgbouncer.ini                                               | pgbouncer config file name, in PGDATA dir.
 pgxc_cluster_name                      | tbase_cluster                                               | The Cluster name.
 pgxc_main_cluster_name                 | tbase_cluster                                               | The Main Cluster name.
 pgxc_node_name                         | coord1                                                      | The Coordinator or Datanode name.
 pgxl_remote_fetch_size                 | 20000                                                       | Number of maximum tuples to fetch in one remote iteration
 pool_check_slot_timeout                | 5s                                                          | Enable pooler check slot. When slot is using by agent, shouldn't exist in nodepool.
 pool_conn_keepalive                    | 1min                                                        | Close connections if they are idle in the pool for that time.
 pool_maintenance_timeout               | 10s                                                         | Launch maintenance routine if pooler idle for that time.
 pool_print_stat_timeout                | -1                                                          | Enable pooler print stat info.
 pool_session_context_check_gap         | 2min                                                        | Gap to check datanode session memory context.
 pool_session_max_lifetime              | 5min                                                        | Datanode session max lifetime.
 pool_session_memory_limit              | 20s                                                         | Datanode session max memory context size.
 pooler_connect_timeout                 | 10s                                                         | Pooler connection timeout.
 pooler_dn_set_timeout                  | 1min                                                        | Pooler datanode set query timeout.
 pooler_port                            | 50013                                                       | Port of the Pool Manager.
 pooler_scale_factor                    | 2                                                           | Pooler scale factor.
 pooler_unpooled_database               | template1                                                   | databases that pooler will not pool its connections.
 pooler_unpooled_user                   | mls_admin                                                   | users that pooler will not pool its connections.
 pooler_warm_db_user                    |                                                             | pooler warm datanode database and user.
 port                                   | 55431                                                       | Sets the TCP port the server listens on.
 post_auth_delay                        | 0                                                           | Waits N seconds on connection startup after authentication.
 pre_auth_delay                         | 0                                                           | Waits N seconds on connection startup before authentication.
 prefer_olap                            | off                                                         | Prefer to run OLAP.
 pub_stat_hash_size                     | 65536                                                       | size of hashtable used in publication statistics.
 pub_table_stat_hash_size               | 262144                                                      | size of hashtable used in publication's table statistics.
 quote_all_identifiers                  | off                                                         | When generating SQL fragments, quote all identifiers.
 random_collect_stats                   | on                                                          | Coordinator random collects the statistics from all data nodes.
 random_page_cost                       | 1.21                                                        | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
 remote_query_cost                      | 100                                                         | Sets the planner's estimate of the cost of setting up remote subquery.
 remotetype                             | application                                                 | Sets the type of Postgres-XL remote connection
 replacement_sort_tuples                | 150000                                                      | Sets the maximum number of tuples to be sorted using replacement selection.
 replication_level                      | 1                                                           | replication level on join to make Query more efficient.
 restart_after_crash                    | on                                                          | Reinitialize server after backend crash.
 restrict_query                         | on                                                          | restrict query to involved node as possible
 row_security                           | on                                                          | Enable row security.
 search_path                            | dr3_ops_cs36, dr3_ops_cs36_part, public                     | Sets the schema search order for names that are not schema-qualified.
 segment_size                           | 8GB                                                         | Shows the number of pages per disk file.
 sender_thread_batch_size               | 16                                                          | batch size of senders in datapump
 sender_thread_buffer_size              | 32                                                          | buffer size of senders in datapump
 sender_thread_num                      | 8                                                           | Number of maximum senders in datapump
 seq_page_cost                          | 1.2                                                         | Sets the planner's estimate of the cost of a sequentially fetched disk page.
 sequence_range                         | 1000                                                        | The range of values to ask from GTM for sequences. If CACHE parameter is set then that overrides this.
 server_encoding                        | SQL_ASCII                                                   | Sets the server (database) character set encoding.
 server_version                         | 10.0 TBase V2                                               | Shows the server version.
 server_version_num                     | 100000                                                      | Shows the server version as an integer.
 session_memory_size                    | 2                                                           | Used to get the total memory size of the session, in M Bytes.
 session_preload_libraries              |                                                             | Lists shared libraries to preload into each backend.
 session_replication_role               | origin                                                      | Sets the session's behavior for triggers and rewrite rules.
 set_global_snapshot                    | off                                                         | always use global snapshot for query
 shard_statistic_flush_interval         | 300                                                         | interval of flushing shard statistic.
 shard_visible_mode                     | visible                                                     | set the mode of shard visibility.
 shared_buffers                         | 4GB                                                         | Sets the number of shared memory buffers used by the server.
 shared_preload_libraries               |                                                             | Lists shared libraries to preload into server.
 shared_queue_size                      | 256kB                                                       | Sets the amount of memory allocated for a shared memory queue per datanode.
 shared_queues                          | 8000                                                        | Sets the number of shared memory queues used by the distributed executor, minimum 1/4 of max_connections.
 show_all_shard_stat                    | off                                                         | show statistic information for all shards.
 skip_gtm_catalog                       | off                                                         | used to skip gtm catalog, WARNING:only for emergency purpose and only avaliable on coordinators.
 ssl                                    | off                                                         | Enables SSL connections.
 ssl_ca_file                            |                                                             | Location of the SSL certificate authority file.
 ssl_cert_file                          | server.crt                                                  | Location of the SSL server certificate file.
 ssl_ciphers                            | HIGH:MEDIUM:+3DES:!aNULL                                    | Sets the list of allowed SSL ciphers.
 ssl_crl_file                           |                                                             | Location of the SSL certificate revocation list file.
 ssl_dh_params_file                     |                                                             | Location of the SSL DH params file.
 ssl_ecdh_curve                         | prime256v1                                                  | Sets the curve to use for ECDH.
 ssl_key_file                           | server.key                                                  | Location of the SSL server private key file.
 ssl_prefer_server_ciphers              | on                                                          | Give priority to server ciphersuite order.
 standard_conforming_strings            | on                                                          | Causes '...' strings to treat backslashes literally.
 standby_plane_query_delay              | 0                                                           | delay time of query in standby.
 statement_timeout                      | 0                                                           | Sets the maximum allowed duration of any statement.
 stats_temp_directory                   | /tmp/pg_stat_tmp/xzDR3                                      | Writes temporary statistics files to the specified directory.
 sub_stat_hash_size                     | 65536                                                       | size of hashtable used in subscription statistics.
 sub_table_stat_hash_size               | 262144                                                      | size of hashtable used in subscription's table statistics.
 superuser_reserved_connections         | 4                                                           | Sets the number of connection slots reserved for superusers.
 support_oracle_compatible              | on                                                          | Support oracle compatibility.
 synchronize_seqscans                   | on                                                          | Enable synchronized sequential scans.
 synchronous_commit                     | off                                                         | Sets the current transaction's synchronization level.
 synchronous_standby_names              |                                                             | Number of synchronous standbys and list of names of potential synchronous ones.
 syslog_facility                        | local4                                                      | Sets the syslog "facility" to be used when syslog enabled.
 syslog_ident                           | coord                                                       | Sets the program name used to identify PostgreSQL messages in syslog.
 syslog_sequence_numbers                | on                                                          | Add sequence number to syslog messages to avoid duplicate suppression.
 syslog_split_messages                  | off                                                         | Split messages sent to syslog by lines and to fit into 1024 bytes.
 tcp_keepalives_count                   | 40                                                          | Maximum number of TCP keepalive retransmits.
 tcp_keepalives_idle                    | 300                                                         | Time between issuing TCP keepalives.
 tcp_keepalives_interval                | 45                                                          | Time between TCP keepalive retransmits.
 temp_buffers                           | 64MB                                                        | Sets the maximum number of temporary buffers used by each session.
 temp_cold_date                         |                                                             | Set the temp cold date, so the data import soon will be trade as cold data.
 temp_file_limit                        | -1                                                          | Limits the total size of all temporary files used by each process.
 temp_hot_date                          |                                                             | Set the temp hot date, when check data consistency in code datanode.
 temp_key_value                         |                                                             | Set the temp key value, so the white list tuple will be routed to new group.
 temp_tablespaces                       | temp_tablespace,temp_tablespace,ssd_tablespace              | Sets the tablespace(s) to use for temporary tables and sort files.
 TimeZone                               | Europe/Zurich                                               | Sets the time zone for displaying and interpreting time stamps.
 timezone_abbreviations                 | Default                                                     | Selects a file of time zone abbreviations.
 trace_extent                           | off                                                         | Emits information about extent changing.
 trace_notify                           | off                                                         | Generates debugging output for LISTEN and NOTIFY.
 trace_recovery_messages                | log                                                         | Enables logging of recovery-related debugging information.
 trace_sort                             | off                                                         | Emit information about resource usage in sorting.
 track_activities                       | on                                                          | Collects information about executing commands.
 track_activity_query_size              | 2048                                                        | Sets the size reserved for pg_stat_activity.query, in bytes.
 track_commit_timestamp                 | on                                                          | Collects transaction commit time.
 track_counts                           | on                                                          | Collects statistics on database activity.
 track_functions                        | none                                                        | Collects function-level statistics on database activity.
 track_io_timing                        | off                                                         | Collects timing statistics for database I/O activity.
 transaction_deferrable                 | off                                                         | Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures.
 transaction_isolation                  | read committed                                              | Sets the current transaction's isolation level.
 transaction_read_only                  | off                                                         | Sets the current transaction's read-only status.
 transform_insert_to_copy               | on                                                          | try to transform insert into multi-values to copy from.
 transform_null_equals                  | off                                                         | Treats "expr=NULL" as "expr IS NULL".
 unix_socket_directories                | /tmp/pg_stat_tmp/xzDR3                                      | Sets the directories where Unix-domain sockets will be created.
 unix_socket_group                      |                                                             | Sets the owning group of the Unix-domain socket.
 unix_socket_permissions                | 0777                                                        | Sets the access permissions of the Unix-domain socket.
 update_process_title                   | on                                                          | Updates the process title to show the active SQL command.
 use_data_pump                          | on                                                          | use datapump to make data transfer more efficient.
 vacuum_cost_delay                      | 0                                                           | Vacuum cost delay in milliseconds.
 vacuum_cost_limit                      | 200                                                         | Vacuum cost amount available before napping.
 vacuum_cost_page_dirty                 | 20                                                          | Vacuum cost for a page dirtied by vacuum.
 vacuum_cost_page_hit                   | 1                                                           | Vacuum cost for a page found in the buffer cache.
 vacuum_cost_page_miss                  | 10                                                          | Vacuum cost for a page not found in the buffer cache.
 vacuum_debug_print                     | off                                                         | vacuum debug print.
 vacuum_defer_cleanup_age               | 0                                                           | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any.
 vacuum_defer_freeze_min_age            | 10000                                                       | Minimum age at which VACUUM should defer to freeze a table row to avoid failure due to too old timestamp.
 vacuum_delta                           | 10000                                                       | The time period before which VACUUM and HOT cleanup could clean the transactions, if any.
 vacuum_freeze_min_age                  | 50000000                                                    | Minimum age at which VACUUM should freeze a table row.
 vacuum_freeze_table_age                | 150000000                                                   | Age at which VACUUM should scan whole table to freeze tuples.
 vacuum_multixact_freeze_min_age        | 5000000                                                     | Minimum age at which VACUUM should freeze a MultiXactId in a table row.
 vacuum_multixact_freeze_table_age      | 150000000                                                   | Multixact age at which VACUUM should scan whole table to freeze tuples.
 wal_block_size                         | 16384                                                       | Shows the block size in the write ahead log.
 wal_buffers                            | 64MB                                                        | Sets the number of disk-page buffers in shared memory for WAL.
 wal_check                              | off                                                         | check consistency of wal.
 wal_compression                        | off                                                         | Compresses full-page writes written in WAL file.
 wal_consistency_checking               |                                                             | Sets the WAL resource managers for which WAL consistency checks are done.
 wal_keep_segments                      | 0                                                           | Sets the number of WAL files held for standby servers.
 wal_level                              | logical                                                     | Set the level of information written to the WAL.
 wal_log_hints                          | off                                                         | Writes full pages to WAL when first modified after a checkpoint, even for a non-critical modifications.
 wal_receiver_status_interval           | 10s                                                         | Sets the maximum interval between WAL receiver status reports to the primary.
 wal_receiver_timeout                   | 1min                                                        | Sets the maximum wait time to receive data from the primary.
 wal_retrieve_retry_interval            | 5s                                                          | Sets the time to wait before retrying to retrieve WAL after a failed attempt.
 wal_segment_size                       | 64MB                                                        | Shows the number of pages per write ahead log segment.
 wal_sender_timeout                     | 1min                                                        | Sets the maximum time to wait for WAL replication.
 wal_stream_type                        | user_stream                                                 | set wal_stream_type type, enum as user_stream/cluster_stream/internal_stream, default is user_stream string
 wal_sync_method                        | fdatasync                                                   | Selects the method used for forcing WAL updates to disk.
 wal_track_entry_number                 | 256ms                                                       | Number of entries to track GTS of each xlog segment.
 wal_writer_delay                       | 500ms                                                       | Time between WAL flushes performed in the WAL writer.
 wal_writer_flush_after                 | 1MB                                                         | Amount of WAL written out by WAL writer that triggers a flush.
 warm_shared_buffer                     | off                                                         | Set connection to warm shared buffer.
 work_mem                               | 4MB                                                         | Sets the maximum memory to be used for query workspaces.
 xc_maintenance_mode                    | off                                                         | Turn on XC maintenance mode.
 xmlbinary                              | base64                                                      | Sets how binary values are to be encoded in XML.
 xmloption                              | content                                                     | Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments.
 zero_damaged_pages                     | off                                                         | Continues processing past damaged page headers.
(429 rows)
yazun commented 3 years ago

We can confirm unusual high memory usage by TBase in comparison to XL. We see many processes that are doing simple lookups have around 400MB reported in /proc/$pid/smaps.

Could you give us any hints for lowering Tbase memory hunger?

yazun commented 3 years ago

We can see that on XL we rarely go over 100MB per user connection. In TBase, i.e. let's count the private memory of each of the processes(XL shows almost zero such processes on a similar load)

bash -c "ps h -u pgxzDR3 | awk -- '{print \$1}' |  xargs -I {} awk  '/^Private/ {a+=\$2}END{f = a*1024/1024/1024; if(f>100){  cmd = \"ps -up {} |tail -n 1 \" ;  cmd | getline ret;   close(cmd); print \"For pid:\" {} \" process: \"ret \" for MB: \" f  }}'  /proc/{}/smaps "
For pid:12907 process: pgxzDR3  12907  0.8  0.3 13321964 906788 ?     Ss   13:54   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.36(41432) idle in transaction for MB: 438.781
For pid:13885 process: pgxzDR3  13885  1.7  0.3 13330748 916392 ?     Ss   13:55   0:05 postgres: dr3_ops_cs36 surveys 192.168.168.12(44804) idle in transaction for MB: 446.75
For pid:14574 process: pgxzDR3  14574  1.8  0.3 13329740 915164 ?     Ss   13:55   0:05 postgres: dr3_ops_cs36 surveys 192.168.168.108(50812) idle in transaction for MB: 445.758
For pid:15112 process: pgxzDR3  15112  2.1  0.3 13320660 910464 ?     Rs   13:56   0:05 postgres: dr3_ops_cs36 surveys 192.168.168.96(52398) INSERT for MB: 440.379
For pid:16615 process: pgxzDR3  16615  2.1  0.3 13320776 906136 ?     Ss   13:56   0:05 postgres: dr3_ops_cs36 surveys 192.168.168.96(52524) idle in transaction for MB: 437.801
For pid:19356 process: pgxzDR3  19356  2.6  0.3 13320100 905504 ?     Ss   13:58   0:04 postgres: dr3_ops_cs36 surveys 192.168.168.96(52664) idle in transaction for MB: 437.227
For pid:19567 process: pgxzDR3  19567  2.8  0.3 13319520 904604 ?     Ss   13:58   0:04 postgres: dr3_ops_cs36 surveys 192.168.168.108(50816) idle in transaction for MB: 436.578
For pid:19863 process: pgxzDR3  19863  2.5  0.3 13319520 904612 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.108(50818) SELECT for MB: 436.57
For pid:20052 process: pgxzDR3  20052  2.7  0.3 13320096 905528 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.96(52702) SELECT for MB: 437.227
For pid:20054 process: pgxzDR3  20054  2.8  0.3 13319520 904624 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.108(50820) SELECT for MB: 436.57
For pid:20055 process: pgxzDR3  20055  2.9  0.3 13319520 904672 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.108(50822) SELECT for MB: 436.578
For pid:20173 process: pgxzDR3  20173  3.1  0.3 13320032 905476 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.36(41446) idle in transaction for MB: 437.148
For pid:20249 process: pgxzDR3  20249  3.3  0.3 13319520 904620 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.108(50824) SELECT for MB: 436.574
For pid:20277 process: pgxzDR3  20277  3.1  0.3 13319520 904620 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.108(50826) idle in transaction for MB: 436.566
For pid:20344 process: pgxzDR3  20344  2.7  0.3 13323116 908816 ?     Ss   13:58   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.12(44816) SELECT for MB: 439.301
For pid:20391 process: pgxzDR3  20391  3.4  0.3 13320032 905472 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.12(44818) idle in transaction for MB: 437.148
For pid:20408 process: pgxzDR3  20408  3.2  0.3 13322452 908152 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.36(41448) SELECT for MB: 438.688
For pid:20455 process: pgxzDR3  20455  3.2  0.3 13322496 908284 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.12(44820) SELECT for MB: 438.738
For pid:20496 process: pgxzDR3  20496  3.3  0.3 13319520 904620 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.36(41450) SELECT for MB: 436.574
For pid:20641 process: pgxzDR3  20641  2.9  0.3 13322452 908172 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.36(41452) SELECT for MB: 438.672
For pid:20741 process: pgxzDR3  20741  3.0  0.3 13322668 908284 ?     Ss   13:59   0:02 postgres: dr3_ops_cs36 surveys 192.168.168.36(41454) SELECT for MB: 438.887
For pid:20862 process: pgxzDR3  20862  4.2  0.3 13323488 909264 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.12(44822) idle in transaction for MB: 439.727
For pid:20901 process: pgxzDR3  20901  3.6  0.3 13320100 905496 ?     Ss   13:59   0:03 postgres: dr3_ops_cs36 surveys 192.168.168.96(52750) SELECT for MB: 437.227
awk: fatal: cannot open file `/proc/20974/smaps' for reading (No such file or directory)
awk: fatal: cannot open file `/proc/21383/smaps' for reading (No such file or directory)
awk: fatal: cannot open file `/proc/21450/smaps' for reading (No such file or directory)
awk: fatal: cannot open file `/proc/21491/smaps' for reading (No such file or directory)
awk: fatal: cannot open file `/proc/21593/smaps' for reading (No such file or directory)
awk: fatal: cannot open file `/proc/21910/smaps' for reading (No such file or directory)
For pid:21928 process: pgxzDR3  21928  3.7  0.3 13322496 908208 ?     Ss   13:59   0:02 postgres: dr3_ops_cs36 surveys 192.168.168.12(44824) SELECT for MB: 438.742
awk: fatal: cannot open file `/proc/22041/smaps' for reading (No such file or directory)
For pid:22148 process: pgxzDR3  22148  4.3  0.3 13322496 908208 ?     Ss   13:59   0:02 postgres: dr3_ops_cs36 surveys 192.168.168.12(44826) idle in transaction for MB: 438.738
For pid:22150 process: pgxzDR3  22150  4.2  0.3 13322496 908228 ?     Ss   13:59   0:02 postgres: dr3_ops_cs36 surveys 192.168.168.36(41456) SELECT for MB: 438.742
For pid:22554 process: pgxzDR3  22554  3.9  0.3 13319520 904588 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.108(50866) idle in transaction for MB: 436.578
For pid:22783 process: pgxzDR3  22783  5.1  0.3 13320488 905856 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.96(52796) SELECT for MB: 437.605
For pid:22981 process: pgxzDR3  22981  5.7  0.3 13322496 908212 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.12(44836) SELECT for MB: 438.742
For pid:22982 process: pgxzDR3  22982  6.1  0.3 13320488 905884 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.96(52798) SELECT for MB: 437.605
awk: fatal: cannot open file `/proc/23184/smaps' for reading (No such file or directory)
For pid:23235 process: pgxzDR3  23235  7.9  0.3 13310912 881792 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.36(41466) SELECT for MB: 427.547
For pid:23250 process: pgxzDR3  23250  9.5  0.3 13311876 883192 ?     Ss   14:00   0:01 postgres: dr3_ops_cs36 surveys 192.168.168.96(52842) SELECT for MB: 428.652
awk: fatal: cannot open file `/proc/23393/smaps' for reading (No such file or directory)
For pid:34615 process: pgxzDR3  34615  0.3  1.5 12848240 3981860 ?    S    13:43   0:03 /home/pgxzDR3/XZ_10_STABLE/bin/postgres --coordinator -D /ssd/xzDR3/coord -i for MB: 3538.44
yazun commented 3 years ago

Looks like the OOMs were caused by the kernel memory overcommit that does not happen with using huge pages. Huge pages fix the problem largely, it seems. We need more tests to confirm this though. We also must understand why memory on XZ is so much higher per connection process.

yazun commented 3 years ago

The reality is usually more complicated: it turned out that increased memory usage is related to not using huge pages by Tbase as XL is already using big portion. After finally squeezing max_xonnections etc for test and enabling huge_pages='on' the memory behaviour is much better - it's stable and we do not see any trend. Btw. shared_queues/shared_queue_size have huge impact memory-wise - could you comment what would be your suggested number? - max_connections on coord or max_connections on a datanode?

It seems enabling huge_pages='on' has no effect on the amount of memory usage. Do you have some data of memory usage about the two cases huge_pages='on' and huge_pages='off' but keep the max_connections constant. In theory when enabling huge_pages='on' it may cost even more memory in some cases.

Not really an absolute values, but for 320 connections without huge_pages we oscillate at around 130GB used with stable workload for hours (around 44K huge pages reserved), XL running the same time, with huge_pages=on and without huge_pages after few minutes we are getting OOM on 256GB on one of the machines. But it goes to 230-240GB very fast.

With huge_pages=on we do not see any significant memory increase while adding connections...

q2683252 commented 3 years ago

I noticed that when memory went up, the connections also showed the same trade. So I wonder whether it's caused by the client acquiring more connections? and you can limit max_connections on coordinator or place a load balancer beforce cn.

yazun commented 3 years ago

I think it was not TBase specific - once we started to use hugepages situation stabilized since. So closing. Thanks for help.