ApsaraDB / PolarDB-for-PostgreSQL

A cloud-native database based on PostgreSQL developed by Alibaba Cloud.
https://apsaradb.github.io/PolarDB-for-PostgreSQL/zh/
Apache License 2.0
2.83k stars 450 forks source link

[Question]java连接polardb,执行sql频繁报Cause: com.aliyun.polardb.util.PSQLException: ERROR: out of memory #508

Open sise1369 opened 3 months ago

sise1369 commented 3 months ago

Describe the problem com.aliyun.polardb.util.PSQLException: ERROR: out of memory 详细:Failed on request of size 16 in memory context "CachedPlanQuery". at com.aliyun.polardb.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2763) at com.aliyun.polardb.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2372) at com.aliyun.polardb.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:350) at com.aliyun.polardb.jdbc.PgStatement.executeInternal(PgStatement.java:488) at com.aliyun.polardb.jdbc.PgStatement.execute(PgStatement.java:405) at com.aliyun.polardb.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:245) at com.aliyun.polardb.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:234) ...

服务器是96核,128GB内存 调整以下参数,仍然还是会报内存溢出。特来咨询下,还有其他解决办法吗 shared_buffers=12GB work_mem = 24MB
maintenance_work_mem = 128MB autovacuum_work_mem = 128MB

polardb-bot[bot] commented 3 months ago

Hi @sise1369 ~ Thanks for opening this issue! 🎉

Please make sure you have provided enough information for subsequent discussion.

We will get back to you as soon as possible. ❤️

mrdrivingduck commented 3 months ago

@sise1369 Could you monitor the memory usage of your machine at the moment when there is an OOM happening?

Mr-TTWang commented 2 months ago

image 遇到这样的问题

Mr-TTWang commented 2 months ago

@mrdrivingduck see pic above, please help

liuchengshan-lcs commented 2 months ago

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

liuchengshan-lcs commented 2 months ago

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

您可以在相同操作下把shared server功能屏蔽一下,可以判断是否shared server带来的问题,目前看起来应该和这个关系不大

Mr-TTWang commented 2 months ago

方便介绍一下你的操作流程吗?我们看下能否进行分析/复现。目前日志信息显示sharedserver在进行shmem_alloc的时候遇到oom,进而出发回退机制,算符合预期。引起oom的原因目前看是CachedPlanQuery,但是具体原因可能还需要您提供一些核心操作。

您可以在相同操作下把shared server功能屏蔽一下,可以判断是否shared server带来的问题,目前看起来应该和这个关系不大

我是使用的 binary那个镜像 然后只启动另一个primary, 然后启动脚本是

#!/bin/bash

polar_data_dir=${POLARDB_DATA_DIR}
sudo mkdir -p ${polar_data_dir}
sudo chmod a+wr ${polar_data_dir}
sudo chown -R postgres:postgres ${polar_data_dir}

primary_datadir="$polar_data_dir/primary_datadir"
shared_datadir="$polar_data_dir/shared_datadir"

polardb_init() {
    primary_port=${POLARDB_PORT:-5432}

    # primary private dir and shared dir
    rm -rf ${primary_datadir}
    rm -rf ${shared_datadir}
    mkdir -p ${primary_datadir}
    mkdir -p ${shared_datadir}

    # initdb
    initdb -k -U postgres -D ${primary_datadir}

    # default GUCs
    echo "polar_enable_shared_storage_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_hostid = 1" >> ${primary_datadir}/postgresql.conf
    echo "max_connections = 2000" >> ${primary_datadir}/postgresql.conf
    echo "polar_wal_pipeline_enable = true" >> ${primary_datadir}/postgresql.conf
    echo "polar_create_table_with_full_replica_identity = off" >> ${primary_datadir}/postgresql.conf
    echo "logging_collector = on" >> ${primary_datadir}/postgresql.conf
    echo "log_directory = 'pg_log'" >> ${primary_datadir}/postgresql.conf

    echo "shared_buffers = '2GB'" >> ${primary_datadir}/postgresql.conf
    echo "synchronous_commit = on" >> ${primary_datadir}/postgresql.conf
    echo "full_page_writes = off" >> ${primary_datadir}/postgresql.conf
    echo "autovacuum_naptime = 10min" >> ${primary_datadir}/postgresql.conf
    echo "max_worker_processes = 32" >> ${primary_datadir}/postgresql.conf
    echo "polar_use_statistical_relpages = off" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_persisted_buffer_pool = off" >> ${primary_datadir}/postgresql.conf
    echo "polar_nblocks_cache_mode = 'all'" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_replica_use_smgr_cache = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_standby_use_smgr_cache = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_flashback_log = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_fast_recovery_area = on" >> ${primary_datadir}/postgresql.conf

    # storage-related GUCs
    disk_name=`echo ${shared_datadir} | cut -d '/' -f2`
    echo "polar_vfs.localfs_mode = true" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_localfs_test_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_shared_storage_mode = on" >> ${primary_datadir}/postgresql.conf
    echo "listen_addresses = '*'" >> ${primary_datadir}/postgresql.conf
    echo "polar_disk_name = '$disk_name'" >> ${primary_datadir}/postgresql.conf
    echo "polar_datadir = 'file-dio://$shared_datadir'" >> ${primary_datadir}/postgresql.conf

    # preload extensions
    echo "shared_preload_libraries = 'polar_px,polar_vfs,polar_worker,pg_stat_statements,auth_delay,auto_explain,polar_monitor_preload,polar_stat_sql,timescaledb'" >> ${primary_datadir}/postgresql.conf

    # shared dir initialization
    polar-initdb.sh ${primary_datadir}/ ${shared_datadir}/ localfs

    # allow external connections
    echo "host all all 0.0.0.0/0 md5" >> ${primary_datadir}/pg_hba.conf

    echo "port = $primary_port" >> ${primary_datadir}/postgresql.conf
    echo "polar_hostid = 100" >> ${primary_datadir}/postgresql.conf
    echo "full_page_writes = off" >> ${primary_datadir}/postgresql.conf

    # PX related GUCs
    echo "polar_enable_px=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_check_workers=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_replay_wait=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_dop_per_node=3" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_max_workers_number=0" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_cte_shared_scan=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_partition=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_left_index_nestloop_join=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_wait_lock_timeout=1800000" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_partitionwise_join=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_optimizer_multilevel_partitioning=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_max_slices=1000000" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_adps=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_adps_explain_analyze=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_trace_heap_scan_flow=1" >> ${primary_datadir}/postgresql.conf
    echo "polar_px_enable_spi_read_all_namespaces=1" >> ${primary_datadir}/postgresql.conf

    # Shared server GUCs
    echo "polar_enable_shared_server = on" >> ${primary_datadir}/postgresql.conf
    echo "polar_enable_shm_aset = on" >> ${primary_datadir}/postgresql.conf

    # start up primary node
    pg_ctl -D ${primary_datadir} start

    # create default user with password, if specified
    if [[ ${POLARDB_USER} == "postgres" ]];
    then
        if [[ -n ${POLARDB_PASSWORD} ]];
        then
            psql -h 127.0.0.1 -p $primary_port -d postgres -c "ALTER ROLE ${POLARDB_USER} PASSWORD '${POLARDB_PASSWORD}'"
        fi
    elif [[ -n ${POLARDB_USER} ]];
    then
        if [[ -n ${POLARDB_PASSWORD} ]];
        then
            psql -h 127.0.0.1 -p $primary_port -d postgres -c "CREATE ROLE ${POLARDB_USER} PASSWORD '${POLARDB_PASSWORD}' SUPERUSER LOGIN"
        fi
    fi
}

# If the data volume is empty, we will try to initdb here.
if [ -z "$(ls -A $polar_data_dir)" ];
then
    polardb_init
else
    rm -f $shared_datadir/DEATH # solve the restarting error
    echo "waiting starting"
    sleep 60s
    pg_ctl -D ${primary_datadir} start
fi

tail -f /dev/null

# Stop PolarDB-PG.
if [ ! $(pg_ctl -D ${primary_datadir} status | grep -q "server is running") ];
then
    pg_ctl -D ${primary_datadir} stop
fi

C++ 服务连接倒是正常,但是java使用postgres的jdbc连接的时候 会有那种问题,只是猜测,因为我们登录界面调用java接口无法查询到数据库信息,无法登录。 下午调整了

shared_buffers; 4GB work_mem; 256MB maintenance_work_mem; 1GB 还是存在问题,然后 昨晚我把 #polar_enable_shm_aset = on 注释掉了 没再报错。

不知道使用单实例部署,shared相关需要怎么处理和配置,或者还是有啥别的原因..

liuchengshan-lcs commented 2 months ago

polar_enable_shm_aset参数表示是否开启全局内存。根据您的描述来看,可能是jdbc的客户端链接在某些条件下和内核不兼容?会触发全局内存的OOM。您那边临时先关闭全局内存的参数进行下测试和使用?我们也尝试一下。

Mr-TTWang commented 2 months ago

polar_enable_shm_aset参数表示是否开启全局内存。根据您的描述来看,可能是jdbc的客户端链接在某些条件下和内核不兼容?会触发全局内存的OOM。您那边临时先关闭全局内存的参数进行下测试和使用?我们也尝试一下。

好的,等你们消息