Open qwe123520 opened 6 months ago
The error log is as follows:
2024-04-26 15:31:50.066 CST [14] [14] LOG: forked new process, pid is 16, true pid is 16
2024-04-26 15:31:50.066 CST [14] [14] LOG: forked new process, pid is 17, true pid is 17
2024-04-26 15:31:50.078 CST [14] [14] LOG: polardb try start vfs process
2024-04-26 15:31:50.078 CST [14] [14] LOG: pfs in localfs mode
2024-04-26 15:31:50.081 CST [14] [14] FATAL: polardb shared storage file-dio:///var/polardb/shared_datadir is unavailable.
2024-04-26 15:31:50.081 CST [14] [14] BACKTRACE:
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(elog_finish+0x1fd) [0x555e31bde55d]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(+0x7db1ae) [0x555e31a4d1ae]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(PostmasterMain+0xf53) [0x555e319dbf63]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(main+0x830) [0x555e316bacf0]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6ace30cd90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6ace30ce40]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres(_start+0x25) [0x555e316ca6d5]
2024-04-26 15:31:50.202 CST [14] [14] LOG: database system is shut down
@qwe123520 What is your docker startup command?
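I am using the image "polardb/polardb_pg_local_instance", with no extra startup command configured.
@qwe123520 This has nothing to do with the image itself; it depends on how the container is started from the image. That is why I am asking what command you used to start the container. What happens if you start the container with the commands below?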
docker pull polardb/polardb_pg_local_instance
docker run -it --rm polardb/polardb_pg_local_instance psql
I started it with this command:
docker run -d --name polardb -v /data/polardb/:/var/polardb/ polardb/polardb_pg_local_instance
The command you suggested (docker run -it --rm polardb/polardb_pg_local_instance psql) works, but as soon as I use -v with a host directory it fails.
Does the host directory /data/polardb/ exist, and is it non-empty?
Yes, it exists and is non-empty.
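The container must be started with a directory that exists and is empty. When the container startup script finds the directory empty, it runs initdb in it to create the data directories. When it finds the directory non-empty, it starts the database from the data directories specified in the script, which fails if the existing contents are unrelated files.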
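The empty-versus-non-empty distinction described above can be checked on the host before starting the container. The following is only a minimal sketch: `dir_state` and `HOST_DIR` are hypothetical names chosen for illustration, and the check does not reproduce the image's actual entrypoint logic.

```shell
#!/bin/sh
# Classify a host directory before using it as the /var/polardb bind mount.
# Prints "missing", "empty", or "non-empty".
# dir_state is a hypothetical helper, not part of the PolarDB image.
dir_state() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing"
    elif [ -z "$(ls -A "$dir")" ]; then
        echo "empty"
    else
        echo "non-empty"
    fi
}

# HOST_DIR is an illustrative variable; /data/polardb is the path used in this thread.
HOST_DIR=${HOST_DIR:-/data/polardb}
echo "$HOST_DIR is $(dir_state "$HOST_DIR")"
```

If the result is "non-empty", the contents should be data directories created by an earlier run of the same image, not unrelated files.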
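This directory was created by an earlier startup. I then modified postgresql.conf, and after that the instance would not start.
@mrdrivingduck please come and answer this.
The problem only appears after modifying postgresql.conf inside the directory. I do not see how that relates to shared_datadir. Please help resolve this.
Hurry, please.
Also, restoring the previous conf contents does not help either. It seems the file simply cannot be changed.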
@qwe123520 @SamirWell Under /data/polardb/ there should be several directories such as primary_dir/. Check current_logfiles in each directory to find the name of the error log, and see what the last error log entries are.
2024-05-22 17:56:18.943 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history.tmp
2024-05-22 17:56:18.944 CST [20] [20] LOG: vfs_rename from file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history.tmp to file-dio:///var/polardb/shared_datadir/polar_flog/flashback_log.history
2024-05-22 17:56:18.944 CST [20] [20] LOG: The flashback log will switch from 0/877E0 to 0/10000000
2024-05-22 17:56:18.944 CST [20] [20] LOG: The flashback log shared buffer is ready now, the current point(position) is 0/10000000(0/FF3FFF0), previous point(position) is 0/0(0/0), initalized upto point is 0/10000000
2024-05-22 17:56:18.945 CST [20] [20] LOG: enable persisted slot, read slot from polarstore.
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs open dir pg_replslot, num open dir 1
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs open dir file-dio:///var/polardb/shared_datadir/pg_replslot, num open dir 1
2024-05-22 17:56:18.945 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/pg_replslot/replica1/state.tmp
2024-05-22 17:56:18.946 CST [20] [20] LOG: restore slot replica1 with version 10002, replay_lsn is 0/1BA24B8, restart_lsn is 0/1752788
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs_unlink file-dio:///var/polardb/shared_datadir/pg_replslot/replica2/state.tmp
2024-05-22 17:56:18.946 CST [20] [20] LOG: restore slot replica2 with version 10002, replay_lsn is 0/1BA24B8, restart_lsn is 0/1752788
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs open dir pg_replslot, num open dir 1
2024-05-22 17:56:18.946 CST [20] [20] LOG: vfs open dir file-dio:///var/polardb/shared_datadir/pg_twophase, num open dir 1
2024-05-22 17:56:18.946 CST [20] [20] LOG: database system was not properly shut down; automatic recovery in progress
2024-05-22 17:56:18.946 CST [20] [20] LOG: state is 4
2024-05-22 17:56:18.965 CST [19] [19] LOG: polar_flog_index log index is insert from 28
2024-05-22 17:56:19.023 CST [19] [19] WARNING: The flashback log record at 0/895F0 will be ignore. and switch to 0/10000028
2024-05-22 17:56:19.023 CST [19] [19] LOG: Recover the flashback logindex to 0/10000000
2024-05-22 17:56:19.362 CST [21] [21] PANIC: polardb shared storage is unavailable.
2024-05-22 17:56:19.362 CST [21] [21] BACKTRACE:
postgres(5432): polar worker process (+0x3fdc5e) [0x560ccc2d4c5e]
/home/postgres/tmp_basedir_polardb_pg_1100_bld/lib/polar_worker.so(polar_worker_handler_main+0xd6) [0x7fdf24745ff6]
postgres(5432): polar worker process (StartBackgroundWorker+0x2d7) [0x560ccc629517]
postgres(5432): polar worker process (+0x76441c) [0x560ccc63b41c]
postgres(5432): polar worker process (+0x765dbe) [0x560ccc63cdbe]
postgres(5432): polar worker process (PostmasterMain+0xd4c) [0x560ccc640d5c]
postgres(5432): polar worker process (main+0x830) [0x560ccc31fcf0]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fdf231fed90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fdf231fee40]
postgres(5432): polar worker process (_start+0x25) [0x560ccc32f6d5]
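For anyone repeating the log lookup suggested above, it can be scripted roughly as follows. This is a sketch under assumptions: `show_last_errors` is a hypothetical helper, and it assumes current_logfiles uses the stock PostgreSQL format (one "<destination> <path>" pair per line, with the path relative to the data directory unless absolute). PolarDB may differ, and an absolute path recorded inside the container will not resolve on the host.

```shell
#!/bin/sh
# Print the last lines of every log named in <base>/*/current_logfiles.
# Usage: show_last_errors <base-dir> [line-count]
# show_last_errors is a hypothetical helper written for this thread.
show_last_errors() {
    base=$1
    lines=${2:-20}
    for f in "$base"/*/current_logfiles; do
        [ -f "$f" ] || continue
        datadir=$(dirname "$f")
        while read -r dest path; do
            [ -n "$path" ] || continue
            case $path in
                /*) log=$path ;;           # absolute path, used as-is
                *)  log=$datadir/$path ;;  # relative to the data directory
            esac
            echo "== $log ($dest)"
            tail -n "$lines" "$log"
        done < "$f"
    done
}
```

With the bind mount used in this thread, this would be called as `show_last_errors /data/polardb 30` on the host.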
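I set max_connections = 2000 in the conf file in every directory.
To be precise, what I just did was: initialize the database with docker without starting it, then change the maximum connection count to 2000 in all of the config files.
Starting docker after that worked.
When I restarted the container again, it failed. So there must be another cause; it looks like a remount problem.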
inline int
polar_mount(void)
{
    int ret = 0;

    if (polar_vfs[polar_vfs_switch].vfs_mount)
        ret = polar_vfs[polar_vfs_switch].vfs_mount();
    if (polar_enable_io_fencing && ret == 0)
    {
        /* POLAR: FATAL when shared storage is unavailable, or force to write RWID. */
        if (polar_shared_storage_is_available())
        {
            polar_hold_shared_storage(false);
            POLAR_IO_FENCING_SET_STATE(polar_io_fencing_get_instance(), POLAR_IO_FENCING_WAIT);
        }
        else
            elog(FATAL, "polardb shared storage %s is unavailable.", polar_datadir);
    }
    return ret;
}

inline int
polar_remount(void)
{
    int ret = 0;

    if (polar_vfs[polar_vfs_switch].vfs_remount)
        ret = polar_vfs[polar_vfs_switch].vfs_remount();
    if (polar_enable_io_fencing && ret == 0)
    {
        /* POLAR: FATAL when shared storage is unavailable, or force to write RWID. */
        if (polar_shared_storage_is_available())
        {
            polar_hold_shared_storage(true);
            POLAR_IO_FENCING_SET_STATE(polar_io_fencing_get_instance(), POLAR_IO_FENCING_WAIT);
        }
        else
            elog(FATAL, "polardb shared storage %s is unavailable.", polar_datadir);
    }
    return ret;
}
@mrdrivingduck could you test this scenario?
I tested the following scenario and found no problem:
$ mkdir polardb_pg
$ docker run -it --rm \
    --env POLARDB_PORT=5432 \
    --env POLARDB_USER=u1 \
    --env POLARDB_PASSWORD=your_password \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance \
    echo 'done'
## edit max_connections in three postgresql.conf files
$ docker run -d \
    -p 54320-54322:5432-5434 \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance
36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61
$ docker exec -it 36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61 bash
$ ps -ef
$ exit
$ docker stop 36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61
36c196cd8cb3e7b3dfcd2b9268409377462ee42caf95289080ce20f17ab45f61
$ docker run -d \
    -p 54320-54322:5432-5434 \
    -v ./polardb_pg:/var/polardb \
    polardb/polardb_pg_local_instance
cdbffcd6b3e6e2f55ac98ee61bfd48ac185db624f5142f3dfc7a0f920ac7a154
$ docker exec -it cdbffcd6b3e6e2f55ac98ee61bfd48ac185db624f5142f3dfc7a0f920ac7a154 bash
$ ps -ef
Could it be because I deployed on k3s?
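You need to check whether /var/polardb/shared_datadir is accessible from inside the container and whether the files in it are as expected. Also make sure the volume is not mounted by more than one container.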
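On a plain Docker host, both checks above can be approximated from the command line. The sketch below is an assumption-laden illustration: `count_ids`, `HOST_DIR`, and the container name are hypothetical, and it relies on the `volume` filter of `docker ps` matching the bind-mount source path exactly. Under k3s with containerd, the equivalent check would have to go through kubectl or crictl instead.

```shell
#!/bin/sh
# Count non-empty lines on stdin (one container ID per line).
# count_ids is a hypothetical helper.
count_ids() {
    grep -c . || true
}

# HOST_DIR is illustrative; use the host path given to -v, without a trailing slash.
HOST_DIR=${HOST_DIR:-/data/polardb}

if command -v docker >/dev/null 2>&1; then
    # List running containers that mount HOST_DIR and count them.
    n=$(docker ps -q --filter "volume=$HOST_DIR" | count_ids)
    echo "$n running container(s) mount $HOST_DIR"
    if [ "$n" -gt 1 ]; then
        echo "WARNING: more than one container mounts $HOST_DIR" >&2
    fi
    # To inspect the directory from inside a container named "polardb":
    #   docker exec polardb ls -la /var/polardb/shared_datadir
fi
```

A count above one would match the situation warned about here, where two instances end up on the same data directory.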
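With a rolling upgrade on k3s or k8s, there is a time window in which the old and new containers both have the volume mounted. Is that what makes it crash?
I just retested with a delayed restart, and it still fails o(╥﹏╥)o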
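The polardb_pg_local_instance image is a demo that runs a shared-storage cluster on a single machine. It contains a simple entrypoint script that manages the cluster, so that it can be brought up and tried out quickly. If you have external cluster management and storage management, they will conflict with the entrypoint script running inside the image. I recommend using the pure binary image polardb/polardb_pg_binary to integrate with cluster management tools; it contains no management script.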
In my final test, running
rm -f $shared_datadir/DEATH
before the restart fixed it. Does that make it suitable for single-node deployment on k8s/k3s?
The presence of this file means that at least two database instances were started on the same data directory. That is a problem in itself.
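Given that explanation, a startup wrapper might report the marker instead of deleting it silently. The following is a minimal sketch: `has_death_marker` and `SHARED_DATADIR` are hypothetical names, and the only thing taken from this thread is the file name DEATH and its stated meaning.

```shell
#!/bin/sh
# Return 0 if the DEATH marker mentioned in this thread exists under the
# given shared data directory, 1 otherwise.
# has_death_marker is a hypothetical helper.
has_death_marker() {
    [ -e "$1/DEATH" ]
}

# SHARED_DATADIR is illustrative; adjust it to the host path of shared_datadir.
SHARED_DATADIR=${SHARED_DATADIR:-/data/polardb/shared_datadir}
if has_death_marker "$SHARED_DATADIR"; then
    echo "DEATH marker found in $SHARED_DATADIR" >&2
    echo "Confirm that no other instance uses this directory before removing it." >&2
fi
```

Removing the file unconditionally, as in the workaround above, hides the double-start condition rather than preventing it.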
Describe the problem
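When polardb-pg is started on a single node with docker, it fails to start after the config file is modified, with the error: polardb shared storage file-dio:///var/polardb/shared_datadir is unavailable. The config file is attached: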
postgresql.txt
...