citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.4k stars 662 forks source link

Too many links in worker nodes #2560

Closed dasong2410 closed 5 years ago

dasong2410 commented 5 years ago

Hey, we encountered a problem about citus pgsql_job_cache on our production database resently, there were too many files in this directory (exceed 32000 links per inode limit of ext3). Following is the steps to repeat the situation.

env

sql

Create a table clm_biz_exception_log_aimp with 2 columns, insert 100000000 records into it, and then index its 2 columns respectively.

drop table clm_biz_exception_log_aimp;
create table clm_biz_exception_log_aimp
(
    id bigserial,
    exptime TIMESTAMP NOT NULL
);

SELECT create_distributed_table('clm_biz_exception_log_aimp', 'id');

insert into clm_biz_exception_log_aimp(exptime)
select date '1001-09-28' + generate_series from generate_series(1,100000000);

CREATE INDEX clm_biz_exception_log_aimp_id ON clm_biz_exception_log_aimp ( id );
CREATE INDEX clm_biz_exception_log_aimp_exptime ON clm_biz_exception_log_aimp ( exptime );

shell script

Detele 2 records per time, and loop it.

[postgres@sndsdevdb14 ~]$ cat test.sh 
while (true) do
    psql -p 9700 -c "DELETE FROM clm_biz_exception_log_aimp A USING ( SELECT * FROM clm_biz_exception_log_aimp ORDER BY exptime ASC limit 2) B WHERE A.ID=B.ID"
    sleep 1
done

run shell script

Execute this shell script in background.

[postgres@sndsdevdb14 ~]$ nohup sh test.sh &
[1] 9204
nohup: ignoring input and appending output to `nohup.out'
[postgres@sndsdevdb14 ~]$ 

monitor worker's pgsql_job_cache directory

You will see the result number keeps increasing until it reaches 31999.

watch "ls -lh | wc -l"

Those files' name are in format of %u_%u_%u:

[root@sndsdevdb14 ~]# cd /pgsql/citus/worker1/base/pgsql_job_cache/
[root@sndsdevdb14 pgsql_job_cache]# pwd
/pgsql/citus/worker1/base/pgsql_job_cache
[root@sndsdevdb14 pgsql_job_cache]# ll | head 10
total 31040
drwx------ 2 postgres postgres 4096 Dec 19 16:32 10_0_100
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1000
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1001
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1002
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1003
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1004
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1005
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1006
drwx------ 2 postgres postgres 4096 Dec 19 16:51 10_0_1007
[root@sndsdevdb14 pgsql_job_cache]# 

When pgsql_job_cache's links reach 32000, it begins to show errors in nohup.out.

[root@sndsdevdb14 base]# pwd
/pgsql/citus/worker1/base
[root@sndsdevdb14 base]# ll -h
total 1020K
drwx------     2 postgres postgres 4.0K Dec 19 11:37 1
drwx------     2 postgres postgres 4.0K Dec 19 11:37 13450
drwx------     2 postgres postgres  12K Dec 19 15:57 13451
drwx------ 32000 postgres postgres 992K Dec 20 03:32 pgsql_job_cache
drwx------     2 postgres postgres 4.0K Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# 

[postgres@sndsdevdb14 ~]$ tail -n 10 nohup.out 
ERROR:  could not create intermediate results directory "base/pgsql_job_cache/10_0_52329": Too many links
CONTEXT:  while executing command on localhost:9701
ERROR:  could not create intermediate results directory "base/pgsql_job_cache/10_0_52330": Too many links
CONTEXT:  while executing command on localhost:9701
ERROR:  could not create intermediate results directory "base/pgsql_job_cache/10_0_52331": Too many links
CONTEXT:  while executing command on localhost:9701
ERROR:  could not create intermediate results directory "base/pgsql_job_cache/10_0_52332": Too many links
CONTEXT:  while executing command on localhost:9701
ERROR:  could not create intermediate results directory "base/pgsql_job_cache/10_0_52333": Too many links
CONTEXT:  while executing command on localhost:9701
[postgres@sndsdevdb14 ~]$ 

Those %u_%u_%u files will not be deleted until I restart PostgreSQL manaully.

[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 29050 postgres postgres 1015808 Dec 20 09:58 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 28035 postgres postgres 1015808 Dec 20 09:58 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 27556 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 25353 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 25062 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 23344 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 21383 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 19586 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 19229 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 1020
drwx------     2 postgres postgres    4096 Dec 19 11:37 1
drwx------     2 postgres postgres    4096 Dec 19 11:37 13450
drwx------     2 postgres postgres   12288 Dec 20 09:58 13451
drwx------ 15673 postgres postgres 1015808 Dec 20 09:59 pgsql_job_cache
drwx------     2 postgres postgres    4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# ll
total 28
drwx------ 2 postgres postgres  4096 Dec 19 11:37 1
drwx------ 2 postgres postgres  4096 Dec 19 11:37 13450
drwx------ 2 postgres postgres 12288 Dec 20 09:58 13451
drwx------ 2 postgres postgres  4096 Dec 20 10:00 pgsql_job_cache
drwx------ 2 postgres postgres  4096 Dec 19 15:59 pgsql_tmp
[root@sndsdevdb14 base]# 

other test cases

Delete one single record using "where id=xx", the number of those files in directory pgsql_job_cache doesn't increase.

postgres=# select * from clm_biz_exception_log_aimp limit 2;
   id   |       exptime       
--------+---------------------
 224030 | 1615-02-11 00:00:00
 224069 | 1615-03-22 00:00:00
(2 rows)

postgres=# delete from clm_biz_exception_log_aimp where id=224030;
DELETE 1

Detele a bunch of records using "where id in(xx,xx,xx)", the number of those files in directory pgsql_job_cache doesn't increase.

postgres=# select * from clm_biz_exception_log_aimp limit 10;
   id   |       exptime       
--------+---------------------
 224069 | 1615-03-22 00:00:00
 224073 | 1615-03-26 00:00:00
 224083 | 1615-04-05 00:00:00
 224110 | 1615-05-02 00:00:00
 224127 | 1615-05-19 00:00:00
 224146 | 1615-06-07 00:00:00
 224180 | 1615-07-11 00:00:00
 224201 | 1615-08-01 00:00:00
 224226 | 1615-08-26 00:00:00
 224232 | 1615-09-01 00:00:00
(10 rows)

postgres=# delete from clm_biz_exception_log_aimp where id in(224069,224073,224083,224110,224127,224146,224180,224201,224226,224232);
DELETE 10

Detele a bunch of records using "where id in(sub-query)", the number of those files in directory pgsql_job_cache increases.

postgres=# create table t_tmp as select * from clm_biz_exception_log_aimp limit 10;
SELECT 10
postgres=# delete from clm_biz_exception_log_aimp where id in(select id from t_tmp);
DELETE 10
postgres=# \q
[postgres@sndsdevdb14 pgsql]$ psql -p 9700
psql (10.2)
Type "help" for help.

Detele one single record using "where id in(sub-query)", the number of those files in pgsql_job_cache increases.

postgres=# create table t_tmp1 as select * from clm_biz_exception_log_aimp limit 1;
SELECT 1
postgres=# delete from clm_biz_exception_log_aimp where id in(select id from t_tmp);
DELETE 0
postgres=# 

When the table in the sub-query is distributed table, the number of those files in directory pgsql_job_cache doesn't increase.

postgres=# create table t_tmp2 as select * from clm_biz_exception_log_aimp limit 0;
SELECT 0
postgres=# SELECT create_distributed_table('t_tmp2', 'id');
 create_distributed_table 
--------------------------

(1 row)

postgres=# insert into t_tmp2 select * from clm_biz_exception_log_aimp limit 100;
INSERT 0 100
postgres=# delete from clm_biz_exception_log_aimp where id in(select id from t_tmp2);
DELETE 100
postgres=# 

question

I wonder if it's a bug that those files will not be deleted in pgsql_job_cache even after the related process has done all it needs to do and exited? When are those files supposed to be cleaned? Is there any method I can use to delete those files safely except restarting PostgreSQL? Thanks.

marcocitus commented 5 years ago

I think this is is the issue described here: https://github.com/citusdata/citus/issues/2420 , which is fixed as of 8.0.

dasong2410 commented 5 years ago

I think this is is the issue described here: #2420 , which is fixed as of 8.0.

dasong2410 commented 5 years ago

I think this is is the issue described here: #2420 , which is fixed as of 8.0.

thanks, it's the same issue.