dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
282 stars 135 forks source link

5.2 : chimera exceptions during routine testing #5230

Open DmitryLitvintsev opened 4 years ago

DmitryLitvintsev commented 4 years ago

See this while running tests that involve writing, reading and them removing files from the same directory. Test runs 5 parallel threads:

Remote file does not exist, write test failed
2019-12-11 17:08:16 : FAILURE for Xrootd-dmsdcadev01@xrootd-dmsdcadev01Domain 
 ^M[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Server responded with an error: [3012] Failed to open file (PreparedStatementCallback; SQL [SELECT f_create_inode(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; ERROR: deadlock detected;  Detail: Process 1098 waits for ShareLock on transaction 554141610; blocked by process 1097.;Process 1097 waits for ExclusiveLock on tuple (904647,10) of relation 16400 of database 16387; blocked by process 1098.;  Hint: See server log for query details.;  Where: SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent";PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp without time zone) line 7 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected;  Detail: Process 1098 waits for ShareLock on transaction 554141610; blocked by process 1097.;Process 1097 waits for ExclusiveLock on tuple (904647,10) of relation 16400 of database 16387; blocked by process 1098.;  Hint: See server log for query details.;  Where: SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent";PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp without time zone) line 7 at SQL statement [10011])

NB: It appears to me the issue is with link count trigger. When we upgraded from 2.16 to 4.2 on different instance at Fermilab ("public dCache") we had problems with this trigger as well and we removed this trigger (and corresponding patch that introduced the code)

Database log shows deadlock errors:

< 2019-12-11 17:08:15.062 CST 127.0.0.1(51302) 45001 > ERROR:  deadlock detected
< 2019-12-11 17:08:15.062 CST 127.0.0.1(51302) 45001 > DETAIL:  Process 45001 waits for ExclusiveLock on tuple (904647,8) of relation 16400 of database 16387; blocked by process 2053.
        Process 2053 waits for ShareLock on transaction 554141608; blocked by process 45001.
        Process 45001: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
        Process 2053: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
< 2019-12-11 17:08:15.062 CST 127.0.0.1(51302) 45001 > HINT:  See server log for query details.
< 2019-12-11 17:08:15.062 CST 127.0.0.1(51302) 45001 > CONTEXT:  SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent"
        PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp without time zone) line 7 at SQL statement
< 2019-12-11 17:08:15.062 CST 127.0.0.1(51302) 45001 > STATEMENT:  SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
< 2019-12-11 17:08:15.063 CST 127.0.0.1(51366) 2053 > LOG:  duration: 1001.975 ms  execute S_18: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
< 2019-12-11 17:08:15.063 CST 127.0.0.1(51366) 2053 > DETAIL:  parameters: $1 = '34919202', $2 = 'pd_gsiftp_ff8e12b5-d068-4c1f-bec2-353981e55193', $3 = '00002188D49A5E074E4597BBFBD4F2B4C135', $4 = '32768', $5 = '420', $6 = '1', $7 = '9811', $8 = '5063', $9 = '0', $10 = '2', $11 = '2019-12-11 17:08:14.061'
< 2019-12-11 17:08:16.066 CST 127.0.0.1(51354) 1098 > ERROR:  deadlock detected
< 2019-12-11 17:08:16.066 CST 127.0.0.1(51354) 1098 > DETAIL:  Process 1098 waits for ShareLock on transaction 554141610; blocked by process 1097.
        Process 1097 waits for ExclusiveLock on tuple (904647,10) of relation 16400 of database 16387; blocked by process 1098.
        Process 1098: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
        Process 1097: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
< 2019-12-11 17:08:16.066 CST 127.0.0.1(51354) 1098 > HINT:  See server log for query details.
< 2019-12-11 17:08:16.066 CST 127.0.0.1(51354) 1098 > CONTEXT:  SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent"
        PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp without time zone) line 7 at SQL statement
< 2019-12-11 17:08:16.066 CST 127.0.0.1(51354) 1098 > STATEMENT:  SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)
< 2019-12-11 17:08:16.067 CST 127.0.0.1(51352) 1097 > LOG:  duration: 1784.233 ms  execute S_19: SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)

This is postgresql 9.2.24.

Yes, this is old postgresql. But existing dCache (2.13) CMS T1 runs on that version w/o a hitch.

kofemann commented 4 years ago

Is it still problem with 5.2.7?

https://github.com/dcache/dcache/commit/0d8144d766c98f5ef28f48224408a49ae10abc3f

DmitryLitvintsev commented 4 years ago

this is 5.2.11

BTW, the test does not involve mkdir. As I said it is:

write file -> read file -> remove file

running in 5 parallel threads in the loop. Destination directory is always the same. There is no any other load on the machine.

File is created always with unique name.

In Oct 2018 when we upgraded public dCache to 4.2 I relayed to you that this trigger is problematic (we ran into deadlocks as well). I made custom 4.2 release where I reverted the patch that introduced this trigger (and accompanying code). I would rather you do the same on master branch (and the rest of stable branches). I thought I would avoid maintaining a private dCache branch for CMS, looks like not.

kofemann commented 4 years ago

The custom code you have it related to tags nlink count, which is triggered on directory creation/removal. The issue you have is hitting on update t_inodes during concurrent create/remove.

kofemann commented 4 years ago

Do you get it reproduced with https://github.com/kofemann/fio-workload/blob/master/concurrent-writes.fio ? (you will need at least centos7)

DmitryLitvintsev commented 4 years ago

I have run fio test - no deadlock seen.

DmitryLitvintsev commented 4 years ago

But let me run it in the loop. I will keep you posted.

Interestingly all errors were triggered by other protocols :

this is distribution of failures:

     49 GFTP-dmsdcadev06@gridftp-dmsdcadev06Domain
     46 GFTP-dmsdcadev05@gridftp-dmsdcadev05Domain
     39 Xrootd-dmsdcadev01@xrootd-dmsdcadev01Domain
     38 WebDAV-dmsdcadev01@webdavDomain
      1 SRM-dmsdcadev01@srm-dmsdcadev01Domain

this is distribution of success:

   5793 NFS-dmsdcadev03@nfsDomain
   5793 NFS-dmsdcadev01@nfsDomain.v3
   5792 SRM-dmsdcadev01@srm-dmsdcadev01Domain
   5755 WebDAV-dmsdcadev01@webdavDomain
   5754 Xrootd-dmsdcadev01@xrootd-dmsdcadev01Domain
   5747 GFTP-dmsdcadev05@gridftp-dmsdcadev05Domain
   5744 GFTP-dmsdcadev06@gridftp-dmsdcadev06Domain

NFS-dmsdcadev03@nfsDomain and NFS-dmsdcadev01@nfsDomain.v3 just means there are two write -> read -> remove tests done over mounted NFS v4.1 for each test iteration. SRM means test w/ srmcp. GFTP test w/ globus-url-copy.

kofemann commented 4 years ago

Cool! I can reproduce it with unit test and couple of sleeps.

DmitryLitvintsev commented 4 years ago

Unit tests are good for something!

kofemann commented 4 years ago

Can you run tests against branch: https://github.com/kofemann/dcache/commit/ceee9ae92dca69f2b954edebc440aa4098b7e874

DmitryLitvintsev commented 4 years ago

Still getting the same error:

2019-12-16 18:23:42 : FAILURE for GFTP-dmsdcadev05@gridftp-dmsdcadev05Domain 

error: globus_ftp_client: the server responded with an error
451 Operation failed: PreparedStatementCallback; SQL [SELECT f_create_inode(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; ERROR: deadlock detected;  Detail: Process 28556 waits for ShareLock on transaction 568891674; blocked by process 26872.;Process 26872 waits for ExclusiveLock on tuple (904686,9) of relation 16400 of database 16387; blocked by process 28556.;  Hint: See server log for query details.;  Where: SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent";PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp with time zone) line 7 at SQL statement; nested exception is org.postgresql.util.PSQLException: ERROR: deadlock detected;  Detail: Process 28556 waits for ShareLock on transaction 568891674; blocked by process 26872.;Process 26872 waits for ExclusiveLock on tuple (904686,9) of relation 16400 of database 16387; blocked by process 28556.;  Hint: See server log for query details.;  Where: SQL statement "UPDATE t_inodes SET inlink=inlink+1,imtime=now,ictime=now,igeneration=igeneration+1 WHERE inumber = parent";PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp with time zone) line 7 at SQL statement
[root@dmsdcadev03 ~]# dcache status
DOMAIN          STATUS                   PID   USER LOG                                 
namespaceDomain running (for 18 minutes) 26657 root /var/log/dcache/namespaceDomain.log 
cleanerDomain   running (for 18 minutes) 26721 root /var/log/dcache/cleanerDomain.log   
nfsDomain       running (for 18 minutes) 26777 root /var/log/dcache/nfsDomain.log       
[root@dmsdcadev03 ~]# dcache version
5.2.12-SNAPSHOT
DmitryLitvintsev commented 4 years ago

My next step is to upgrade postgersql. Just in case. As the matter of fact I have seen similar issues long time ago in 9.3.5

https://postgrespro.com/list/thread-id/1548239

DmitryLitvintsev commented 4 years ago

I am happy to report that after I upgraded to postgresql 11.6 I am not seeing the issue :

    599 GFTP-dmsdcadev06@gridftp-dmsdcadev06Domain
    598 Xrootd-dmsdcadev01@xrootd-dmsdcadev01Domain
    598 WebDAV-dmsdcadev01@webdavDomain
    598 SRM-dmsdcadev01@srm-dmsdcadev01Domain
    598 NFS-dmsdcadev03@nfsDomain
    598 NFS-dmsdcadev01@nfsDomain.v3
    598 GFTP-dmsdcadev05@gridftp-dmsdcadev05Domain

(only successes so far).

I am running the branch 5.2 that Tigran provided with the fix he made. So. as postgresql 9.2 is obsolete this would not be a issue anymore.

I will keep it runnig overnight.

paulmillar commented 4 years ago

That makes me nervous: if a version upgrade made the problem disappear, another version upgrade could make it reappear.