dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
282 stars 135 forks source link

seeing issues w/ failing `f_create_inode` percolating to client (4.2.32) #4875

Open DmitryLitvintsev opened 5 years ago

DmitryLitvintsev commented 5 years ago

seeing these:

[root@stkendca1801 dcache-log]# grep 'GFTP-stkendca1801-AAWJaR3YN9A' gridftp-stkendca1801Domain.access
level=INFO ts=2019-05-21T12:24:08.836-0500 event=org.dcache.ftp.hello session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain socket.remote=131.225.167.72:34142 socket.local=131.225.69.80:2811 reply="220 GSI FTP door ready"
level=INFO ts=2019-05-21T12:24:08.837-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="AUTH GSSAPI" reply="334 ADAT must follow"
level=INFO ts=2019-05-21T12:24:08.859-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2019-05-21T12:24:08.895-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2019-05-21T12:24:08.900-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2019-05-21T12:24:08.908-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ADAT ..." reply="235 OK"
level=INFO ts=2019-05-21T12:24:08.928-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{USER :globus-mapping:}" user.dn="/DC=org/DC=cilogon/C=US/O=Fermi National Accelerator Laboratory/OU=People/CN=Nandita Raha/CN=UID:nraha" user.mapped=51158:9950,9957,9767 reply="ENC{200 User :globus-mapping: logged in}"
level=INFO ts=2019-05-21T12:24:08.928-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="PASS ..." reply="ENC{200 PASS command successful}"
level=INFO ts=2019-05-21T12:24:08.929-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{SITE HELP}" reply="ENC{214-The following site-specific commands are supported:\r\nSITE <SP> HELP - Information about SITE commands\r\nSITE <SP> BUFSIZE <SP> <size> - Set network buffer to <size>\r\nSITE <SP> CHKSUM <SP> <value> - Fail upload if ADLER32 checksum isn't <value>\r\nSITE <SP> CHGRP <SP> <group> <SP> <path> - Change group-owner of <path> to group <group>\r\nSITE <SP> CHMOD <SP> <perm> <SP> <path> - Change permission of <path> to octal value <perm>\r\nSITE <SP> CLIENTINFO <SP> <id> - Provide server with information about the client\r\nSITE <SP> SYMLINKFROM <SP> <path> - Register symlink location; SYMLINKTO must follow\r\nSITE <SP> SYMLINKTO <SP> <path> - Create symlink to <path>; SYMLINKFROM must be earlier command\r\nSITE <SP> TASKID <SP> <id> - Provide server with an identifier\r\nSITE <SP> USAGE <SP> [TOKEN <SP> <token> <SP> ] <path>\r\n214 SITE <SP> WHOAMI - Provides the username or uid of the user}"
level=INFO ts=2019-05-21T12:24:08.929-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{FEAT}" reply="ENC{211-OK\r\n EOF\r\n PARALLEL\r\n MODE-E-PERF\r\n SIZE\r\n SBUF\r\n ERET\r\n ESTO\r\n GETPUT\r\n MDTM\r\n CKSUM ADLER32,MD5,MD4\r\n MODEX\r\n TVFS\r\n MFMT\r\n MFCT\r\n MFF Modify;Create;UNIX.mode;\r\n PASV AllowDelayed\r\n MLSC\r\n MLST Size*;Modify*;Create;Type*;Unique*;Perm*;UNIX.owner*;UNIX.group*;UNIX.uid*;UNIX.gid*;UNIX.mode*;UNIX.ctime;UNIX.atime;\r\n211 End}"
level=INFO ts=2019-05-21T12:24:08.930-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{SITE CLIENTINFO scheme=gsiftp;appname=\"globus-url-copy\";appver=\"10.4 (gcc64, 1550490409-0) [Grid Community Toolkit 6.2]\";}" reply="ENC{250 OK}"
level=INFO ts=2019-05-21T12:24:08.930-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{TYPE I}" reply="ENC{200 Type set to I}"
level=INFO ts=2019-05-21T12:24:08.931-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{ALLO 36804}" reply="ENC{200 ALLO command successful}"
level=INFO ts=2019-05-21T12:25:47.199-0500 event=org.dcache.ftp.response session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain command="ENC{PUT path=/pnfs/fnal.gov/usr/GM2/scratch/users/nraha/dstsEndgame/daq/2019-05-21-05-21-35/logs/gm2rootTrees_ana_19674890_17416.00478.log;pasv;}" reply="ENC{451 Operation failed: BackEndErrorHimeraFsException (PreparedStatementCallback; SQL [SELECT f_create_inode(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; An I/O error occurred while sending to the backend.; nested exception is org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.)}"
level=INFO ts=2019-05-21T12:25:47.209-0500 event=org.dcache.ftp.disconnect host.remote=131.225.167.72:34142 session=door:GFTP-stkendca1801-AAWJaR3YN9A@gridftp-stkendca1801Domain

Our postgresql LOG is filled with these :

< 2019-05-21 00:00:00.284 CDT 127.0.0.1(53332) 11779 > ERROR:  duplicate key value violates unique constraint "t_dirs_pkey"
< 2019-05-21 00:00:00.284 CDT 127.0.0.1(53332) 11779 > DETAIL:  Key (iparent, iname)=(1986348275, prod_v08_00_00_01a) already exists.
< 2019-05-21 00:00:00.284 CDT 127.0.0.1(53332) 11779 > CONTEXT:  SQL statement "INSERT INTO t_dirs VALUES (parent, newid, name)"
        PL/pgSQL function f_create_inode(bigint,character varying,character varying,integer,integer,integer,integer,integer,bigint,integer,timestamp without time zone) line 6 at SQL statement
< 2019-05-21 00:00:00.284 CDT 127.0.0.1(53332) 11779 > STATEMENT:  SELECT f_create_inode($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11)

with huge frequency. SAD

paulmillar commented 5 years ago

Are you sure these two are directly related? The first error talks about An I/O error occurred while sending to the backend (suggesting PnfsManager couldn't connect to the database), while the second talks about an update violates unique constraint (suggesting an attempt to create the same file within the database).