Open DmitryLitvintsev opened 3 years ago
pipeline.addBefore(self, "frameDecoder", new LineBasedFrameDecoder(KiB.toBytes(64)));
in the NettyLineBasedDoor. Where does this max size come from? The question of course is why are we getting the oversized frame. That is, what would not be observing the 64K limit?
access file for this interaction:
level=INFO ts=2021-11-01T14:37:05.685-0500 event=org.dcache.ftp.hello session=door:GFTP0-fndca4b-AAXPv01F4lA@gridftp0-fndca4bDomain socket.remote=131.225.67.13:43624 socket.local=131.225.69.121:2811 reply="220 GSI FTP door ready"
level=INFO ts=2021-11-01T14:37:05.687-0500 event=org.dcache.ftp.response session=door:GFTP0-fndca4b-AAXPv01F4lA@gridftp0-fndca4bDomain command="AUTH GSSAPI" reply="334 ADAT must follow"
level=INFO ts=2021-11-01T14:37:05.702-0500 event=org.dcache.ftp.response session=door:GFTP0-fndca4b-AAXPv01F4lA@gridftp0-fndca4bDomain command="ADAT ..." reply="335 ADAT=..."
level=INFO ts=2021-11-01T14:37:05.709-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:43624 session=door:GFTP0-fndca4b-AAXPv01F4lA@gridftp0-fndca4bDomain
Some ideas/questions:
Excellent suggestions, Paul, as always. The Issue has been created as a place holder to go back to if team has no immediate ideas what that might be. It is bloodbath here at Fermilab since upgrade. Once I have a little more time to dedicate to this issue all these avenues will be explored.
doors=`zgrep -B1 'TooLongFrameException' gridftp0-fndca4bDomain.log-20211102.gz | grep GFTP | awk '{ print $5}' | sed -e 's/[\(\)]//g'`
[root@fndca4b dcache-log]# for d in $doors ; do zgrep "$d" gridftp0-fndca4bDomain.access.2021-11-01.gz | grep "host"; done
level=INFO ts=2021-11-01T21:00:05.423-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47088 session=door:GFTP0-fndca4b-AAXPxKb4KDA@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:00:23.555-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47090 session=door:GFTP0-fndca4b-AAXPxKgNGqA@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:00:24.592-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47092 session=door:GFTP0-fndca4b-AAXPxKgc5Zg@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:00:24.625-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47094 session=door:GFTP0-fndca4b-AAXPxKgdamg@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:00:37.765-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47100 session=door:GFTP0-fndca4b-AAXPxKjl4rg@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:02:25.089-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47114 session=door:GFTP0-fndca4b-AAXPxK9LgTA@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:03:56.306-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47116 session=door:GFTP0-fndca4b-AAXPxLS7UeA@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:03:57.339-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47118 session=door:GFTP0-fndca4b-AAXPxLTLNEg@gridftp0-fndca4bDomain
level=INFO ts=2021-11-01T21:03:57.375-0500 event=org.dcache.ftp.disconnect host.remote=131.225.67.13:47120 session=door:GFTP0-fndca4b-AAXPxLTLsUg@gridftp0-fndca4bDomain
same host.
There are successful transfers :
billing=# select sum(case when errorcode <> 0 then 1 else 0 end) as errors, sum(case when errorcode <> 0 then 0 else 1 end) as success, protocol from billinginfo where client='131.225.67.13' and datestamp > current_date - interval '2 days' group by protocol;
errors | success | protocol
--------+---------+------------
0 | 537 | NFS4-4.1
0 | 991 | Xrootd-5.0
(2 rows)
This seems like interactive node. I will ask user what are they doing.
After upgrade to 7.2 the following exception is seen. I do not believe I have seen it before.
It is always the same frame length : 84173 And it looks to be ~ 200 / day frequency.