dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

pool produces 140G logs in a few days until disk full #2399

Status: Closed (closed by calestyo 8 years ago)

calestyo commented 8 years ago

Hi.

Not sure if the following is helpful for you... This is dCache 2.15.3, on a node which runs just pool services and one openly available (ro) NFS door, which is however not used.

What happened was that one of the pools filled up the system disk over the course of a few days; dCache itself seemed to continue running fine (or at least we got no complaints from our users).

/var/log/dcache was:

total 148G
drwxr-xr-x 4 root   root   4,1k Apr 30 17:29 .
drwxr-xr-x 5 dcache dcache 4,1k Apr 30 16:55 ..
drwxr-xr-x 2 dcache dcache 4,1k Apr  4 00:17 access
drwxr-xr-x 2 dcache dcache 4,1k Apr  4 00:17 events
-rw-r--r-- 1 root   root    17k Apr 30 16:53 pool_lcg-lrz-dc67_0.log
-rw-r--r-- 1 root   root   2,0k Apr 30 16:53 pool_lcg-lrz-dc67_1.log
-rw-r--r-- 1 root   root   140G Apr 30 16:52 pool_lcg-lrz-dc67_2.log
-rw-r--r-- 1 root   root   7,7G Apr 22 07:38 pool_lcg-lrz-dc67_2.log.1
-rw-r--r-- 1 root   root   395k Apr 16 07:35 pool_lcg-lrz-dc67_2.log.2.xz
-rw-r--r-- 1 root   root   434k Apr 10 07:35 pool_lcg-lrz-dc67_2.log.3.xz
-rw-r--r-- 1 root   root    37k Apr 30 16:52 pool_lcg-lrz-dc67_3.log
-rw-r--r-- 1 root   root   1,8k Apr 30 16:52 pool_lcg-lrz-dc67_4.log
-rw-r--r-- 1 root   root    21k Apr 30 16:52 pool_lcg-lrz-dc67_5.log
-rw-r--r-- 1 root   root   1,8k Apr 30 16:52 pool_lcg-lrz-dc67_6.log
-rw-r--r-- 1 root   root   1,8k Apr 30 16:51 pool_lcg-lrz-dc67_7.log

The pool in question is apparently dc67_2,...

log.2.xz is basically full of these:

2016-04-10 07:35:49+02:00 (System) [info] Message arrived : <CM: S=[>info@info:*@info:*@core];D=[>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1460266549603:13836992>;LO=<1460266549603:13836991>;TTL=1000>
2016-04-10 07:35:49+02:00 (System) [info] Command: show context info.static
2016-04-10 07:35:49+02:00 (System) [info] Reply : S,user.name,dcache
S,user.language,en
S,user.timezone,Europe/Berlin
S,user.country,DE
S,os.version,3.16.0-4-amd64
S,os.name,Linux
S,os.arch,amd64
S,java.version,1.8.0_72-internal
S,java.vendor,Oracle Corporation
S,java.class-version,52.0
S,java.runtime-version,1.8.0_72-internal-b15
S,java.specification-version,1.8
S,java.vm.version,25.72-b15
S,java.vm.vendor,Oracle Corporation
S,java.vm.name,OpenJDK 64-Bit Server VM
S,java.vm.specification-version,1.8
S,java.vm.info,mixed mode

2016-04-10 07:35:49+02:00 (System) [info] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:*@info:info@info];C=java.lang.String;O=<1460266549605:123121>;LO=<1460266549603:13836992>;TTL=1000>
2016-04-10 07:35:52+02:00 (System) [topo] Message arrived : <CM: S=[>topo@core:System@core:*@core];D=[System@core:>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1460266552346:2930582>;LO=<1460266552346:2930581>;TTL=20000>
2016-04-10 07:35:52+02:00 (System) [topo] Command: getcelltunnelinfos
2016-04-10 07:35:52+02:00 (System) [topo] Reply : [c-core-AAUvm_ohukA-AAUvm_omcUg L[pool_lcg-lrz-dc67_2];R[core]]
2016-04-10 07:35:52+02:00 (System) [topo] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:System@core:topo@core];C=[Ldmg.cells.nucleus.CellTunnelInfo;;O=<1460266552346:123124>;LO=<1460266552346:2930582>;TTL=20000>
2016-04-10 07:35:52+02:00 (System) [topo] Message arrived : <CM: S=[>topo@core:System@core:*@core];D=[System@core:>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1460266552357:2931182>;LO=<1460266552357:2931181>;TTL=20000>
2016-04-10 07:35:52+02:00 (System) [topo] Command: get hostname
2016-04-10 07:35:52+02:00 (System) [topo] Reply : lcg-lrz-dc67.grid.lrz.de
2016-04-10 07:35:52+02:00 (System) [topo] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:System@core:topo@core];C=java.lang.String;O=<1460266552363:123125>;LO=<1460266552357:2931182>;TTL=20000>
2016-04-10 07:35:57+02:00 (System) [info] Message arrived : <CM: S=[>info@info:*@info:*@core];D=[>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1460266557783:13837270>;LO=<1460266557783:13837269>;TTL=1000>
2016-04-10 07:35:57+02:00 (System) [info] Command: getcellinfos
2016-04-10 07:35:57+02:00 (System) [info] Reply : [lm                  A 0  1  LocationManager     ClientReady, c-core-AAUvm_ohukA-AAUvm_omcUgA 0  2  LocationMgrTunnel   Connected to core, RoutingMgr          A 0  0  RoutingManager      RoutingMgr, System              A 0  1  SystemCell          pool_lcg-lrz-dc67_2:IOrec=12688;IOexc=0;MEM=135436600, c-core-AAUvm_ohukA  A 0  1  LocationManagerConnectorConnected, lcg-lrz-dc67_2      A 0  53 Pool                lcg-lrz-dc67_2]
2016-04-10 07:35:57+02:00 (System) [info] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:*@info:info@info];C=[Ldmg.cells.nucleus.CellInfo;;O=<1460266557785:123126>;LO=<1460266557783:13837270>;TTL=1000>
2016-04-10 07:37:49+02:00 (System) [info] Message arrived : <CM: S=[>info@info:*@info:*@core];D=[>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1460266669662:13839779>;LO=<1460266669662:13839778>;TTL=1000>
2016-04-10 07:37:49+02:00 (System) [info] Command: show context info.static
2016-04-10 07:37:49+02:00 (System) [info] Reply : S,user.name,dcache
S,user.language,en
S,user.timezone,Europe/Berlin
S,user.country,DE
S,os.version,3.16.0-4-amd64
S,os.name,Linux
S,os.arch,amd64
S,java.version,1.8.0_72-internal
S,java.vendor,Oracle Corporation
S,java.class-version,52.0
S,java.runtime-version,1.8.0_72-internal-b15
S,java.specification-version,1.8
S,java.vm.version,25.72-b15
S,java.vm.vendor,Oracle Corporation
S,java.vm.name,OpenJDK 64-Bit Server VM
S,java.vm.specification-version,1.8
S,java.vm.info,mixed mode

That continues until, at some point in log.1, the following starts:

2016-04-21 15:11:08+02:00 (System) [info] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:*@info:info@info];C=java.lang.String;O=<1461244268441:344559>;LO=<1461244268442:38677659>;TTL=1000>
2016-04-21 15:11:09+02:00 (System) [topo] Message arrived : <CM: S=[>topo@core:System@core:*@core];D=[System@core:>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1461244269147:8015346>;LO=<1461244269147:8015345>;TTL=20000>
2016-04-21 15:11:09+02:00 (System) [topo] Command: getcelltunnelinfos
2016-04-21 15:11:09+02:00 (System) [topo] Reply : [c-core-AAUvm_ohukA-AAUvm_omcUg L[pool_lcg-lrz-dc67_2];R[core]]
2016-04-21 15:11:09+02:00 (System) [topo] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:System@core:topo@core];C=[Ldmg.cells.nucleus.CellTunnelInfo;;O=<1461244269146:344560>;LO=<1461244269147:8015346>;TTL=20000>
2016-04-21 15:11:09+02:00 (System) [topo] Message arrived : <CM: S=[>topo@core:System@core:*@core];D=[System@core:>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1461244269158:8015946>;LO=<1461244269158:8015945>;TTL=20000>
2016-04-21 15:11:09+02:00 (System) [topo] Command: get hostname
2016-04-21 15:11:09+02:00 (System) [topo] Reply : lcg-lrz-dc67.grid.lrz.de
2016-04-21 15:11:09+02:00 (System) [topo] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:System@core:topo@core];C=java.lang.String;O=<1461244269187:344561>;LO=<1461244269158:8015946>;TTL=20000>
2016-04-21 15:12:30+02:00 (System) [info] Message arrived : <CM: S=[>info@info:*@info:*@core];D=[>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1461244350832:38679480>;LO=<1461244350832:38679479>;TTL=1000>
2016-04-21 15:12:30+02:00 (System) [info] Command: getcellinfos
2016-04-21 15:12:30+02:00 (System) [info] Reply : [lm                  A 0  1  LocationManager     ClientReady, c-core-AAUvm_ohukA-AAUvm_omcUgA 0  2  LocationMgrTunnel   Connected to core, RoutingMgr          A 0  0  RoutingManager      RoutingMgr, System              A 0  1  SystemCell          pool_lcg-lrz-dc67_2:IOrec=35481;IOexc=0;MEM=80481296, c-core-AAUvm_ohukA  A 0  1  LocationManagerConnectorConnected, lcg-lrz-dc67_2      A 0  54 Pool                lcg-lrz-dc67_2]
2016-04-21 15:12:30+02:00 (System) [info] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:*@info:info@info];C=[Ldmg.cells.nucleus.CellInfo;;O=<1461244350833:344578>;LO=<1461244350832:38679480>;TTL=1000>
2016-04-21 15:13:08+02:00 (System) [info] Message arrived : <CM: S=[>info@info:*@info:*@core];D=[>System@pool_lcg-lrz-dc67_2];C=java.lang.String;O=<1461244388503:38680446>;LO=<1461244388503:38680445>;TTL=1000>
2016-04-21 15:13:08+02:00 (System) [info] Command: show context info.static
2016-04-21 15:13:08+02:00 (System) [info] Reply : S,user.name,dcache
S,user.language,en
S,user.timezone,Europe/Berlin
S,user.country,DE
S,os.version,3.16.0-4-amd64
S,os.name,Linux
S,os.arch,amd64
S,java.version,1.8.0_72-internal
S,java.vendor,Oracle Corporation
S,java.class-version,52.0
S,java.runtime-version,1.8.0_72-internal-b15
S,java.specification-version,1.8
S,java.vm.version,25.72-b15
S,java.vm.vendor,Oracle Corporation
S,java.vm.name,OpenJDK 64-Bit Server VM
S,java.vm.specification-version,1.8
S,java.vm.info,mixed mode
2016-04-21 15:13:08+02:00 (System) [info] Sending : <CM: S=[>System@pool_lcg-lrz-dc67_2];D=[>*@core:*@info:info@info];C=java.lang.String;O=<1461244388502:344587>;LO=<1461244388503:38680446>;TTL=1000>
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] -Dio.netty.initialSeedUniquifier: 0xf6227e4b9b88b7e9 (took 0 ms)
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] -Dio.netty.allocator.type: unpooled
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] -Dio.netty.threadLocalDirectBufferSize: 65536
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] -Dio.netty.maxThreadLocalCharBufferSize: 16384
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%lo)
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] /proc/sys/net/core/somaxconn: 128
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] Started HttpTransferService on /0:0:0:0:0:0:0:0:62880
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [door:webdav.tls_lcg-lrz-dc14:AAUw/nx0sRA webdav.tls_lcg-lrz-dc14 PoolDeliverFile 000075AAAE2F13B74E34A1422C473E13CF18] Sending redirect URI http://lcg-lrz-dc67.grid.lrz.de:62880/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasdatadisk/rucio/data15_hi/b2/b9/data15_hi.00287378.physics_HardProbes.daq.RAW._lb0410._SFO-1._0002.data?dcache-http-uuid=8321a12c-9a7b-4bd6-aff0-fc3bf61be3a0 to webdav.tls_lcg-lrz-dc14@webdav_lcg-lrz-dc14
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] -Dio.netty.buffer.bytebuf.checkAccessible: true
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] -Dio.netty.leakDetection.level: simple
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] -Dio.netty.leakDetection.maxRecords: 4
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] REGISTERED
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] ACTIVE
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] HTTP connection from /129.187.239.202:53701 established
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] -Dio.netty.recycler.maxCapacity.default: 262144
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] java.nio.ByteBuffer.cleaner(): available
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] RECEIVED: DefaultHttpRequest(decodeResult: success, version: HTTP/1.1)
GET http://lcg-lrz-dc67.grid.lrz.de:62880/pnfs/lrz-muenchen.de/data/atlas/dq2/atlasdatadisk/rucio/data15_hi/b2/b9/data15_hi.00287378.physics_HardProbes.daq.RAW._lb0410._SFO-1._0002.data?dcache-http-uuid=8321a12c-9a7b-4bd6-aff0-fc3bf61be3a0 HTTP/1.1
Host: lcg-lrz-dc67.grid.lrz.de:62880
Connection: keep-alive
range: bytes=0-1048575
user-agent: ARC
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] WRITE: HttpPoolRequestHandler$HttpPartialContentResponse(decodeResult: success, version: HTTP/1.1)
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Length: 1048576
Content-Range: bytes 0-1048575/2621156300
Digest: adler32=3968a9a9
Server: dCache/2.15.3
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] FLUSH
2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] [id: 0x73a3e9b6, /129.187.239.202:53701 => /129.187.131.67:62880] WRITE: 8192B
         +-------------------------------------------------+
         |  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f |
+--------+-------------------------------------------------+----------------+
|00000000| aa aa 34 12 08 00 00 00 06 00 00 00 02 00 00 00 |..4.............|
|00000010| 1f 6a 5d 00 42 6a 03 00 00 00 00 00 c4 09 00 00 |.j].Bj..........|
|00000020| bb aa 34 12 05 00 00 00 53 46 4f 2d 31 20 20 20 |..4.....SFO-1   |
|00000030| 3c 00 00 00 64 61 74 61 31 35 5f 68 69 2e 30 30 |<...data15_hi.00|
|00000040| 32 38 37 33 37 38 2e 70 68 79 73 69 63 73 5f 48 |287378.physics_H|
|00000050| 61 72 64 50 72 6f 62 65 73 2e 64 61 71 2e 52 41 |ardProbes.daq.RA|
|00000060| 57 2e 5f 6c 62 30 34 31 30 2e 5f 53 46 4f 2d 31 |W._lb0410._SFO-1|
|00000070| bc aa 34 12 04 00 00 00 29 00 00 00 47 55 49 44 |..4.....)...GUID|
|00000080| 3d 38 41 32 32 36 45 41 35 2d 36 31 39 43 2d 45 |=8A226EA5-619C-E|
|00000090| 35 31 31 2d 41 32 41 43 2d 34 34 41 38 34 32 30 |511-A2AC-44A8420|
|000000a0| 41 37 37 37 31 20 20 20 19 00 00 00 53 74 72 65 |A7771   ....Stre|
|000000b0| 61 6d 3d 70 68 79 73 69 63 73 5f 48 61 72 64 50 |am=physics_HardP|
|000000c0| 72 6f 62 65 73 20 20 20 11 00 00 00 50 72 6f 6a |robes   ....Proj|
|000000d0| 65 63 74 3d 64 61 74 61 31 35 5f 68 69 20 20 20 |ect=data15_hi   |
|000000e0| 0d 00 00 00 4c 75 6d 69 42 6c 6f 63 6b 3d 34 31 |....LumiBlock=41|
|000000f0| 30 20 20 20 bb bb 34 12 0c 00 00 00 92 62 04 00 |0   ..4......b..|
|00000100| 00 00 00 00 01 00 00 00 00 00 00 00 f7 ff ff ff |................|
|00000110| ff 69 01 41 00 00 00 00 00 00 00 00 02 00 00 00 |.i.A............|
|00000120| e1 18 00 00 cc cc 34 12 04 00 00 00 2c 04 00 00 |......4.....,...|
|00000130| 98 8d 2d 00 aa 34 12 aa 66 63 0b 00 61 00 00 00 |..-..4..fc..a...|
|00000140| 00 00 00 05 00 00 7c 00 01 00 00 00 00 00 00 00 |......|.........|
|00000150| 00 00 00 00 c0 aa 64 56 d5 a2 43 35 1c 2d 85 0f |......dV..C5.-..|
|00000160| 00 00 00 00 00 00 00 00 92 62 04 00 9a 01 00 00 |.........b......|
|00000170| 8a 85 00 73 91 03 00 00 84 00 00 00 01 00 00 00 |...s............|
|00000180| c0 2e 10 00 30 00 00 00 0c ba 82 07 00 05 7c 95 |....0.........|.|
|00000190| 02 10 4d 00 ff 00 00 00 d0 47 1e 1e 10 00 01 e0 |..M......G......|
|000001a0| 03 00 f0 c0 15 00 06 00 02 00 00 00 00 00 93 4c |...............L|
|000001b0| c1 3b cf 03 00 44 04 00 00 c0 80 05 00 00 00 00 |.;...D..........|
|000001c0| 00 00 3a 01 00 60 00 00 0c b8 80 05 00 05 7c 15 |..:..`........|.|
|000001d0| 02 10 4d 00 ff 00 00 00 90 47 1e 1e 10 00 01 00 |..M......G......|
|000001e0| 02 00 f0 c0 05 00 06 00 00 00 00 00 00 00 03 4c |...............L|
|000001f0| c1 3b cd 03 00 44 04 00 00 00 00 00 00 00 00 00 |.;...D..........|
|00000200| 00 00 00 01 00 60 00 00 08 00 00 00 00 00 00 00 |.....`..........|
|00000210| 00 00 00 00 02 00 00 00 00 44 00 00 00 00 00 00 |.........D......|
|00000220| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 0c |................|
|00000230| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|00000240| 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 |................|
|00000250| 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 |................|
|00000260| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|00000270| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|00000280| 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
|00000290| 09 00 00 00 6e 61 6d 65 3d 48 61 72 64 50 72 6f |....name=HardPro|
|000002a0| 62 65 73 3b 74 79 70 65 3d 70 68 79 73 69 63 73 |bes;type=physics|
|000002b0| 3b 6c 75 6d 69 3d 31 00 78 01 6c bc 47 8f 1d c9 |;lumi=1.x.l.G...|
|000002c0| d6 2d 96 3e 22 d2 e7 b1 e5 eb 94 f7 86 64 59 7a |.-.>"........dYz|
|000002d0| 6f 8b ae e9 6d b3 e8 c9 6e 7a b2 9b 9e e9 bd a1 |o...m...nz......|
|000002e0| 27 8b c5 62 09 d0 f7 f4 24 68 ac 37 79 33 41 93 |'..b....$h.7y3A.|
|000002f0| 37 d1 58 93 6f ac 89 f0 20 40 7f 40 bb ef e5 c5 |7.X.o... @.@....|
|00000300| 77 07 97 c0 c2 ce 3c a7 1a 68 9c c8 d8 b1 f7 5a |w.....<..h.....Z|
|00000310| 6b e7 bf 4f 95 fe fd ff 14 28 8a 50 7f fb c7 f3 |k..O.....(.P....|
|00000320| 75 83 62 fe 7e 4d 5d f8 af d4 23 1a ae ff fb 54 |u.b.~M]...#....T|
|00000330| e9 bf ff fd 7b 9a fd eb fb f7 97 39 2a f6 a8 47 |....{......9*..G|
|00000340| ef 58 8a 72 7f fe ed 77 49 6f fc 42 97 cc 5f 18 |.X.r...wIo.B.._.|
|00000350| 6c 1e 63 66 cd 4f 44 30 3f 13 6c 7e 95 74 f3 8b |l.cf.OD0?.l~.t..|
|00000360| d8 62 7e 16 47 cc 27 0a 6f be d7 54 93 01 bc d3 |.b~.G.'.o..T....|
|00000370| ea e6 3b b5 cb 7c a2 0e 9b 8c 4e cc 63 46 cd 64 |..;..|....N.cF.d|
|00000380| f5 21 f3 18 d5 65 f1 bc 6a 09 7c 87 25 08 c8 e2 |.!...e..j.|.%...|
|00000390| 05 6c bd 57 74 eb bd 3a 66 a9 34 e7 20 aa e4 88 |.l.Wt..:f.4. ...|
|000003a0| 74 d9 21 54 c5 a1 01 98 6a 75 64 ba cd e1 a9 36 |t.!T....jud....6|
|000003b0| 47 a1 7b 1c 0e 22 4b 35 39 0c d5 e2 50 10 25 7a |G.{.."K59...P.%z|
|000003c0| cc 11 a8 01 67 92 d5 dc 09 46 75 27 d8 56 b7 5b |....g....Fu'.V.[|
|000003d0| 91 dc 6e b9 e2 f6 28 73 ee 3c c6 fe 7a d2 ef ef |..n...(s.<..z...|
|000003e0| 51 14 7f b7 22 f9 7b d4 09 ff b6 81 83 df 0d 39 |Q...".{........9|
|000003f0| f8 cd 30 02 e3 2f e8 cd c1 67 8a 0f af 52 f5 f0 |..0../...g...R..|
|00000400| 0b 5d 0b bf 50 ed 61 20 94 c3 00 75 86 bd 94 18 |.]..P.a ...u....|
|00000410| f5 d2 03 d1 0a 35 9a 64 b8 9e e4 78 63 b2 43 1a |.....5.d...xc.C.|
|00000420| 4b 77 c8 62 ba a8 0b d9 65 1d 67 97 00 bf 02 ce |Kw.b....e.g.....|
|00000430| eb 24 bb a8 57 b2 0b 7a 5b 76 89 e3 f3 eb 1c ce |.$..W..z[v......|
|00000440| af 01 ae 02 2e 73 62 7e 85 93 f3 45 40 1b 5b 2a |.....sb~...E@.[*|
|00000450| da 98 81 e2 9e 52 2d ee 69 a8 b8 ab aa c5 5d 6d |.....R-.i.....]m|
|00000460| ac b8 a7 6e 2b ee e9 72 71 57 af 17 f0 b3 6f fd |...n+..rqW....o.|
|00000470| 2e 19 8d 8b 8c 64 5e 64 a7 cc 8b 9c 61 2e 21 c1 |.....d^d....a.!.|
|00000480| dc 26 aa e6 0e 49 33 b7 8b ba b9 53 aa 99 ab 62 |.&...I3....S...b|
|00000490| d5 5c 16 3b cc ef e2 94 f9 43 3c 68 ae 00 76 c9 |.\.;.....C<h..v.|
|000004a0| aa b9 57 d1 cd 3d 72 c9 dc 2d 1b e6 3e a5 d9 dc |..W..=r..-..>...|
|000004b0| af cc 98 0b 9a 60 ee 57 65 f3 80 5a 33 17 60 5d |.....`.We..Z3.`]|
|000004c0| 76 19 ad 56 48 63 db a3 9a 6d 9f 6e b5 77 51 03 |v..VHc...m.n.wQ.|
|000004d0| f6 6e 6a cc 0e e8 75 f6 3c 20 64 90 fd 83 55 ec |.nj...u.< d...U.|
|000004e0| 84 35 ec 88 a9 da 29 5b b3 63 a6 66 cf 33 ed 76 |.5....)[.c.f.3.v|
|000004f0| c6 a9 f6 3b 5e b5 df 72 25 bb e0 15 3b e7 ca f6 |...;^..r%...;...|
|00000500| 23 5e b7 0b ae 66 3f e4 ab f6 5b be c3 7e c7 4d |#^...f?...[..~.M|
|00000510| d8 0f b9 71 fb 93 28 d9 5f 45 c3 fe 2c ce da 5f |...q..(._E..,.._|
|00000520| c4 39 db 53 90 6d 2b c4 f6 15 d5 76 15 cd 76 14 |.9.S.m+....v..v.|
|00000530| d9 36 35 d1 b6 b4 aa 3d a6 73 f6 88 c1 db a6 2e |.65....=.s......|
|00000540| d8 c3 06 b2 47 75 62 0f 18 8a dd 6f 60 7b c8 18 |....Gub....o`{..|
|00000550| b1 07 8d 8d b6 a5 6f b5 eb 94 e0 54 29 ec d4 28 |......o....T)..(|
|00000560| d9 a9 c0 73 b0 9e 21 ce 06 a6 ea bc e0 b0 f3 92 |...s..!.........|

but due to the size I didn't really go through it to check whether it contains anything else of interest or any changed patterns. The hexdumps are apparently the contents of files stored in dCache.

The current .log is basically full of hexdumps like the one above, but I also found the other messages I posted further above... again, I didn't read through all of the 140G (got tired at 120G ;) )... so there may be loads of other stuff in it which I just didn't see.
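
For a file of this size, a streaming tally is one way to see what dominates without reading it by eye. The following is only a minimal sketch: it assumes the line prefix seen in the excerpts above (timestamp, then the cell name in parentheses), and the function name `summarize` and the bucket labels are made up for illustration.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Prefix as seen in the excerpts above, e.g.
# "2016-04-21 15:14:11+02:00 (lcg-lrz-dc67_2) [] FLUSH"
LINE_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2} \(([^)]*)\) "
)
# Rows and borders of the hexdump tables start with '|' or '+',
# optionally preceded by whitespace.
HEXDUMP_RE = re.compile(r"^\s*[|+]")


def summarize(lines: Iterable[str]) -> Counter:
    """Count lines per cell name, plus hexdump and other continuation lines."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts["cell:" + match.group(1)] += 1
        elif HEXDUMP_RE.match(line):
            counts["hexdump"] += 1
        else:
            # Multi-line replies, HTTP headers, blank lines, and so on.
            counts["continuation"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Iterating over the file object reads one line at a time,
    # so memory use stays flat even for a very large log.
    with open(sys.argv[1], errors="replace") as fh:
        for key, n in summarize(fh).most_common():
            print(f"{n:>12}  {key}")
```

Run against the affected log file, this prints one count per bucket, which should show quickly whether the hexdump rows account for most of the volume.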

Logging settings in dCache are at default, and my colleagues promised me not to have played around without my knowledge...

A restart of dCache on that node seemed to have cured the issue (so far).

I still have all the logs. If you want them, I could in principle make them available for download, but then please tell me yes or no, because I'd rather delete them.

Cheers, Chris.

gbehrmann commented 8 years ago

Very odd. This is usually what I expect if the logback.xml file is tampered with. Since a restart cured it, this seems not to have been the case.

Probably nothing I can do to help investigate it. Should it ever happen again, a heap dump of the domain would be helpful.
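
For reference, one generic way to capture such a heap dump is the JDK's `jmap` tool. The sketch below only wraps that call; it assumes `jmap` is on the PATH, that it is run as the same user as the JVM, and that the process ID of the domain's JVM is already known. The helper names are hypothetical, and the output file should go to a disk with enough free space (not the full system disk).

```python
import subprocess
from typing import List


def heap_dump_command(pid: int, out_file: str) -> List[str]:
    """Build the jmap invocation that writes a binary heap dump (hprof format)."""
    return ["jmap", f"-dump:format=b,file={out_file}", str(pid)]


def take_heap_dump(pid: int, out_file: str) -> None:
    """Run jmap against the given JVM; raises CalledProcessError on failure."""
    subprocess.run(heap_dump_command(pid, out_file), check=True)
```

A call such as `take_heap_dump(12345, "/data/tmp/pool-domain.hprof")` would then dump the JVM with that (made-up) PID to that (made-up) path.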

Cheers,

/gerd

calestyo commented 8 years ago

Uhm.. I do have a non-default logback.xml (remembering now that you mention it)... but it's not really special, it just changes some file paths (attached: logback.xml.txt). And why would the issue then only happen on one node? All nodes have the same logback.xml.

Well I just wanted to tell you... I'm fine if we close it.

gbehrmann commented 8 years ago

Non-default is okay as long as one remembers to adjust it when the default one gets structural changes. Since a restart resolved it, I do not suspect this to be the problem in this case.
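
A check for such structural drift could be sketched roughly as follows. This is an illustration only, not a dCache tool: it compares which elements (by tag and `name` attribute) appear in two logback XML documents and ignores all other attributes and text, so pure path changes do not show up. The function names and the sample element names are made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def element_paths(xml_text: str) -> Set[str]:
    """Return the set of element paths, e.g. '/configuration/appender[stdout]'."""
    paths: Set[str] = set()

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        name = elem.get("name")
        if name:
            path += f"[{name}]"
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def structural_diff(default_xml: str, local_xml: str) -> Tuple[List[str], List[str]]:
    """Return (paths only in the default file, paths only in the local file)."""
    default_paths = element_paths(default_xml)
    local_paths = element_paths(local_xml)
    return sorted(default_paths - local_paths), sorted(local_paths - default_paths)
```

Fed with the contents of the shipped default file and the locally modified one, the first list shows elements that a release added to the default but the local copy still lacks.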