dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogeneous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org

java.util.ConcurrentModificationException under high concurrent writing load #6284

Open kofemann opened 2 years ago

kofemann commented 2 years ago

With current master (9d3a156cdd), running 32 concurrent writers against a single pool, I got the following exception:

22 Nov 2021 17:29:35 (pool_write) [] DSWRITE: 
java.util.ConcurrentModificationException: null
    at java.base/java.util.HashMap.computeIfAbsent(HashMap.java:1135)
    at org.dcache.pool.repository.FileTrackingAccount.adjustFileUsed(FileTrackingAccount.java:258)
    at org.dcache.pool.repository.FileTrackingAccount.allocateNow(FileTrackingAccount.java:295)
    at org.dcache.pool.classic.ImmediateAllocator.allocate(ImmediateAllocator.java:37)
    at org.dcache.pool.repository.ForwardingAllocator.allocate(ForwardingAllocator.java:32)
    at org.dcache.pool.classic.FairQueueAllocator.allocate(FairQueueAllocator.java:67)
    at org.dcache.pool.repository.AllocatorAwareRepositoryChannel.allocate(AllocatorAwareRepositoryChannel.java:239)
    at org.dcache.pool.repository.AllocatorAwareRepositoryChannel.preallocate(AllocatorAwareRepositoryChannel.java:226)
    at org.dcache.pool.repository.AllocatorAwareRepositoryChannel.write(AllocatorAwareRepositoryChannel.java:216)
    at org.dcache.pool.movers.MoverChannel.write(MoverChannel.java:162)
    at org.dcache.chimera.nfsv41.mover.EDSOperationWRITE.process(EDSOperationWRITE.java:53)
    at org.dcache.nfs.v4.AbstractOperationExecutor.execute(AbstractOperationExecutor.java:58)
    at org.dcache.chimera.nfsv41.common.StatsDecoratedOperationExecutor.execute(StatsDecoratedOperationExecutor.java:56)
    at org.dcache.nfs.v4.NFSServerV41.NFSPROC4_COMPOUND_4(NFSServerV41.java:188)
    at org.dcache.nfs.v4.xdr.nfs4_prot_NFS4_PROGRAM_ServerStub.dispatchOncRpcCall(nfs4_prot_NFS4_PROGRAM_ServerStub.java:48)
    at org.dcache.oncrpc4j.rpc.RpcDispatcher$1.run(RpcDispatcher.java:110)
    at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:258)
    at org.dcache.oncrpc4j.rpc.RpcDispatcher.handleRead(RpcDispatcher.java:91)
    at org.glassfish.grizzly.filterchain.ExecutorResolver$9.execute(ExecutorResolver.java:119)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeFilter(DefaultFilterChain.java:284)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.executeChainPart(DefaultFilterChain.java:201)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.execute(DefaultFilterChain.java:133)
    at org.glassfish.grizzly.filterchain.DefaultFilterChain.process(DefaultFilterChain.java:112)
    at org.glassfish.grizzly.ProcessorExecutor.execute(ProcessorExecutor.java:77)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.fireIOEvent(TCPNIOTransport.java:539)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.fireIOEvent(AbstractIOStrategy.java:112)
    at org.glassfish.grizzly.strategies.SameThreadIOStrategy.executeIoEvent(SameThreadIOStrategy.java:103)
    at org.glassfish.grizzly.strategies.AbstractIOStrategy.executeIoEvent(AbstractIOStrategy.java:89)
    at org.glassfish.grizzly.nio.SelectorRunner.iterateKeyEvents(SelectorRunner.java:415)
    at org.glassfish.grizzly.nio.SelectorRunner.iterateKeys(SelectorRunner.java:384)
    at org.glassfish.grizzly.nio.SelectorRunner.doSelect(SelectorRunner.java:348)
    at org.glassfish.grizzly.nio.SelectorRunner.run(SelectorRunner.java:279)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:593)
    at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:573)
    at java.base/java.lang.Thread.run(Thread.java:829)
kofemann commented 2 years ago

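The call to FileTrackingAccount#checkForRemovals is made from a scheduled task (via a ScheduledFuture) without synchronization, so it can presumably run concurrently with FileTrackingAccount#adjustFileUsed on the writer threads and modify the same HashMap, which would explain the ConcurrentModificationException thrown from HashMap.computeIfAbsent above.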
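
To illustrate the kind of fix this suggests, here is a minimal, self-contained sketch. It is not the dCache code: the class `TrackedUsage`, its field `usedByFile`, and the helpers `totalUsed` and `runDemo` are hypothetical stand-ins, and only the method names `adjustFileUsed` and `checkForRemovals` are borrowed from the trace and the comment above. The sketch assumes both methods touch one plain `HashMap`, and guards every access with the same monitor so the periodic task and the writers cannot interleave.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an account that tracks per-file usage.
// It only mimics the access pattern described in this issue.
final class TrackedUsage {
    // Plain HashMap: safe only because every access below holds the same lock.
    private final Map<String, Long> usedByFile = new HashMap<>();

    // Called by writer threads; synchronized so it cannot overlap with the sweep.
    public synchronized void adjustFileUsed(String fileId, long delta) {
        usedByFile.merge(fileId, delta, Long::sum);
    }

    // Called periodically by a scheduler; takes the same lock as the writers.
    // Returns the number of entries dropped (entries with no usage left).
    public synchronized int checkForRemovals() {
        int before = usedByFile.size();
        usedByFile.values().removeIf(used -> used <= 0);
        return before - usedByFile.size();
    }

    public synchronized long totalUsed() {
        long sum = 0;
        for (long used : usedByFile.values()) {
            sum += used;
        }
        return sum;
    }

    // Runs `writers` threads, each adding 1 to its own entry `writesPerWriter`
    // times, while a scheduled task sweeps the map every millisecond.
    public static long runDemo(int writers, int writesPerWriter) throws InterruptedException {
        TrackedUsage account = new TrackedUsage();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(account::checkForRemovals, 0, 1, TimeUnit.MILLISECONDS);

        Thread[] threads = new Thread[writers];
        for (int w = 0; w < writers; w++) {
            final String fileId = "file-" + w;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < writesPerWriter; i++) {
                    account.adjustFileUsed(fileId, 1L);
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);
        return account.totalUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("total used: " + runDemo(32, 10_000));
    }
}
```

Removing `synchronized` from `checkForRemovals` in this sketch recreates the unsynchronized access pattern from the report, although whether an exception actually fires on a given run depends on timing. Swapping the map for a `ConcurrentHashMap` would be another option, but whether either approach fits the real `FileTrackingAccount` depends on what other state it updates alongside the map.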