NationalSecurityAgency / datawave

DataWave is an ingest/query framework that leverages Apache Accumulo to provide fast, secure data access.
https://code.nsa.gov/datawave
Apache License 2.0

Apparent race condition and thread-safety issue in InMemoryAccumulo #1099

Open keith-ratcliffe opened 3 years ago

keith-ratcliffe commented 3 years ago

There appears to be a race condition on at least one of InMemoryAccumulo's internal HashMaps, specifically the tables map in this case.

Failed query...

URL:  https://dw:8443/DataWave/Query/9ed85f2d-d96d-4a8a-b75b-7b37dd7b2bba/next
Error:  {
    "Exceptions": [
        {
            "Message": "java.util.ConcurrentModificationException",
            "Code": "500-14",
            "Cause": "java.lang.RuntimeException: java.util.ConcurrentModificationException"
        }
    ],
    "HasResults": false,
    "OperationTimeMS": 0,
    "PageNumber": 0,
    "PartialResults": false
}

First sign of trouble in Wildfly Query.log...

2021-03-03 19:18:01,396 ERROR [datawave.query.tables.async.Scan] (Datawave BatchScanner Session 9ed85f2d-d96d-4a8a-b75b-7b37dd7b2bba -51)  exception : java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
        at java.util.HashMap$KeyIterator.next(HashMap.java:1469)
        at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
        at java.util.TreeSet.addAll(TreeSet.java:312)
        at java.util.TreeSet.<init>(TreeSet.java:160)
        at datawave.accumulo.inmemory.InMemoryTableOperations.list(InMemoryTableOperations.java:89)
        at datawave.webservice.common.connection.WrappedAccumuloClient.createScanner(WrappedAccumuloClient.java:157)
        at datawave.security.util.ScannerHelper.createScanner(ScannerHelper.java:30)
        at datawave.query.tables.RunningResource.init(RunningResource.java:121)
        at datawave.query.tables.AccumuloResource$ResourceFactory.initializeResource(AccumuloResource.java:112)
        at datawave.query.tables.AccumuloResource$ResourceFactory.initializeResource(AccumuloResource.java:103)
        at datawave.query.tables.async.Scan.call(Scan.java:243)
        at datawave.query.tables.async.Scan.call(Scan.java:32)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
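
For context, the top frames of the trace (HashMap$HashIterator.nextNode reached through the TreeSet constructor) are HashMap's fail-fast check firing while the key set is being copied. A minimal sketch of that mechanism follows. It triggers the check deterministically in a single thread as a stand-in for the concurrent writer, and the names FailFastDemo and triggersCme are hypothetical, not DataWave code.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical demo class; not part of DataWave.
class FailFastDemo {

    // Returns true if HashMap's fail-fast iterator detects a structural change.
    static boolean triggersCme() {
        Map<String, Object> tables = new HashMap<>();
        tables.put("datawave.metadata", new Object());
        tables.put("datawave.error_m", new Object());

        Iterator<String> it = tables.keySet().iterator();
        it.next();

        // Stand-in for whatever structural change another thread makes
        // while this iteration is still in flight.
        tables.put("datawave.queryMetrics_m", new Object());

        try {
            it.next(); // the iterator notices the map's modification count changed
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("CME triggered: " + triggersCme());
    }
}
```

In the real failure the modification comes from a different thread, so the exception is intermittent rather than guaranteed.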

keith-ratcliffe commented 3 years ago

Linking issue on source repo: #3

keith-ratcliffe commented 3 years ago

As I suspected, server.log reveals the source of the race condition to be the AccumuloTableCache reload, which occurred at the exact same time as the ConcurrentModificationException above...

2021-03-03 19:18:01,027 INFO  [datawave.webservice.common.cache.AccumuloTableCache] (EJB default - 1)  Reloading datawave.metadata
2021-03-03 19:18:01,027 INFO  [datawave.webservice.common.cache.AccumuloTableCache] (EJB default - 1)  Reloading datawave.queryMetrics_m
2021-03-03 19:18:01,027 INFO  [datawave.webservice.common.cache.AccumuloTableCache] (EJB default - 1)  Reloading datawave.error_m
2021-03-03 19:18:01,345 INFO  [datawave.webservice.common.cache.BaseTableCache] (EE-ManagedExecutorService-default-Thread-12)  Cached 84 k,v for table: datawave.error_m
2021-03-03 19:18:01,396 INFO  [datawave.webservice.common.cache.BaseTableCache] (EE-ManagedExecutorService-default-Thread-16)  Cached 102 k,v for table: datawave.queryMetrics_m
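
One possible mitigation, offered only as a sketch and not confirmed anywhere in this thread, is to back the table map with a ConcurrentHashMap, whose key-set iterators are weakly consistent and do not throw ConcurrentModificationException. The class below (TableRegistry, with create, delete, list, and survivesConcurrentListing) is a hypothetical stand-in and does not reflect the actual InMemoryAccumulo or InMemoryTableOperations code.

```java
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for an in-memory table registry; not DataWave's class.
class TableRegistry {
    private final ConcurrentMap<String, Object> tables = new ConcurrentHashMap<>();

    void create(String name) { tables.putIfAbsent(name, new Object()); }

    void delete(String name) { tables.remove(name); }

    // Copies the table names into a sorted set. Iterating a ConcurrentHashMap
    // key set is safe while other threads add or remove entries.
    SortedSet<String> list() { return new TreeSet<>(tables.keySet()); }

    // Lists tables repeatedly while a writer thread creates and deletes
    // entries; returns true if no exception was thrown by list().
    static boolean survivesConcurrentListing(int iterations) {
        TableRegistry registry = new TableRegistry();
        registry.create("datawave.metadata");
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                String name = "tmp" + (i % 16);
                registry.create(name);
                registry.delete(name);
            }
        });
        writer.start();
        boolean ok = true;
        try {
            for (int i = 0; i < iterations; i++) {
                registry.list();
            }
        } catch (RuntimeException e) {
            ok = false;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            ok = false;
        }
        return ok;
    }

    public static void main(String[] args) {
        System.out.println("No exception: " + survivesConcurrentListing(100_000));
    }
}
```

The trade-off is that list() may or may not reflect entries added or removed while the copy is in progress. Whether that is acceptable for table listing during a cache reload is a judgment for the maintainers.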