Closed: ansell closed this issue 2 years ago
The `Mocks` class used for testing ignores `java.lang.Error` and does not fail tests after attempting to create an `ALANameSearcher` object and failing:
```scala
// src/test/scala/au/org/ala/biocache/Mocks.scala lines 25-30 (catch at 29)
try {
  val nameIndex = new au.org.ala.names.search.ALANameSearcher(properties.getProperty("name.index.dir"))
  bind(classOf[au.org.ala.names.search.ALANameSearcher]).toInstance(nameIndex)
} catch {
  case e: Exception => e.printStackTrace()
}
```
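A sketch of a safer alternative (illustrative names, not biocache-store code): `scala.util.control.NonFatal` matches ordinary exceptions but deliberately excludes `java.lang.Error` subclasses such as `StackOverflowError`, so fatal errors propagate instead of being swallowed, and the non-fatal arm can rethrow to fail the fixture rather than printing a stack trace:

```scala
import scala.util.control.NonFatal

// Sketch: run `body`, recover from ordinary exceptions, but let
// java.lang.Error subclasses (OutOfMemoryError, StackOverflowError, ...)
// propagate instead of being swallowed.
def guarded(body: => String): String =
  try body
  catch { case NonFatal(e) => "recovered: " + e.getMessage }

val ok     = guarded("fine")                              // "fine"
val caught = guarded(throw new RuntimeException("boom"))  // "recovered: boom"
// A StackOverflowError is NOT matched by NonFatal, so it escapes guarded:
val escaped =
  try { guarded(throw new StackOverflowError("deep")); "swallowed" }
  catch { case _: Error => "error escaped" }              // "error escaped"
```

In the `Mocks` binding above, a `case NonFatal(e)` arm could rethrow (for example as an `IllegalStateException` with `e` as the cause) so a missing name index fails the test suite instead of being logged and ignored.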
The `MigrateMedia` class does not retry media saving when failures occur:
```scala
// src/main/scala/au/org/ala/biocache/tool/MigrateMedia.scala lines 47-51 (catch at 50)
try {
  Config.mediaStore.save(uuid, dataResourceUid, "file://" + filePath, None)
} catch {
  case e: Exception => println(s"Problem saving $uuid, $dataResourceUid, $filePath")
}
```
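A hedged sketch of a bounded retry helper (the name `retry` and the attempt count are illustrative, not part of biocache-store): the final failure is rethrown, so the caller cannot silently lose a media file:

```scala
import scala.util.control.NonFatal

// Sketch: evaluate `op` up to `maxAttempts` times; rethrow the final
// failure rather than swallowing it. Backoff/sleep is elided for brevity.
def retry[T](maxAttempts: Int)(op: => T): T = {
  var attempt = 1
  while (true) {
    try {
      return op
    } catch {
      // Only swallow non-fatal failures, and only while attempts remain.
      case NonFatal(_) if attempt < maxAttempts => attempt += 1
    }
  }
  sys.error("unreachable") // the loop only exits via return or throw
}

var calls = 0
val result = retry(3) { calls += 1; if (calls < 3) throw new RuntimeException("transient") else 42 }
// result == 42 after three attempts; a third consecutive failure would have been rethrown
```

The catch block at `MigrateMedia.scala:50` could then become something like `retry(3) { Config.mediaStore.save(...) }`, with the exception escaping once retries are exhausted.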
The `IndexLocalRecords` class does not retry indexing of records when failures occur:
```scala
// src/main/scala/au/org/ala/biocache/tool/IndexLocalRecords.scala lines 115-127 (catch at 124)
  indexer.indexFromMap(guid, map, batchID = tokenRangeIdx)

  count += 1
  if(count % 100000 == 0){
    ZookeeperUtil.setStatus("INDEXING", "INDEXING", count)
  }
} catch {
  case e: Exception => {
    logger.error("Problem indexing record: " + guid + " - " + e.getMessage(), e)
  }
}
```
The `Sampling` class does not retry loading of points from the file when failures occur:
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 252-263 (catch at 262)
logger.info("Loading points from file: " + filePath)
var current: Array[String] = csvReader.readNext
val points: ArrayBuffer[Array[Double]] = new ArrayBuffer[Array[Double]]
var count = 0
while (current != null && count < max) {
  try {
    if(current.length == 2 && current(0) != "null" && current(0) != "null"){
      points += current.map(x => x.toDouble)
    }
  } catch {
    case e: Exception => logger.error("Error reading point: " + current, e)
  }
```
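Note also that the quoted condition tests `current(0)` twice, where the second test presumably should be `current(1)`. One hedged alternative to log-and-continue (names here are illustrative): parse each row into an `Either` so that bad rows are counted rather than silently dropped, letting the caller abort when the error count is non-zero:

```scala
// Sketch: parse a lat/long row, returning the error instead of swallowing it.
// Unlike the original condition, this checks both columns for "null".
def parsePoint(row: Array[String]): Either[String, Array[Double]] =
  if (row.length != 2 || row.contains("null"))
    Left("malformed row: " + row.mkString(","))
  else
    try Right(row.map(_.toDouble))
    catch { case _: NumberFormatException => Left("non-numeric row: " + row.mkString(",")) }

val rows = Seq(Array("147.3", "-42.9"), Array("null", "-42.9"), Array("abc", "1.0"))
val parsed = rows.map(parsePoint)
val errors = parsed.collect { case Left(e)  => e }
val points = parsed.collect { case Right(p) => p }
// points has 1 element, errors has 2; the caller can now fail fast if errors.nonEmpty
```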
The `Sampling` class does not fail when writing points to a file fails (and also does not close the `FileWriter` object when failures occur; the failures are silently ignored):
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 381-392 (catch at 391)
try {
  val fw = new FileWriter(locFilePath)
  coordinates.foreach { c =>
    fw.write(c)
    fw.write("\n")
  }
  fw.flush
  fw.close
  logger.info(s"Finished writing to file: $locFilePath")
} catch {
  case e: Exception => logger.error("Failed to write - " + e.getMessage, e)
}
```
and another version of the same pattern at:
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 405-415 (catch at 414)
try {
  val fw = new FileWriter(locFilePath)
  lce.coordinates.foreach { c =>
    fw.write(c)
    fw.write("\n")
  }
  fw.flush
  fw.close
} catch {
  case e: Exception => logger.error(e.getMessage, e)
}
```
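Both writer blocks above leak the `FileWriter` on failure, because `close` is only reached on the success path. A sketch using `try`/`finally` (or `scala.util.Using` on Scala 2.13+) guarantees the close and lets the write failure propagate to the caller instead of being logged and ignored (`writeLines` is an illustrative name):

```scala
import java.io.{File, FileWriter}

// Sketch: write lines, always closing the writer, and letting any
// write failure propagate rather than being caught and logged.
def writeLines(path: String, lines: Seq[String]): Unit = {
  val fw = new FileWriter(path)
  try {
    lines.foreach { l => fw.write(l); fw.write("\n") }
    fw.flush()
  } finally {
    fw.close() // runs whether or not the writes succeeded
  }
}

val tmp = File.createTempFile("loc", ".txt")
writeLines(tmp.getPath, Seq("147.3,-42.9", "147.4,-42.8"))
val src = scala.io.Source.fromFile(tmp)
val written = try src.getLines().toList finally src.close()
tmp.delete()
```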
The `Sampling` class does not fail when reading from a CSV file and writing to the `loc` table fails (and also does not consistently close the CSV reader object if errors occur in `csvReader.readNext`):
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 618-643 (catches at 628 and 640)
    if (counter % 200 == 0 && callback != null) {
      callback.setCurrentLayer(new IntersectionFile("", "", "", "finished. Processing loaded samples " + counter, "", "", "", "", null))
      callback.progressMessage("Loading sampling.")
    }
    if (counter % 1000 == 0) {
      logger.info(s"writing to loc: $counter : records per sec: " + 1000f / (((System.currentTimeMillis - nextTime).toFloat) / 1000f))
      nextTime = System.currentTimeMillis
    }
    counter += 1
  } catch {
    case e: Exception => {
      logger.error(e.getMessage, e)
      logger.error("Problem writing line: " + counter + ", line length: " + line.length + ", header length: " + header.length)
    }
  }
  line = csvReader.readNext
}
csvReader.close
if (batches.size > 0) {
  LocationDAO.writeLocBatch(batches.toMap)
}
} catch {
  case e: Exception => {
    logger.error("Problem loading sampling. " + e.getMessage, e)
  }
}
```
The `Sampling` class does not propagate failure when `LocColumnExporter` exits without successfully completing its tasks:
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 712-723 (catch at 722)
      "verbatimLatitude",
      "verbatimLongitude",
      "originalDecimalLatitude",
      "originalDecimalLongitude",
      "originalSensitiveValues")

    val fin = System.currentTimeMillis
    logger.info("[Unique Coordinate Thread " + threadId + "] " + counter + " took " + ((fin - start).toFloat) / 1000f + " seconds")
  }
} catch {
  case e: Exception => logger.error("Error reading points for thread: " + threadId, e)
}
```
The `Sampling` class does not propagate failure when the `LoadSamplingConsumer` fails, does not preserve the thread's interrupted status when `InterruptedException` is caught and swallowed, and does not consistently close the CSV reader object when issues occur:
```scala
// src/main/scala/au/org/ala/biocache/tool/Sampling.scala lines 766-796 (catches at 774, 786 and 795)
      batches += LocationDAO.addLayerIntersects(line(1), line(0), cl, el, true)

      if (counter % 1000 == 0 && counter > 0) {
        logger.info("writing to loc:" + counter + ": records per sec: " + 1000f / (((System.currentTimeMillis - nextTime).toFloat) / 1000f))
        nextTime = System.currentTimeMillis
      }
      counter += 1
    } catch {
      case e: Exception => {
        logger.error(e.getMessage, e)
        logger.error("Problem writing line: " + counter + ", line length: " + line.length + ", header length: " + header.length)
      }
    }
    line = csvReader.readNext
  }
  csvReader.close
  if (batches.size > 0) {
    LocationDAO.writeLocBatch(batches.toMap)
  }
} catch {
  case e: Exception => {
    logger.error("Problem loading sampling. " + e.getMessage, e)
  }
}
logger.info("Finished loading: " + inputFileName + " in " + (System.currentTimeMillis - startTime) + "ms")

doneList.add(inputFileName)
}
} catch {
  case e: InterruptedException => {
    //InterruptedException is expected when the thread is terminated with an interruption.
```
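When `InterruptedException` is caught and swallowed like this, the thread's interrupted flag stays cleared, so nothing further up the stack can see the shutdown request. The standard fix (a sketch with illustrative names, not biocache-store code) is to restore the flag before returning:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Sketch: a consumer loop that restores the interrupt flag instead of
// swallowing InterruptedException, so the controller can observe shutdown.
def consume(queue: LinkedBlockingQueue[String]): Int = {
  var processed = 0
  try {
    var item = queue.take() // blocks; throws InterruptedException on interrupt
    while (item != "POISON") {
      processed += 1
      item = queue.take()
    }
    processed
  } catch {
    case _: InterruptedException =>
      Thread.currentThread().interrupt() // re-set the flag rather than dropping it
      processed
  }
}

val q = new LinkedBlockingQueue[String]()
q.put("a"); q.put("b"); q.put("POISON")
val handled = consume(q) // 2 items before the poison pill

// Interrupting ourselves makes take() throw immediately; consume
// restores the flag so the caller can still see the interruption.
Thread.currentThread().interrupt()
val afterInterrupt = consume(new LinkedBlockingQueue[String]()) // 0
val flagRestored = Thread.interrupted() // true (reading also clears it again)
```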
The `ExpertDistributionOutlierTool` silently succeeds when it fails to perform database operations, and can produce partial results after silently succeeding:
```scala
// src/main/scala/au/org/ala/biocache/tool/ExpertDistributionOutlierTool.scala lines 524-536 (catch at 534)
        logger.warn(rowKey + " is no longer an outlier")
        Config.persistenceManager.deleteColumns(rowKey, "occ", "distanceOutsideExpertRange" + Config.persistenceManager.fieldDelimiter + "p")
        Config.occurrenceDAO.removeSystemAssertion(rowKey, AssertionCodes.SPECIES_OUTSIDE_EXPERT_RANGE)
        rowKeysForIndexing += rowKey
      }
    } else {
      //add this row key to the "oldKeys"
      newOutlierRowKeys ++= oldRowKeys
    }
  } catch {
    case e: Exception => logger.error("Unable to get parse past distribution ingto array: " + oldRowKeysJson + " for " + lsid, e)
  }
}
```
The `DuplicationDetection` tool silently succeeds and does not persist duplicate details when database or other failures occur:
```scala
// src/main/scala/au/org/ala/biocache/tool/DuplicationDetection.scala lines 618-629 (catch at 628)
      }

      saveToOccurrenceRecord(primaryRecord, r, types)

      buffer.synchronized {
        buffer += r.rowKey
      }
    }
  }
} catch {
  case e: Exception => e.printStackTrace();
}
```
thanks @ansell
Can you point out where `java.lang.Error` instances are wrapped inside other Exception instances such as `java.util.concurrent.ExecutionException`? I can't see these.
@djtfmartin The patterns you need to look for specifically in terms of `ExecutionException` are any places where `ExecutorService` is used, particularly those places where the `Callable` or `Consumer`/`Producer` patterns are used.
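A sketch of the wrapping being described here: a `Callable` that dies with a `java.lang.Error` surfaces on the submitting thread as an `ExecutionException` from `Future.get`. Because `ExecutionException` is itself an `Exception`, a broad `catch { case e: Exception => ... }` swallows the underlying `Error` unless the `cause` is inspected and rethrown:

```scala
import java.util.concurrent.{Callable, ExecutionException, Executors}

val pool = Executors.newSingleThreadExecutor()
// The worker dies with an Error, not an Exception...
val future = pool.submit(new Callable[Int] {
  def call(): Int = throw new StackOverflowError("simulated")
})

val outcome =
  try { future.get(); "completed" }
  catch {
    case e: ExecutionException =>
      e.getCause match {
        // ...but the submitting thread sees it wrapped in ExecutionException,
        // which IS an Exception, so a bare `case e: Exception` would swallow it.
        case err: Error => "error unwrapped: " + err.getMessage
        case other      => "exception unwrapped: " + other.getMessage
      }
  }
pool.shutdown()
// outcome == "error unwrapped: simulated"
```

In real code the `Error` arm should rethrow (or terminate the JVM deliberately) rather than return a string; the string here just makes the wrapping visible.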
I am compiling all of the `catch case e: Exception` instances (due to some syntax differences, the number quoted above isn't the full count, as my initial search didn't include `catch case e:Exception`). They will be included in the list when it is complete. If you can't see issues in the code shown in the comments above so far, then I may need to enhance the descriptions, as they all include bugs that need to be fixed. I am working on a biocache-service bug today, but will get back to this as soon as possible.

In general though, it is impossible to answer your question fully just from a static code review, as there is AOP (both Scala-driven and other library-driven) that adds and modifies code paths at runtime, which may also introduce new wrappers. As part of fixing this widespread problem, biocache-store also needs significant software engineering work, including automated tests to reduce the ability of the `catch case e: Exception` pattern to be re-introduced in future.
This is a tracking issue for all of the `catch case e: Exception` instances, almost all of which are currently unhelpful for a serious data processing library, so fixing the specific `ExecutionException` instances is not enough to fix the issue.
thanks @ansell. I can see the issues with the exception handling above. This needs to be improved. I cannot see instances where we are catching or wrapping `java.lang.Error`. If we have these, then we should remove them as a priority.
Wrapping of `java.lang.Error` in the `StackOverflowError` case that triggered this audit (and appeared again today) is done by a Java `Thread.UncaughtExceptionHandler`, and/or the `ExecutorService`. The wrapped exception class whose `cause` we need to examine in that case is a checked `ExecutionException`. The wrapping isn't done in biocache-store code, so you won't find it just by searching for `java.lang.Error` in this code. Searching for `invokeAll`, `Callable` and `Executor` will show some of the code locations that are relevant in this case.
biocache-store has been replaced by pipelines.
There are a number of cases where the current Scala code uses the following pattern to swallow `java.lang.Error`, which includes both `java.lang.OutOfMemoryError` and `java.lang.StackOverflowError`, neither of which biocache-store needs to recover from. In some cases, `java.lang.Error` instances are wrapped inside other Exception instances, such as `java.util.concurrent.ExecutionException`, which are caught by the current Scala code because it catches all `java.lang.Exception` instances, in most cases logging them and not rethrowing them. Not exiting quickly from, or recovering from, these and other Errors has caused non-completion that requires manual JVM shutdowns, which interrupt other tasks and may require full server restarts, as biocache-store is used as the basis for biocache-service, which is designed for long-running application lifetimes.

In some cases, these are wrapped with retry loops, which could complete in future by exiting the loop, but others are more risky: completion isn't guaranteed, or data is not loaded/processed/sampled/indexed.

The `catch case e: Exception` pattern is currently used 248 times in biocache-store (2.x). This issue will be used to track fixes for them where they are not already accommodated, rather than creating many issues that may each be given low priority or status.

- `Mocks.scala:29` - `ALANameSearcher`. Does not fail tests when creation fails
- `MigrateMedia.scala:50` - `Config.mediaStore.save`. Does not retry media saving when failures occur
- `IndexLocalRecords.scala:124` - Index record failures are not retried
- `Sampling.scala:262` - Failures when reading points from the file are not retried
- `Sampling.scala:391` and `Sampling.scala:414` - Failures writing to a file are ignored
- `Sampling.scala:628` and `Sampling.scala:640` - Failures when reading from a file or writing to the `loc` table are ignored
- `Sampling.scala:722` - When `LocColumnExporter` fails, no one notices
- `Sampling.scala:774`, `Sampling.scala:786` and `Sampling.scala:795` - When `LoadSamplingConsumer` fails, no one notices, breaking the Controller-Producer-Consumer pattern and possibly causing a non-daemon thread to stay alive indefinitely, preventing JVM termination; partly tracked in #244
- `ExpertDistributionOutlierTool.scala:534` - Failures performing database operations do not prevent subsequent writing of partial results
- `DuplicationDetection.scala:628` - A failure saving any duplicate abandons all other duplicates in the list and exits normally