elki-project / elki

ELKI Data Mining Toolkit
https://elki-project.github.io/
GNU Affero General Public License v3.0
780 stars 321 forks source link

multiple calling of LOF from Scala leads to AbortException: DBID range allocation error #40

Closed edouardfouche closed 6 years ago

edouardfouche commented 6 years ago

Hello, I am using the LOF implementation from Elki for some experiments. Since I work in scala, I have written a wrapper to call it from the jar (version 0.7.2). The wrapper is the following:

import de.lmu.ifi.dbs.elki.algorithm.outlier.lof.LOF
import de.lmu.ifi.dbs.elki.database.StaticArrayDatabase
import de.lmu.ifi.dbs.elki.datasource.ArrayAdapterDatabaseConnection
import de.lmu.ifi.dbs.elki.distance.distancefunction.minkowski

import scala.collection.mutable.ListBuffer

/**
  * Created by fouchee on 04.09.17.
  */
case class ElkiLOF(k: Int) extends OutlierDetector {
  def computeScores(instances: Array[Array[Double]]): Array[(Int, Double)] = {
    val distance = new minkowski.EuclideanDistanceFunction
    val lof = new LOF(k, distance)

    val dbc = new ArrayAdapterDatabaseConnection(instances) // Adapter to load data from an existing array.
    val db = new StaticArrayDatabase(dbc, null) // Create a database (which may contain multiple relations!)
    db.initialize()
    val result = lof.run(db).getScores()

    var scoreList = new ListBuffer[Double]()
    val DBIDs = result.iterDBIDs()
    while ( {
      DBIDs.valid
    }) {
      scoreList += result.doubleValue(DBIDs)
      DBIDs.advance
    }
    scoreList

    val corrected = scoreList.map {
      case d if d.isNaN => 1.0 // Or whatever value you'd prefer.
      case d if d.isNegInfinity => 1.0 // Or whatever value you'd prefer.
      case d if d.isPosInfinity => 1.0 // Or whatever value you'd prefer.
      case d => d
    }
    corrected.toArray.zipWithIndex.map(x => (x._2, x._1))
  }
}

It works well, and the implementation is really fast I must say. However, if I run it multiple times in parallel (and I mean with a lot of different data sets and repetition) at some point I run into the following error:

[error] Exception in thread "main" de.lmu.ifi.dbs.elki.utilities.exceptions.AbortException: DBID range allocation error - too many objects allocated!
[error]     at de.lmu.ifi.dbs.elki.database.ids.integer.TrivialDBIDFactory.generateStaticDBIDRange(TrivialDBIDFactory.java:72)
[error]     at de.lmu.ifi.dbs.elki.database.ids.DBIDUtil.generateStaticDBIDRange(DBIDUtil.java:196)
[error]     at de.lmu.ifi.dbs.elki.database.StaticArrayDatabase.initialize(StaticArrayDatabase.java:129)
[error]     at com.edouardfouche.detectors.ElkiLOF$.computeScores(ElkiLOF.scala:29)

I don't know how to correct this error. When I run in sequential, it occurs at some point as well. It seems that the underlying TrivialDBIDFactory does not deallocate the DBIDs that are not in use anymore. I found here the piece of code that launch the error http://www.massapi.com/class/de/lmu/ifi/dbs/elki/utilities/exceptions/AbortException-5.html

Any idea how to avoid that?

Thank you, Edouard

kno10 commented 6 years ago

It runs out of unique object ids. In total, you supposedly ran it on more than 2^31 objects. The current class will assign every object ID only once. This serves as a safety guard to prevent mixing up IDs from different data sets.

If you use the ArrayAdapterDatabaseConnection constructor with a startid=0, then this will not happen.

ELKI will be even faster if you add an index, for example a cover tree (on large enough data). An example (using the newer ELKIBuilder API, and a R*-tree index) can be found here: https://github.com/elki-project/elki/blob/master/addons/tutorial/src/main/java/tutorial/javaapi/GeoIndexing.java

There is EuclideanDistanceFunction.STATIC, as this distance does not have parameters.

From a LOF logic, infinite LOF is an outlier. This occurs when you have minpts duplicates (at distance 0, and thus 'density' infinity), and a nearby point.

Closing, because it is behaving as intended, and there is a constructor of ArrayAdapterDatabaseConnection to solve your problem.

Grüße aus Heidelberg.

edouardfouche commented 6 years ago

Thank you very much !