actionml / harness

Harness is a Machine Learning/AI Server with plugins for many algorithms including the Universal Recommender
Apache License 2.0

Training fails - EsHadoopRemoteException: illegal_argument_exception: if _id is specified it must not be empty #316

Closed: julienmarie closed this issue 2 years ago

julienmarie commented 2 years ago

Here is my configuration:

{
    "engineId": "recom",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "spark.master": "local",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "10g",
        "spark.driver.memory": "10g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "elasticsearch",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm":{
        "indicators": [
            {
                "name": "purchase"
            },
            {
                "name": "cart"
            },
            {
                "name": "view"
            },
            {
                "name": "searchpref"
            }
        ]
    }
}
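
For context, training fails at the step where the Universal Recommender writes its model to Elasticsearch via EsSpark (see the stack trace below). A minimal sketch of that write path, assuming the id field is mapped to the document _id through es-hadoop's es.mapping.id option (the index name, field names, and code are illustrative, not Harness's actual implementation):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark.rdd.EsSpark

    object EmptyIdRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf()
            .setMaster("local")
            .setAppName("empty-id-repro")
            .set("es.nodes", "elasticsearch")
            .set("es.nodes.wan.only", "true"))

        // The second record mirrors the error sample in the log below:
        // {"available":1.0,"id":""}
        val docs = sc.makeRDD(Seq(
          Map("id" -> "item-1", "available" -> 1.0),
          Map("id" -> "", "available" -> 1.0) // empty id is rejected by ES
        ))

        // "recom/items" is an illustrative resource name; es.mapping.id tells
        // es-hadoop to use the "id" field as each document's _id, so an empty
        // string here fails the whole bulk request with the exception below.
        EsSpark.saveToEs(docs, "recom/items", Map("es.mapping.id" -> "id"))
      }
    }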

And here are the logs when I launch training:


03:40:15.618 INFO  TaskSchedulerImpl - Removed TaskSet 235.0, whose tasks have all completed, from pool
03:40:15.619 INFO  TaskSchedulerImpl - Cancelling stage 235
03:40:15.619 INFO  DAGScheduler      - ResultStage 235 (runJob at EsSpark.scala:108) failed in 0.212 s due to Job aborted due to stage failure: Task 0 in stage 235.0 failed 1 times, most recent failure: Lost task 0.0 in stage 235.0 (TID 88, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/1000]. Error sample (first [5] error messages):
    org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: if _id is specified it must not be empty
    {"index":{"_id":""}}
{"available":1.0,"id":""}

Bailing out...
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:519)
    at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:127)
    at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:192)
    at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:172)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:74)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:108)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:108)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
03:40:15.619 INFO  DAGScheduler      - Job 48 failed: runJob at EsSpark.scala:108, took 0.213707 s
03:40:15.620 ERROR URAlgorithm       - Spark computation failed for engine recom with params {{"engineId":"recom","engineFactory":"com.actionml.engines.ur.UREngine","sparkConf":{"spark.master":"local","spark.serializer":"org.apache.spark.serializer.KryoSerializer","spark.kryo.registrator":"org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator","spark.kryo.referenceTracking":"false","spark.kryoserializer.buffer":"300m","spark.executor.memory":"10g","spark.driver.memory":"10g","spark.es.index.auto.create":"true","spark.es.nodes":"elasticsearch","spark.es.nodes.wan.only":"true"},"algorithm":{"indicators":[{"name":"purchase"},{"name":"cart"},{"name":"view"},{"name":"searchpref"}]}}}
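
The error sample above shows a model document with an empty id ({"available":1.0,"id":""}), which most likely traces back to an input event whose entityId or targetEntityId was an empty string; checking the imported events for empty ids should locate the offending input. Until the engine guards against this, a filter of this shape (continuing the sketch above; illustrative, not a Harness patch) keeps a single bad record from aborting the bulk write:

    // Illustrative guard, not Harness code: drop records whose id field is
    // missing or empty before handing the RDD to EsSpark.
    val clean = docs.filter { doc =>
      doc.get("id") match {
        case Some(s: String) => s.nonEmpty
        case _               => false
      }
    }
    EsSpark.saveToEs(clean, "recom/items", Map("es.mapping.id" -> "id"))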