03:40:15.618 INFO TaskSchedulerImpl - Removed TaskSet 235.0, whose tasks have all completed, from pool
03:40:15.619 INFO TaskSchedulerImpl - Cancelling stage 235
03:40:15.619 INFO DAGScheduler - ResultStage 235 (runJob at EsSpark.scala:108) failed in 0.212 s due to Job aborted due to stage failure: Task 0 in stage 235.0 failed 1 times, most recent failure: Lost task 0.0 in stage 235.0 (TID 88, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/1000]. Error sample (first [5] error messages):
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: if _id is specified it must not be empty
{"index":{"_id":""}}
{"available":1.0,"id":""}
Bailing out...
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:519)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:127)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:192)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:172)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:74)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:108)
at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:108)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
03:40:15.619 INFO DAGScheduler - Job 48 failed: runJob at EsSpark.scala:108, took 0.213707 s
03:40:15.620 ERROR URAlgorithm - Spark computation failed for engine recom with params {{"engineId":"recom","engineFactory":"com.actionml.engines.ur.UREngine","sparkConf":{"spark.master":"local","spark.serializer":"org.apache.spark.serializer.KryoSerializer","spark.kryo.registrator":"org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator","spark.kryo.referenceTracking":"false","spark.kryoserializer.buffer":"300m","spark.executor.memory":"10g","spark.driver.memory":"10g","spark.es.index.auto.create":"true","spark.es.nodes":"elasticsearch","spark.es.nodes.wan.only":"true"},"algorithm":{"indicators":[{"name":"purchase"},{"name":"cart"},{"name":"view"},{"name":"searchpref"}]}}}
Here is my configuration (the engine params are visible in the ERROR line above), and those are the logs I get when I launch a training.
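The bulk failure in the log shows a document being sent with an empty `_id` (`{"index":{"_id":""}}` for `{"available":1.0,"id":""}`), which Elasticsearch rejects with `if _id is specified it must not be empty`. One way to guard against this is to drop records with a missing or empty id before the write. The sketch below is only illustrative and makes assumptions: that the data is an `RDD[Map[String, Any]]`, that a field named `id` is what gets routed into `_id` via `es.mapping.id`, and that the RDD name `events` and the index name are hypothetical:

```scala
import org.apache.spark.rdd.RDD
import org.elasticsearch.spark._ // adds saveToEs to RDDs

// Hypothetical sketch: filter out documents whose id field is null or
// empty so the bulk request never carries {"index":{"_id":""}}.
// The names `events`, "id", and the index are assumptions, not taken
// from the UR engine's actual code.
def writeNonEmptyIds(events: RDD[Map[String, Any]]): Unit = {
  val withIds = events.filter { doc =>
    doc.get("id").exists(v => v != null && v.toString.nonEmpty)
  }
  withIds.saveToEs("recom_hypothetical", Map("es.mapping.id" -> "id"))
}
```

Whether the empty id originates in the input events or in the engine's own indexing step would determine where the real fix belongs; the filter above only illustrates the shape of the guard.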