actionml / harness

Harness is a Machine Learning/AI Server with plugins for many algorithms including the Universal Recommender
Apache License 2.0

Harness doesn't create ES Index #224

Open fibenacci opened 4 years ago

fibenacci commented 4 years ago

When I try to create an engine from my engine template, Harness doesn't create an index in Elasticsearch. Does anyone know why?

My sample engine.json looks like this:

{
    "engineId": "ecommerce",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master": "local",
        "spark.driver-memory": "8g",
        "spark.executor-memory": "16g",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.es.index.auto.create": "true",
        "es.index.auto.create": "true"
    },
    "algorithm": {
        "indicators": [
            {
                "name": "buy"
            },
            {
                "name": "detail-view"
            },
            {
                "name": "search-terms"
            }
        ]
    }
}

Furthermore, the log output shows "ES index name:" as null.
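
A side note on the config above, which may or may not be related to the missing index: Spark's documented property names are spark.driver.memory and spark.executor.memory (with dots), whereas the config uses hyphenated keys that resemble the spark-submit flags --driver-memory and --executor-memory. The following is a minimal sketch that flags those two keys. The lookup table and the function name `likely_misspelled` are hypothetical and not part of Harness or Spark.

```python
import json

# Hypothetical lookup table, not part of Harness or Spark: hyphenated keys as
# they appear in the config above, mapped to Spark's documented property names.
KNOWN_MISSPELLINGS = {
    "spark.driver-memory": "spark.driver.memory",
    "spark.executor-memory": "spark.executor.memory",
}


def likely_misspelled(engine_config):
    """Return {key_found: documented_name} for suspicious sparkConf keys."""
    spark_conf = engine_config.get("sparkConf", {})
    return {k: KNOWN_MISSPELLINGS[k] for k in spark_conf if k in KNOWN_MISSPELLINGS}


# Trimmed copy of the sparkConf from the engine config above.
config = json.loads("""
{
    "engineId": "ecommerce",
    "sparkConf": {
        "master": "local",
        "spark.driver-memory": "8g",
        "spark.executor-memory": "16g",
        "spark.es.index.auto.create": "true"
    }
}
""")

for found, documented in likely_misspelled(config).items():
    print(f"{found}: did you mean {documented}?")
```

The later configs in this thread use the dotted forms.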

HarrisJT commented 4 years ago

I think it is because of this line https://github.com/actionml/harness/blob/255422c5de139e19596e9bc92935d2b7bcb0738c/rest-server/engines/src/main/scala/com/actionml/engines/ur/URAlgorithm.scala#L100

fibenacci commented 4 years ago

Doesn't this line simply create an index named after the engineId? But why does it return null?

qqmbr4k commented 4 years ago

"Further in the logs the output shows that ES index name: is null"

@h3llj0ck3y please show your logs

pferrel commented 4 years ago

@h3llj0ck3y @HarrisJT I believe this may be a problem with 0.5.1 but is fixed in 0.5.2-SNAPSHOT currently in the develop branch.

Can you check again to see if this is still a problem?

Charuru commented 4 years ago

@pferrel can the docker install be updated to 0.5.2?

blinder commented 4 years ago

I believe this is still an issue on 0.5.2-SNAPSHOT. I just got this error when running a simple user query:

16:35:35.509 ERROR ActorSystemImpl   - Internal error
org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/test-test-org/_search], status line [HTTP/1.1 404 Not Found]
{"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [test-test-org]","resource.type":"index_or_alias","resource.id":"test-test-org","index_uuid":"_na_","index":"test-test-org"}],"type":"index_not_found_exception","reason":"no such index [test-test-org]","resource.type":"index_or_alias","resource.id":"test-test-org","index_uuid":"_na_","index":"test-test-org"},"status":404}
    at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
    at org.elasticsearch.client.RestClient.access$1700(RestClient.java:97)
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:331)
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
    at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
    at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
    at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
    at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
    at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
    at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
    at java.lang.Thread.run(Thread.java:748)
16:35:35.510 INFO  ActorSystemImpl   - Harness Server: Response for
  Request : HttpRequest(HttpMethod(POST),http://localhost:9090/engines/test-test-org/queries,List(Accept: application/json, Connection: keep-alive, Host: localhost:9090, User-Agent: AHC/1.0, Timeout-Access: <function1>),HttpEntity.Strict(application/json,{"user":"5de345932fc0b2eab8f722d3"}),HttpProtocol(HTTP/1.1))
  Response: Complete(HttpResponse(500 Internal Server Error,List(),HttpEntity.Strict(text/plain; charset=UTF-8,There was an internal server error.),HttpProtocol(HTTP/1.1)))
16:35:35.510 INFO  ActorSystemImpl   - Complete: HttpMethod(POST):http://localhost:9090/engines/test-test-org/queries -> 500 Internal Server Error [450 ms.]

The engine was created without any issue, and I was able to add events. The error happens only on a query.

After some further testing, neither 0.5.1 nor 0.5.2 seems to perform any writes against Elasticsearch. Engines are created without error and events are written, and both engines and events show up in MongoDB just fine, but no index is ever created in ES, and nothing is written to ES even after executing a training job.

Which raises the question: how does this even work? Reading through https://actionml.com/docs/h_workflow

it seems that ES operations are never executed. Making sure that spark.es.nodes is set correctly also has no apparent effect (I have ES installed on localhost on port 9200, and no errors are reported on the console, but no attempt to write there ever occurs).

pferrel commented 4 years ago

All operations involving writing to ES seem to be working for many other users. Is this still an issue?

BTW, in older versions of the Harness UR, only training causes a write to ES. In the latest 0.6.0-SNAPSHOT in the develop branch, a $set event will also write to ES. That is the only case where an event causes any operation on ES.

blinder commented 4 years ago

@pferrel - I discovered the source of the issue I was having and just neglected to update here, so here is what the problem was. Engine creation and event adding were fine, but the call to the "train" endpoint was failing, and I simply failed to notice. Because training failed, no index was ever created, so the queries would obviously fail.

Training was failing because I didn't have sample data for a few of the indicators. I feel this should be allowed (training completing with no data for an indicator), but I do understand it. I have since worked around this and am now seeing everything work correctly.
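
The failure mode described above (an indicator listed in the config with no events behind it) can be checked before calling train. This is a minimal sketch, assuming events are dicts whose "event" field carries the indicator name. That event shape, the sample data, and the helper name `indicators_without_events` are assumptions for illustration and are not taken from this thread.

```python
# Hypothetical pre-training check, not part of Harness.
def indicators_without_events(engine_config, events):
    """Return configured indicator names that never appear in the events."""
    configured = [ind["name"] for ind in engine_config["algorithm"]["indicators"]]
    seen = {event.get("event") for event in events}
    return [name for name in configured if name not in seen]


# Indicator list copied from the first engine config in this thread.
engine_config = {
    "algorithm": {
        "indicators": [
            {"name": "buy"},
            {"name": "detail-view"},
            {"name": "search-terms"},
        ]
    }
}

# Made-up sample events; only the "event" field matters for this check.
sample_events = [
    {"event": "buy", "entityId": "u1", "targetEntityId": "item-1"},
    {"event": "detail-view", "entityId": "u1", "targetEntityId": "item-2"},
]

print(indicators_without_events(engine_config, sample_events))  # ['search-terms']
```

An empty result means every configured indicator has at least one event. Whether that is enough for training to succeed is a separate question.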

kgulpinar commented 3 years ago

Set "spark.es.nodes": change "elasticsearch" to http://localhost:9200 or the IP address.

piotr-sikora-v commented 3 years ago

I have the same issue. Has anyone fixed it? I am on the latest deploy from https://github.com/actionml/harness-docker-compose and my config is:

{
  "engineId": "test1",
  "engineFactory": "com.actionml.engines.ur.UREngine",
  "sparkConf": {
    "master": "local",
    "spark.driver.memory": "1g",
    "spark.executor.memory": "1g",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
    "spark.kryo.referenceTracking": "false",
    "spark.kryoserializer.buffer": "300m",
    "es.index.auto.create": "true",
    "spark.es.index.auto.create": "true",
    "es.nodes": "elasticsearch",
    "spark.es.nodes": "elasticsearch",
    "es.nodes.wan.only": "true",
    "spark.es.nodes.wan.only": "true"
  },
  "algorithm": {
    "indicators": [
      {
        "name": "purchase"
      },{
        "name": "view"
      },{
        "name": "category-pref"
      }
    ],
    "num": 4
  }
}

Events work without a problem; only queries return an error:

harness          | 07:49:33.516 INFO  ActorSystemImpl   - Harness Server: HttpRequest(HttpMethod(POST),http://localhost:9091/engines/test1/queries,List(Host: localhost:9091, User-Agent: curl/7.68.0, Accept: */*, Timeout-Access: <function1>),HttpEntity.Strict(application/json,{"user": "John Doe"}),HttpProtocol(HTTP/1.1))
harness          | 07:49:33.523 INFO  URAlgorithm       - Engine-id: test1. Got query: 
harness          | URQuery(Some(John Doe),None,None,None,None,None,None,None,None,None,None,None,None,None,None)
harness          | 07:49:33.570 ERROR ActorSystemImpl   - Internal error
harness          | org.elasticsearch.client.ResponseException: method [POST], host [http://elasticsearch:9200], URI [/test1/_search], status line [HTTP/1.1 404 Not Found]
harness          | {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [test1]","resource.type":"index_or_alias","resource.id":"test1","index_uuid":"_na_","index":"test1"}],"type":"index_not_found_exception","reason":"no such index [test1]","resource.type":"index_or_alias","resource.id":"test1","index_uuid":"_na_","index":"test1"},"status":404}
harness          |  at org.elasticsearch.client.RestClient.convertResponse(RestClient.java:283)
harness          |  at org.elasticsearch.client.RestClient.access$1700(RestClient.java:97)
harness          |  at org.elasticsearch.client.RestClient$1.completed(RestClient.java:331)
harness          |  at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
harness          |  at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
harness          |  at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
harness          |  at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
harness          |  at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
harness          |  at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
harness          |  at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
harness          |  at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
harness          |  at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
harness          |  at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
harness          |  at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
harness          |  at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
harness          |  at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
harness          |  at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
harness          |  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
harness          |  at java.lang.Thread.run(Thread.java:748)
harness          | 07:49:33.573 INFO  ActorSystemImpl   - Harness Server: Response for
harness          |   Request : HttpRequest(HttpMethod(POST),http://localhost:9091/engines/test1/queries,List(Host: localhost:9091, User-Agent: curl/7.68.0, Accept: */*, Timeout-Access: <function1>),HttpEntity.Strict(application/json,{"user": "John Doe"}),HttpProtocol(HTTP/1.1))
harness          |   Response: Complete(HttpResponse(500 Internal Server Error,List(),HttpEntity.Strict(text/plain; charset=UTF-8,There was an internal server error.),HttpProtocol(HTTP/1.1)))
harness          | 07:49:33.577 INFO  ActorSystemImpl   - Complete: HttpMethod(POST):http://localhost:9091/engines/test1/queries -> 500 Internal Server Error [61 ms.]

piotr-sikora-v commented 3 years ago

OK, it seems I must first add an event for the primary indicator and then train the engine. Training creates the index, and then everything starts working ;)
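
Since the index only appears after a successful training job, a query against an untrained engine runs into the index_not_found_exception shown in the logs above. The following is a minimal sketch for checking this from outside Harness. It uses Elasticsearch's index-exists request (HEAD on the index name) and assumes, as the logs above suggest, that the index is named after the engineId. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def index_url(es_base_url, engine_id):
    """Build the index URL, assuming the index is named after the engineId."""
    return f"{es_base_url.rstrip('/')}/{engine_id}"


def index_exists(es_base_url, engine_id, timeout=5.0):
    """Return True if Elasticsearch answers 200 to HEAD /<index>, False on 404.

    Other HTTP errors and connection failures are re-raised so they are not
    mistaken for a missing index.
    """
    request = urllib.request.Request(index_url(es_base_url, engine_id), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise
```

For the setup above, `index_exists("http://elasticsearch:9200", "test1")` would be expected to return False before training and True once a training job has completed.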

pnutmath commented 3 years ago

I am having a similar issue while training the model. I am running it locally via docker-compose (docker compose file). Engine config:

{
    "engineId": "test01",
    "engineFactory": "com.actionml.engines.ur.UREngine",
    "sparkConf": {
        "master":"local",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.kryo.registrator": "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator",
        "spark.kryo.referenceTracking": "false",
        "spark.kryoserializer.buffer": "300m",
        "spark.executor.memory": "3g",
        "spark.driver.memory": "3g",
        "spark.es.index.auto.create": "true",
        "spark.es.nodes": "elasticsearch",
        "spark.es.nodes.wan.only": "true"
    },
    "algorithm": {
        "indicators": [
            {
                "name": "like"
            },
            {
                "name": "view"
            }
        ]
    }
}

CURL:

curl --location --request POST 'http://localhost:9090/engines/test01/jobs'

Error:

java.io.IOException: elasticsearch
    at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:828)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
    at com.actionml.core.search.elasticsearch.ElasticSearchClient.createIndexByName(ElasticSearchSupport.scala:446)
    at com.actionml.core.search.elasticsearch.ElasticSearchClient.hotSwap(ElasticSearchSupport.scala:343)
    at com.actionml.engines.ur.URModel.save(URModel.scala:83)
    at com.actionml.engines.ur.URAlgorithm$$anonfun$train$1.apply(URAlgorithm.scala:295)
    at com.actionml.engines.ur.URAlgorithm$$anonfun$train$1.apply(URAlgorithm.scala:254)
    at scala.util.Success$$anonfun$map$1.apply(Try.scala:237)
    at scala.util.Try$.apply(Try.scala:192)
    at scala.util.Success.map(Try.scala:237)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
    at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
    at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
    at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:91)
    at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
    at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:90)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.net.UnknownHostException: elasticsearch
    at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
    at java.net.InetAddress.getAllByName(InetAddress.java:1193)
    at java.net.InetAddress.getAllByName(InetAddress.java:1127)
    at org.apache.http.impl.conn.SystemDefaultDnsResolver.resolve(SystemDefaultDnsResolver.java:45)
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver.resolveRemoteAddress(PoolingNHttpClientConnectionManager.java:664)
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager$InternalAddressResolver.resolveRemoteAddress(PoolingNHttpClientConnectionManager.java:635)
    at org.apache.http.nio.pool.AbstractNIOConnPool.processPendingRequest(AbstractNIOConnPool.java:474)
    at org.apache.http.nio.pool.AbstractNIOConnPool.lease(AbstractNIOConnPool.java:280)
    at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.requestConnection(PoolingNHttpClientConnectionManager.java:295)
    at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.requestConnection(AbstractClientExchangeHandler.java:377)
    at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.start(DefaultClientExchangeHandlerImpl.java:129)
    at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:141)
    at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
    ... 24 common frames omitted
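
The "Caused by: java.net.UnknownHostException: elasticsearch" at the bottom of the trace indicates that the name "elasticsearch" could not be resolved from where Harness is running. The following is a small sketch to confirm that, meant to be run from the same machine or container as Harness. The helper name `can_resolve` is hypothetical.

```python
import socket


def can_resolve(hostname):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False
```

If `can_resolve("elasticsearch")` returns False there, the Elasticsearch host name Harness is configured with probably needs to be replaced by one that resolves from that environment, such as localhost or an IP address as suggested earlier in this thread.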