Netflix / conductor

Conductor is a microservices orchestration engine.
Apache License 2.0

ArrayIndexOutOfBoundsException while running with dynomite #59

Closed ajayaks closed 7 years ago

ajayaks commented 7 years ago

Hi,

I am trying to start Conductor with Dynomite. Dynomite and Redis are running on my local machine. I have used the settings below.

For conductor service

    db=dynomite

    # Dynomite Cluster details.
    # format is host:port:rack separated by semicolon
    workflow.dynomite.cluster.hosts=localhost:8102:localrack

    # Dynomite cluster name
    workflow.dynomite.cluster.name=test_coductor

    # namespace for the keys stored in Dynomite/Redis
    workflow.namespace.prefix=testnamespace

    # namespace prefix for the dyno queues
    workflow.namespace.queue.prefix=ecms_queues

    # no. of threads allocated to dyno-queues
    queues.dynomite.threads=10

    # non-quorum port used to connect to local redis. Used by dyno-queues
    queues.dynomite.nonQuorum.port=6369

    # Transport address to elasticsearch
    workflow.elasticsearch.url=localhost:9300

    # Name of the elasticsearch cluster
    workflow.elasticsearch.index.name=test_index

When I start Conductor from the jar, I get the error below. It is reading the properties file, which contains the properties above.

0 [main] INFO com.netflix.dyno.jedis.DynoJedisClient - Starting connection pool for app conductor
4 [pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Adding host connection pool for host: Host [hostname=localhost, ipAddress=null, port=8102, rack: localrack, datacenter: localrac, status: Up]
4 [pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl - Priming connection pool for host:Host [hostname=localhost, ipAddress=null, port=8102, rack: localrack, datacenter: localrac, status: Up], with conns:3
41 [pool-3-thread-1] INFO com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl - Successfully primed 3 of 3 to Host [hostname=localhost, ipAddress=null, port=8102, rack: localrack, datacenter: localrac, status: Up]
Exception in thread "main" java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
    at com.netflix.dyno.jedis.DynoJedisClient$Builder.startConnectionPool(DynoJedisClient.java:3409)
    at com.netflix.dyno.jedis.DynoJedisClient$Builder.createConnectionPool(DynoJedisClient.java:3380)
    at com.netflix.dyno.jedis.DynoJedisClient$Builder.buildDynoJedisClient(DynoJedisClient.java:3358)
    at com.netflix.dyno.jedis.DynoJedisClient$Builder.build(DynoJedisClient.java:3292)
    at com.netflix.conductor.server.ConductorServer.init(ConductorServer.java:159)
    at com.netflix.conductor.server.ConductorServer.<init>(ConductorServer.java:114)
    at com.netflix.conductor.server.Main.main(Main.java:78)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.calculateReplicationFactor(HostSelectionWithFallback.java:389)
    at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.initWithHosts(HostSelectionWithFallback.java:346)
    at com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl.initSelectionStrategy(ConnectionPoolImpl.java:627)
    at com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl.start(ConnectionPoolImpl.java:526)
    at com.netflix.dyno.jedis.DynoJedisClient$Builder.startConnectionPool(DynoJedisClient.java:3392)
    ... 6 more

Please suggest. Regards Ajay
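For readers checking their own value of workflow.dynomite.cluster.hosts, here is a minimal Python sketch of the "host:port:rack separated by semicolon" format described in the settings above. The names DynoHost and parse_cluster_hosts are hypothetical and are not part of Conductor or Dyno; the sketch only checks the shape of the string and says nothing about which rack names the Dyno client accepts.

```python
from typing import List, NamedTuple


class DynoHost(NamedTuple):
    """Hypothetical container for one host:port:rack entry."""
    host: str
    port: int
    rack: str


def parse_cluster_hosts(value: str) -> List[DynoHost]:
    """Split 'host:port:rack;host:port:rack' into DynoHost entries."""
    hosts = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        parts = entry.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected host:port:rack, got {entry!r}")
        host, port, rack = parts
        hosts.append(DynoHost(host, int(port), rack))
    return hosts


if __name__ == "__main__":
    # Value taken from the properties quoted in this issue.
    for h in parse_cluster_hosts("localhost:8102:localrack"):
        print(h.host, h.port, h.rack)
```

A check like this catches a missing rack field or a non-numeric port before the server is started.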

ajayaks commented 7 years ago

It would be great if you could suggest how to fix this, as I am totally stuck on it. Please suggest.

v1r3n commented 7 years ago

Can you update the conductor properties for the dynomite to use the below? (note the change in the rack name)

workflow.dynomite.cluster.hosts=localhost:8102:us-east-1b

That should solve it. Let me know if that works.

ajayaks commented 7 years ago

Thanks Viren, it is working with the above settings, but now Elasticsearch is not working because it is not calling EmbeddedElasticSearch. I have 2 questions:

  1. Why is us-east-1b mandatory, and why does no other value work? What exactly should the value be?
  2. With db=dynomite it is not calling EmbeddedElasticSearch. I have started a local instance of Elasticsearch, and it is running on 9200. What changes are required to use my local instance of Elasticsearch? I have already defined the settings below and created test_index in the local ES instance.

    # Transport address to elasticsearch
    workflow.elasticsearch.url=localhost:9300

    # Name of the elasticsearch cluster
    workflow.elasticsearch.index.name=test_index

Basically, it is not using the ES instance running locally on my machine. Please suggest.

v1r3n commented 7 years ago
  1. This is probably a bug that is looking for a valid rack name of the kind typically used by AWS or Google Cloud. You can use {us,eu,uk}-{east,west}-{1..9}{a..z} as a format and it should work.
  2. If you are not running in memory, then you will need to supply both Elasticsearch and Dynomite addresses, and both must be brought up separately. Refer to the server documentation for the details.

Do you have a stack trace for the Elasticsearch connectivity, if there are exceptions? If not, what indicates that it is not using the local ES?
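The rack-name format quoted in point 1 can be written as a regular expression. This is a minimal Python sketch that transcribes the brace pattern literally; RACK_PATTERN and looks_like_cloud_rack are hypothetical names, and the pattern is an assumption based only on the format given in this comment, not on the Dyno source. Matching the pattern therefore does not guarantee that the client accepts a given value.

```python
import re

# Assumption: literal transcription of {us,eu,uk}-{east,west}-{1..9}{a..z}
# from the comment above, not a rule taken from the Dyno code base.
RACK_PATTERN = re.compile(r"(us|eu|uk)-(east|west)-[1-9][a-z]")


def looks_like_cloud_rack(rack: str) -> bool:
    """Return True if the whole rack name fits the quoted format."""
    return RACK_PATTERN.fullmatch(rack) is not None


if __name__ == "__main__":
    for rack in ("us-east-1b", "localrack"):
        print(rack, looks_like_cloud_rack(rack))
```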

ajayaks commented 7 years ago

There is no message in the startup logs saying it is using Elasticsearch. I have defined new tasks and workflows, and they are stored in Dynomite, since they are still visible after a server restart.

When I check the index data, the docs count is 0: http://localhost:9200/_cat/indices?v

health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test_index M0g61j6eTFm_xiwxUKxYcA   3   2          0            0       390b           390b

It is not even indexing kitchensink: the data is available in Dynomite, but again the ES index is empty. Also, could you please tell me which keys are used in Redis when tasks or workflows are saved in Dynomite?
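The docs.count check above can be automated. This is a minimal Python sketch that parses the text returned by _cat/indices?v and lists indices with no documents. The function name parse_cat_indices is hypothetical, and the sketch assumes every column is populated in every row, which holds for the output shown here but may not hold for closed indices.

```python
from typing import Dict, List


def parse_cat_indices(text: str) -> List[Dict[str, str]]:
    """Turn whitespace-separated _cat/indices?v output into row dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Assumption: each data row has one value per header column.
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


# Output copied from the comment above.
SAMPLE = """\
health status index      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test_index M0g61j6eTFm_xiwxUKxYcA   3   2          0            0       390b           390b
"""

if __name__ == "__main__":
    rows = parse_cat_indices(SAMPLE)
    empty = [r["index"] for r in rows if int(r["docs.count"]) == 0]
    print(empty)  # ['test_index']
```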

v1r3n commented 7 years ago

can you give docker compose a try? There is a docker compose file which brings up ES, Dynomite, Conductor and UI and has been tested to work.

For the Redis keys, take a look at https://github.com/Netflix/conductor/blob/dev/redis-persistence/src/main/java/com/netflix/conductor/dao/dynomite/RedisExecutionDAO.java. The save methods will give a good idea of which Redis keys are used.

ajayaks commented 7 years ago

OK. Is there a mapping file for Elasticsearch that you use for storing Conductor data? I can see a template in EmbeddedES. Is there a mapping file that we have to use for a local instance of ES?

v1r3n commented 7 years ago

Do you mean the Elasticsearch index mapping file?

ajayaks commented 7 years ago

I have defined a template and mappings for task and workflow. ES is now running with the index and mappings already defined, but I am getting the errors below. ES is running on 9200 and the index is test_index. I have the settings below in the config file.

    # Transport address to elasticsearch
    workflow.elasticsearch.url=localhost:9300

    # Name of the elasticsearch cluster
    workflow.elasticsearch.index.name=ecms_index

## On conductor side:-

1019661 [elasticsearch[Glitch][generic][T#1]] INFO org.elasticsearch.client.transport - [Glitch] failed to get local cluster state for {#transport#-1}{127.0.0.1}{127.0.0.1:9200}, disconnecting...
ReceiveTimeoutTransportException[[][127.0.0.1:9200][cluster:monitor/state] request_id [11] timed out after [5005ms]]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:698)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
1025014 [AsyncResolver-bootstrap-executor-0] INFO com.netflix.discovery.shared.resolver.aws.ConfigClusterResolver - Resolving eureka endpoints via configuration
1029674 [elasticsearch[Glitch][generic][T#1]] INFO org.elasticsearch.client.transport - [Glitch] failed to get local cluster state for {#transport#-1}{127.0.0.1}{127.0.0.1:9200}, disconnecting...
ReceiveTimeoutTransportException[[][127.0.0.1:9200][cluster:monitor/state] request_id [12] timed out after [5006ms]]
    at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:698)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

## On elasticsearch side:-

[2017-02-07T15:46:40,580][WARN ][o.e.t.n.Netty4Transport ] [8ntXxYI] exception caught on transport layer [[id: 0x40f2afaa, L:/127.0.0.1:9300 - R:/127.0.0.1:52392]], closing connection
java.lang.IllegalStateException: Received message from unsupported version: [2.0.0] minimal compatible version is: [5.0.0]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1323) ~[elasticsearch-5.2.0.jar:5.2.0]

It seems to be a version mismatch between the ES server and the ES client used by Conductor. I updated the ES client to 5+, but then I got the build errors below because some methods have been removed.

/Users/ajay/OfficeWork/Office_Code/ECMS_Conductor/conductor/redis-persistence/src/main/java/com/netflix/conductor/dao/index/ElasticSearchDAO.java:161: error: cannot find symbol
    if (!response.isFound()) {
    ^
  symbol: method isFound()
  location: variable response of type DeleteResponse
/Users/ajay/OfficeWork/Office_Code/ECMS_Conductor/conductor/redis-persistence/src/main/java/com/netflix/conductor/dao/index/ElasticSearchDAO.java:196: error: cannot find symbol
    final SearchRequestBuilder srb = client.prepareSearch(indexName).setQuery(fq).setTypes(WORKFLOW_DOC_TYPE).setNoFields().setFrom(start).setSize(size);
    ^
  symbol: method setNoFields()
  location: class SearchRequestBuilder
/Users/ajay/OfficeWork/Office_Code/ECMS_Conductor/conductor/redis-persistence/src/main/java/com/netflix/conductor/dao/index/ElasticsearchModule.java:54: error: cannot find symbol
    Settings.Builder settings = Settings.settingsBuilder();
    ^
  symbol: method settingsBuilder()
  location: class Settings
/Users/ajay/OfficeWork/Office_Code/ECMS_Conductor/conductor/redis-persistence/src/main/java/com/netflix/conductor/dao/index/ElasticsearchModule.java:58: error: cannot find symbol
    TransportClient tc = TransportClient.builder().settings(settings).build();
    ^
  symbol: method builder()
  location: class TransportClient
4 errors
:conductor-redis-persistence:compileJava FAILED

FAILURE: Build failed with an exception.
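The Elasticsearch-side warning above already names both versions involved, so the mismatch can be detected from the log without changing any code. This is a minimal Python sketch that extracts the two version numbers from such a line and compares them. The names MISMATCH, version_tuple and find_version_mismatch are hypothetical, and the regular expression assumes the message wording shown in this issue.

```python
import re
from typing import Optional, Tuple

# Assumption: message wording copied from the ES 5.2.0 log line in this issue.
MISMATCH = re.compile(
    r"Received message from unsupported version: \[(?P<client>[\d.]+)\] "
    r"minimal compatible version is: \[(?P<minimum>[\d.]+)\]"
)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert '5.0.0' to (5, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_version_mismatch(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (client_version, minimum_version) if the line reports one."""
    match = MISMATCH.search(log_line)
    if match is None:
        return None
    return match.group("client"), match.group("minimum")


if __name__ == "__main__":
    line = (
        "java.lang.IllegalStateException: Received message from unsupported "
        "version: [2.0.0] minimal compatible version is: [5.0.0]"
    )
    found = find_version_mismatch(line)
    if found is not None:
        client, minimum = found
        too_old = version_tuple(client) < version_tuple(minimum)
        print(f"client={client} minimum={minimum} too_old={too_old}")
```

When the client version is lower than the minimum, the options are to upgrade the client, which here leads to the build errors listed above, or to run a server version that matches the client.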

ajayaks commented 7 years ago

I changed the ES server version to 2.4.4 and set up the template with 1 shard. After all these settings and changes, I can see the data in the ES index, and the UI is also working.

The server UI uses the ES index to show the All, Running, and Failed workflow info. Please confirm.

Thanks for your help.

v1r3n commented 7 years ago

@ajayaks closing this issue. Please re-open or create another issue if there are other problems.

blueelephants commented 7 years ago

@v1r3n I also experienced the same problem, and your proposed workaround of changing the rack name worked for me:

workflow.dynomite.cluster.hosts=localhost:8102:us-east-1b

Interestingly, only this us-east-1b setting works. I tried to change it to e.g. eu-west-1a and that didn't work. Is this us-east-1b setting somehow hardcoded?