Netflix / conductor

Conductor is a microservices orchestration engine.
Apache License 2.0

Conductor Server - not starting #351

Closed cheveyo20 closed 5 years ago

cheveyo20 commented 7 years ago

Hi, I have configured a Dynomite cluster with 3 nodes, which seems to work (redis-cli to port 8102 works and keys are getting replicated), and I use Elasticsearch 2.4.6 with a single node. The moment I start Conductor, I get this:

Starting Conductor server
Property file: config.properties
config.properties
Using 'config.properties'
Using config file/app/config/config.properties
0    [main] INFO  com.netflix.dyno.jedis.DynoJedisClient  - Starting connection pool for app conductor
12   [pool-3-thread-1] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Adding host connection pool for host: Host [hostname=xx.xx.xx.9, ipAddress=null, port=8102, rack: us-east-1c, datacenter: us-east-1, status: Up]
12   [pool-3-thread-1] INFO  com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl  - Priming connection pool for host:Host [hostname=xx.xx.xx.9, ipAddress=null, port=8102, rack: us-east-1c, datacenter: us-east-1, status: Up], with conns:10
13   [pool-3-thread-3] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Adding host connection pool for host: Host [hostname=xx.xx.xx.7, ipAddress=null, port=8102, rack: us-east-1b, datacenter: us-east-1, status: Up]
12   [pool-3-thread-2] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Adding host connection pool for host: Host [hostname=xx.xx.xx.6, ipAddress=null, port=8102, rack: us-east-1a, datacenter: us-east-1, status: Up]
14   [pool-3-thread-3] INFO  com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl  - Priming connection pool for host:Host [hostname=xx.xx.xx.7, ipAddress=null, port=8102, rack: us-east-1b, datacenter: us-east-1, status: Up], with conns:10
14   [pool-3-thread-2] INFO  com.netflix.dyno.connectionpool.impl.HostConnectionPoolImpl  - Priming connection pool for host:Host [hostname=xx.xx.xx.6, ipAddress=null, port=8102, rack: us-east-1a, datacenter: us-east-1, status: Up], with conns:10
100  [pool-3-thread-2] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Successfully primed 10 of 10 to Host [hostname=xx.xx.xx.6, ipAddress=null, port=8102, rack: us-east-1a, datacenter: us-east-1, status: Up]
111  [pool-3-thread-3] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Successfully primed 10 of 10 to Host [hostname=xx.xx.xx.7, ipAddress=null, port=8102, rack: us-east-1b, datacenter: us-east-1, status: Up]
113  [pool-3-thread-1] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - Successfully primed 10 of 10 to Host [hostname=xx.xx.xx.9, ipAddress=null, port=8102, rack: us-east-1c, datacenter: us-east-1, status: Up]
145  [main] INFO  com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl  - registered mbean com.netflix.dyno.connectionpool.impl:type=MonitorConsole
151  [main] INFO  com.netflix.conductor.server.ConductorServer  - Starting conductor server using dynomite/redis cluster dyn_o_mite

                     _            _             
  ___ ___  _ __   __| |_   _  ___| |_ ___  _ __ 
 / __/ _ \| '_ \ / _` | | | |/ __| __/ _ \| '__|
| (_| (_) | | | | (_| | |_| | (__| || (_) | |   
 \___\___/|_| |_|\__,_|\__,_|\___|\__\___/|_|   

401  [main] INFO  com.netflix.conductor.dao.dynomite.queue.DynoQueueDAO  - DynoQueueDAO initialized with prefix conductor_queues.test!
1262 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task HTTP
1465 [main] INFO  com.netflix.conductor.contribs.http.HttpTask  - HttpTask initialized...
1466 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task JSON_JQ_TRANSFORM
1825 [main] INFO  org.elasticsearch.plugins  - [Magma] modules [], plugins [], sites []
32551 [main] ERROR com.netflix.conductor.dao.index.ElasticSearchDAO  - None of the configured nodes are available: [{#transport#-1}{xx.xx.xx.5}{xx.xx.xx.5:9300}]
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{xx.xx.xx.5}{xx.xx.xx.5:9300}]]
    at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:326)
    at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:223)
    at org.elasticsearch.client.transport.support.TransportProxyClient.execute(TransportProxyClient.java:55)
    at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:295)
    at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
    at org.elasticsearch.client.support.AbstractClient$IndicesAdmin.execute(AbstractClient.java:1226)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:86)
    at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:56)
    at com.netflix.conductor.dao.index.ElasticSearchDAO.initIndex(ElasticSearchDAO.java:166)
    at com.netflix.conductor.dao.index.ElasticSearchDAO.<init>(ElasticSearchDAO.java:134)
    at com.netflix.conductor.dao.index.ElasticSearchDAO$$FastClassByGuice$$9fa9c95d.newInstance(<generated>)
    at com.google.inject.internal.DefaultConstructionProxyFactory$FastClassProxy.newInstance(DefaultConstructionProxyFactory.java:89)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:111)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
    at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
    at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:110)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:90)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:268)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1092)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
    at com.google.inject.internal.SingletonScope$1.get(SingletonScope.java:194)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:205)
    at com.google.inject.internal.InternalInjectorCreator$1.call(InternalInjectorCreator.java:199)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1085)
    at com.google.inject.internal.InternalInjectorCreator.loadEagerSingletons(InternalInjectorCreator.java:199)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:180)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:110)
    at com.google.inject.Guice.createInjector(Guice.java:99)
    at com.google.inject.Guice.createInjector(Guice.java:73)
    at com.google.inject.Guice.createInjector(Guice.java:62)
    at com.netflix.conductor.server.ConductorServer.start(ConductorServer.java:205)
    at com.netflix.conductor.server.Main.main(Main.java:54)
32602 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - System Task Worker Initialized with 10 threads and a callback time of 30 second and queue size 100 with pollCount 10
32603 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task SUB_WORKFLOW
32603 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task WAIT
32606 [main] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task EVENT
32641 [Thread-2] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Started listening HTTP
32661 [main] INFO  org.eclipse.jetty.util.log  - Logging initialized @33611ms
32810 [main] INFO  org.eclipse.jetty.server.Server  - jetty-9.3.z-SNAPSHOT
33684 [pool-8-thread-1] INFO  com.netflix.dyno.queues.redis.RedisDynoQueue  - com.netflix.dyno.queues.redis.RedisDynoQueue is ready to serve HTTP
35952 [main] INFO  org.eclipse.jetty.server.handler.ContextHandler  - Started o.e.j.s.ServletContextHandler@52d6cd34{/,jar:file:/app/libs/conductor-server-1.8.2-SNAPSHOT-all.jar!/swagger-ui,AVAILABLE}
35958 [main] INFO  org.eclipse.jetty.server.AbstractConnector  - Started ServerConnector@f08fdce{HTTP/1.1,[http/1.1]}{0.0.0.0:8080}
35958 [main] INFO  org.eclipse.jetty.server.Server  - Started @36908ms
Started server on http://localhost:8080/
Creating kitchensink workflow
36514 [qtp248495761-42] ERROR com.netflix.conductor.server.resources.ApplicationExceptionMapper  - Workflow with kitchensink.1 already exists!
com.netflix.conductor.core.execution.ApplicationException: Workflow with kitchensink.1 already exists!
    at com.netflix.conductor.dao.dynomite.RedisMetadataDAO.create(RedisMetadataDAO.java:144)
    at com.netflix.conductor.service.MetadataService.registerWorkflowDef(MetadataService.java:156)
    at com.netflix.conductor.server.resources.MetadataResource.create(MetadataResource.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$VoidOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:167)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286)
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276)
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181)
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
    at com.netflix.conductor.server.JerseyModule$1.doFilter(JerseyModule.java:99)
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1174)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1106)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:524)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:319)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:253)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
36523 [main] ERROR com.netflix.conductor.server.ConductorServer  - POST http://localhost:8080/api/metadata/workflow returned a response status of 409 Conflict
com.sun.jersey.api.client.UniformInterfaceException: POST http://localhost:8080/api/metadata/workflow returned a response status of 409 Conflict
    at com.sun.jersey.api.client.WebResource.voidHandle(WebResource.java:709)
    at com.sun.jersey.api.client.WebResource.access$400(WebResource.java:74)
    at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:555)
    at com.netflix.conductor.server.ConductorServer.createKitchenSink(ConductorServer.java:261)
    at com.netflix.conductor.server.ConductorServer.start(ConductorServer.java:228)
    at com.netflix.conductor.server.Main.main(Main.java:54)

So it does seem to be an issue with Elasticsearch... but Elasticsearch seems to be working!

The moment I run the kitchensink worker, I get this on the worker side:

ERROR: {"code":"INTERNAL_ERROR","message":"INTERNAL_ERROR - java.lang.RuntimeException: com.netflix.dyno.connectionpool.exception.NoAvailableHostsException: NoAvailableHostsException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]Token not found for key hash: 3236000588","instance":"aca7ac376dc8"}
Error while polling 500 Server Error: Internal Server Error for url: http://domain:8080/api/tasks/poll/task_4?workerid=ID-NH-0100

This is my config.properties:
# Database persistence model.  Possible values are memory, redis, and dynomite.
# If omitted, the persistence used is memory
#
# memory : The data is stored in memory and lost when the server dies.  Useful for testing or demo
# redis : non-Dynomite based redis instance
# dynomite : Dynomite cluster.  Use this for HA configuration.

db=dynomite

# Dynomite Cluster details.
# format is host:port:rack separated by semicolon
workflow.dynomite.cluster.hosts=xx.xx.xx.6:8102:us-east-1a;xx.xx.xx.7:8102:us-east-1b;xx.xx.xx.9:8102:us-east-1c

# Dynomite cluster name
workflow.dynomite.cluster.name=dyn_o_mite

# Namespace for the keys stored in Dynomite/Redis
workflow.namespace.prefix=conductor

# Namespace prefix for the dyno queues
workflow.namespace.queue.prefix=conductor_queues

# No. of threads allocated to dyno-queues (optional)
queues.dynomite.threads=10

# Non-quorum port used to connect to local redis.  Used by dyno-queues.
# When using redis directly, set this to the same port as redis server
# For Dynomite, this is 22122 by default or the local redis-server port used by Dynomite.
queues.dynomite.nonQuorum.port=22122

# Transport address to elasticsearch
workflow.elasticsearch.url=xx.xx.xx.5:9300

# Name of the elasticsearch index
workflow.elasticsearch.index.name=conductor

# Additional modules (optional)
# conductor.additional.modules=class_extending_com.google.inject.AbstractModule

# Load sample kitchen sink workflow
loadSample=true

EC2_AVAILABILTY_ZONE=us-east-1a

I am not running on AWS, but I used the AWS naming convention for the racks and availability zone.

hjha287 commented 7 years ago

Looking at the error, it seems ES is not running on port 9300. Please check which port ES is actually running on and update the properties file accordingly. Is it running on 9200?

cheveyo20 commented 7 years ago

I deployed it like in the example, on a different host than Conductor:

version: "3.3"

services:

  elasticsearch:
    image: elasticsearch:2.4
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - elk

  kibana:
    image: kibana:4.6
    ports:
      - "5601:5601"
    networks:
      - elk

networks:
  elk:
    driver: overlay
    driver_opts:
      encrypted: "true"

The REST port 9200 answers http://xx.xx.xx.xx:9200/_cluster/health?pretty with:

{
  "cluster_name" : "elasticsearch",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 1,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

Kibana is also working and connecting, and telnet to port 9300 works :)

Do I need something like a host supplier / token supplier for Dynomite in this case? Or why am I getting UNKNOWN?

cheveyo20 commented 7 years ago

OK, you were right: I didn't know that Elasticsearch needs the --net host option to be set. Now ES is working, but one error remains.

The keys are being generated in Dynomite, so the connection does work. Do I need something like a host supplier / token supplier for Dynomite in this case? This is the remaining error:

6895 [pool-9-thread-1] ERROR com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - java.lang.RuntimeException: com.netflix.dyno.connectionpool.exception.NoAvailableHostsException: NoAvailableHostsException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]Token not found for key hash: 1534224792
java.lang.RuntimeException: java.lang.RuntimeException: com.netflix.dyno.connectionpool.exception.NoAvailableHostsException: NoAvailableHostsException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]Token not found for key hash: 1534224792
    at com.netflix.dyno.queues.redis.RedisDynoQueue.pop(RedisDynoQueue.java:224)
    at com.netflix.conductor.dao.dynomite.queue.DynoQueueDAO.pop(DynoQueueDAO.java:150)
    at com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator.pollAndExecute(SystemTaskWorkerCoordinator.java:145)
    at com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator.lambda$listen$1(SystemTaskWorkerCoordinator.java:125)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: com.netflix.dyno.connectionpool.exception.NoAvailableHostsException: NoAvailableHostsException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]Token not found for key hash: 1534224792
    at com.netflix.dyno.queues.redis.RedisDynoQueue.executeWithRetry(RedisDynoQueue.java:603)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.execute(RedisDynoQueue.java:585)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.peekIds(RedisDynoQueue.java:512)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.prefetchIds(RedisDynoQueue.java:244)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.pop(RedisDynoQueue.java:216)

    ... more
Caused by: com.netflix.dyno.connectionpool.exception.NoAvailableHostsException: NoAvailableHostsException: [host=Host [hostname=UNKNOWN, ipAddress=UNKNOWN, port=0, rack: UNKNOWN, datacenter: UNKNOW, status: Down], latency=0(0), attempts=0]Token not found for key hash: 1534224792
    at com.netflix.dyno.connectionpool.impl.hash.BinarySearchTokenMapper.getToken(BinarySearchTokenMapper.java:68)
    at com.netflix.dyno.connectionpool.impl.lb.TokenAwareSelection.getTokenForKey(TokenAwareSelection.java:110)
    at com.netflix.dyno.connectionpool.impl.lb.TokenAwareSelection.getPoolForOperation(TokenAwareSelection.java:73)
    at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.getFallbackHostPool(HostSelectionWithFallback.java:210)
    at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.getConnection(HostSelectionWithFallback.java:154)
    at com.netflix.dyno.connectionpool.impl.lb.HostSelectionWithFallback.getConnectionUsingRetryPolicy(HostSelectionWithFallback.java:121)
    at com.netflix.dyno.connectionpool.impl.ConnectionPoolImpl.executeWithFailover(ConnectionPoolImpl.java:292)
    at com.netflix.dyno.jedis.DynoJedisClient.d_zrangeByScore(DynoJedisClient.java:2070)
    at com.netflix.dyno.jedis.DynoJedisClient.zrangeByScore(DynoJedisClient.java:2065)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.lambda$peekIds$12(RedisDynoQueue.java:514)
    at com.netflix.dyno.queues.redis.RedisDynoQueue.executeWithRetry(RedisDynoQueue.java:592)

    ... more
7290 [qtp1682619279-32] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task DECISION
7291 [qtp1682619279-32] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task FORK
7291 [qtp1682619279-32] INFO  com.netflix.conductor.core.execution.tasks.SystemTaskWorkerCoordinator  - Adding system task JOIN
7385 [qtp1682619279-32] INFO  com.netflix.dyno.queues.redis.RedisDynoQueue  - com.netflix.dyno.queues.redis.RedisDynoQueue is ready to serve task_1
7451 [qtp1682619279-32] INFO  com.netflix.dyno.queues.redis.RedisDynoQueue  - com.netflix.dyno.queues.redis.RedisDynoQueue is ready to serve _deciderQueue
7458 [main] INFO  com.netflix.conductor.server.ConductorServer  - Kitchen sink workflows are created!

hjha287 commented 7 years ago

I am not a Conductor SME, so I can't comment on the token supplier, but I have used Conductor with the default token supplier implementation. I encountered this error whenever the first Dynomite node configured in the config file was down.

saidatta commented 6 years ago

I finally fixed it! I took the hint and changed it to network_mode: host. But I also had to add a condition so that the conductor-server dependencies are successfully started before starting conductor-server.

I did that using depends_on and condition: service_healthy.
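
Roughly, the idea looks like this in the compose file (a sketch, not my exact file — it assumes the Compose 2.1 file format, which supports healthcheck and the long depends_on form; the conductor-server image name and the curl-based healthcheck are placeholders for whatever you actually build and run):

version: "2.1"

services:

  elasticsearch:
    image: elasticsearch:2.4
    network_mode: host
    healthcheck:
      # assumes curl exists in this image; swap in any other readiness check if it does not
      test: ["CMD-SHELL", "curl -sf http://localhost:9200/_cluster/health || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 12

  conductor-server:
    image: conductor:server    # placeholder image name
    network_mode: host
    depends_on:
      elasticsearch:
        condition: service_healthy

With network_mode: host there are no port mappings; the services talk over the host network, and conductor-server only starts once the elasticsearch healthcheck passes.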

kishorebanala commented 5 years ago

Closing this issue, as there is no activity for a while. Please feel free to open another issue if you still have questions.

avila3a commented 5 years ago

@saidatta

I finally fixed it! I took the hint and changed it to network_mode: host. But I also had to add a condition so that the conductor-server dependencies are successfully started before starting conductor-server.

I did that using depends_on and condition: service_healthy.

Hi, I have this same problem, but I don't understand how to solve it: to which file do I add depends_on and condition: service_healthy (driver.properties?), or where are these properties added?

saidatta commented 5 years ago

@avila3a depends_on & condition go in the docker-compose file, not in driver.properties. Point conductor-server's depends_on at its prerequisite services.
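
Concretely, it is just a few lines on the conductor-server service in your docker-compose file (service names here are placeholders — use whatever yours are called, and each referenced service needs its own healthcheck for service_healthy to mean anything):

  conductor-server:
    depends_on:
      elasticsearch:
        condition: service_healthy
      dynomite:
        condition: service_healthy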

cchristopherdxc commented 4 years ago

C:\DXC\Software\conductor\server>../gradlew server
'..' is not recognized as an internal or external command,
operable program or batch file.

I installed Gradle and tried to execute the above command to start up the Conductor server, but it fails. I then tried running it from the root directory as below:

C:\DXC\Software\conductor>gradlew server

> Configure project :
Inferred project: conductor, version: 2.31.0-SNAPSHOT
Publication nebula not found in project :.

> Task :conductor-common:compileJava
Note: C:\DXC\Software\conductor\common\src\main\java\com\netflix\conductor\common\utils\JsonMapperProvider.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: C:\DXC\Software\conductor\common\src\main\java\com\netflix\conductor\common\utils\ConstraintParamUtil.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :conductor-common:jar FAILED

FAILURE: Build failed with an exception.

Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0. See https://docs.gradle.org/4.8.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 3s
3 actionable tasks: 3 executed