broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

JES backend does not seem to be properly configured in reference.conf #1748

Closed by LeeTL1220 7 years ago

LeeTL1220 commented 7 years ago
....snip...
[2016-12-06 01:52:49,82] [warn] Unrecognized configuration key(s) for Jes: genomics-api-queries-per-100-seconds, dockerhub.token, dockerhub.account, genomics.compute-service-account
....snip....

As far as I can tell, I am using the same keys as in the reference conf file. This worked in previous dev builds with the same structure (though with fewer keys).

@kcibul This is important, though I would not be surprised if this was user error.

From the configuration:

...snip...
    JES {
      actor-factory = "cromwell.backend.impl.jes.JesBackendLifecycleActorFactory"
      config {
        # Google project
        project = "broad-dsde-methods"

        # Base bucket for workflow executions
        root = "gs://broad-dsde-methods/cromwell-executions-eval-gatk-protected/"

        # Set this to the lower of the two values "Queries per 100 seconds" and "Queries per 100 seconds per user" for
        # your project.
        #
        # Used to help determine maximum throughput to the Google Genomics API. Setting this value too low will
        # cause a drop in performance. Setting this value too high will cause QPS based locks from Google.
        # 1000 is the default "Queries per 100 seconds per user", 50000 is the default "Queries per 100 seconds"
        # See https://cloud.google.com/genomics/quotas for more information
        genomics-api-queries-per-100-seconds = 1000

        # Polling for completion backs-off gradually for slower-running jobs.
        # This is the maximum polling interval (in seconds):
        maximum-polling-interval = 600

        # Optional Dockerhub Credentials. Can be used to access private docker images.  REMOVED HERE
        dockerhub {
           account = "user_manually_removed"
           token = "password_manually_removed"
        }

        genomics {
          # A reference to an auth defined in the `google` stanza at the top.  This auth is used to create
          # Pipelines and manipulate auth JSONs.
          auth = "application-default"

          // alternative service account to use on the launched compute instance
          // NOTE: If combined with service account authorization, both that service account and this service account
          // must be able to read and write to the 'root' GCS path
          compute-service-account = "default"

          # Endpoint for APIs, no reason to change this unless directed by Google.
          endpoint-url = "https://genomics.googleapis.com/"
        }

        filesystems {
          gcs {
            # A reference to a potentially different auth for manipulating files via engine functions.
            auth = "application-default"
          }
        }
      }
    }

    #AWS {
...snip...
LeeTL1220 commented 7 years ago

Workflow does not run and cromwell hangs.

geoffjentry commented 7 years ago

This makes me realize that I didn't add genomics-api-queries-per-100-seconds to the whitelist, but that seems unlikely to be related here.

geoffjentry commented 7 years ago

Aside: I'd recommend not using a modified copy of the reference.conf file but maintaining just the diffs for your personal conf. I find it's a lot easier to keep track of things that way.
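
To illustrate why a diffs-only conf is enough, here is a minimal Python sketch of the overlay idea: values in a personal conf win, and anything left unset falls back to the shipped defaults. It is not Cromwell's or the config library's actual code; `merge_with_fallback`, `reference`, and `personal` are hypothetical names, and the values are illustrative ones borrowed from the config quoted above.

```python
def merge_with_fallback(overrides, defaults):
    """Return defaults overlaid with overrides, merging nested dicts key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge them recursively.
            merged[key] = merge_with_fallback(value, merged[key])
        else:
            # A plain value (or a type mismatch): the override wins.
            merged[key] = value
    return merged


# Hypothetical stand-in for the shipped defaults.
reference = {
    "maximum-polling-interval": 600,
    "genomics": {
        "auth": "application-default",
        "endpoint-url": "https://genomics.googleapis.com/",
    },
}

# Hypothetical personal conf holding only the differences.
personal = {
    "project": "broad-dsde-methods",
    "genomics": {"compute-service-account": "default"},
}

effective = merge_with_fallback(personal, reference)
```

With this shape, the personal file stays short, and a new default added upstream shows up in `effective` without any edit to the personal file.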

kcibul commented 7 years ago

@Horneth -- can you look into this as a part of bug/support rotation?

geoffjentry commented 7 years ago

Actually it looks like I wasn't the only one who missed updating the whitelist and that's the warning @LeeTL1220 is seeing. I still doubt it has to do with the hanging he's seeing. My bet is that all of those settings are being properly assigned (I know that's the case w/ the QPS), and it appears that way by looking at the code.
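
For readers wondering how a warning like this can appear while the settings still take effect, here is a minimal Python sketch of a whitelist check of this kind: it flattens the config into dotted keys and reports any key missing from a known-keys list, without touching the values. This is an assumption about the general technique, not Cromwell's implementation; `flatten_keys`, `unrecognized_keys`, and the contents of `whitelist` are hypothetical.

```python
def flatten_keys(config, prefix=""):
    """Yield the dotted path of every leaf value in a nested dict."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten_keys(value, path)
        else:
            yield path


def unrecognized_keys(config, whitelist):
    """Return the sorted dotted keys present in config but absent from whitelist."""
    return sorted(k for k in flatten_keys(config) if k not in whitelist)


# Hypothetical whitelist that was not updated when new keys were introduced.
whitelist = {
    "project",
    "root",
    "maximum-polling-interval",
    "genomics.auth",
    "genomics.endpoint-url",
    "filesystems.gcs.auth",
}

# Keys mirror the backend config quoted earlier in the thread; credentials are placeholders.
config = {
    "project": "broad-dsde-methods",
    "root": "gs://broad-dsde-methods/cromwell-executions-eval-gatk-protected/",
    "genomics-api-queries-per-100-seconds": 1000,
    "maximum-polling-interval": 600,
    "dockerhub": {"account": "placeholder", "token": "placeholder"},
    "genomics": {
        "auth": "application-default",
        "compute-service-account": "default",
        "endpoint-url": "https://genomics.googleapis.com/",
    },
    "filesystems": {"gcs": {"auth": "application-default"}},
}

missing = unrecognized_keys(config, whitelist)
```

In this sketch `missing` contains the same four keys named in the warning, and the check only reports them; nothing is removed from `config`, which is consistent with the settings still being applied.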

LeeTL1220 commented 7 years ago

And here is the exception message:

[ERROR] [12/06/2016 13:58:30.046] [cromwell-system-akka.actor.default-dispatcher-3] [akka://cromwell-system/user/SingleWorkflowRunnerActor] Unable to create actor for ActorRef Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/ServiceRegistryActor/KeyValue#988818050]
java.lang.RuntimeException: Unable to create actor for ActorRef Actor[akka://cromwell-system/user/SingleWorkflowRunnerActor/ServiceRegistryActor/KeyValue#988818050]
        at cromwell.server.CromwellRootActor$$anonfun$1.applyOrElse(CromwellRootActor.scala:81)
        at cromwell.server.CromwellRootActor$$anonfun$1.applyOrElse(CromwellRootActor.scala:80)
        at akka.actor.SupervisorStrategy.handleFailure(FaultHandling.scala:295)
        at akka.actor.dungeon.FaultHandling$class.handleFailure(FaultHandling.scala:263)
        at akka.actor.ActorCell.handleFailure(ActorCell.scala:374)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:459)
        at akka.actor.ActorCell.systemInvoke(ActorCell.scala:483)
        at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:282)
        at akka.dispatch.Mailbox.run(Mailbox.scala:223)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at akka.util.Reflect$.instantiate(Reflect.scala:65)
        at akka.actor.ArgsReflectConstructor.produce(IndirectActorProducer.scala:96)
        at akka.actor.Props.newActor(Props.scala:213)
        at akka.actor.ActorCell.newActor(ActorCell.scala:562)
        at akka.actor.ActorCell.create(ActorCell.scala:588)
        at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:461)
        ... 8 more
Caused by: java.lang.ExceptionInInitializerError
        at cromwell.services.SingletonServicesStore$class.$init$(ServicesStore.scala:28)
        at cromwell.services.keyvalue.impl.SqlKeyValueServiceActor.<init>(SqlKeyValueServiceActor.scala:16)
        ... 18 more
Caused by: com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'main'
        at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
        at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:145)
        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:172)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
        at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
        at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
        at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
        at cromwell.services.SingletonServicesStore$.<init>(ServicesStore.scala:43)
        at cromwell.services.SingletonServicesStore$.<clinit>(ServicesStore.scala)
        ... 20 more
LeeTL1220 commented 7 years ago

Fixed it. I had an issue with my database clause.