ENCODE-DCC / caper

Cromwell/WDL wrapper for Python
MIT License

Can caper server use a custom Cromwell conf file? #136

Closed · yihming closed this issue 3 years ago

yihming commented 3 years ago

To Whom It May Concern,

I'm trying to run caper server with a custom Cromwell conf file that I wrote myself, using the following command on a GCP instance:

caper server --port 8080 --backend-file gcp.conf

However, it gives the following error:

Traceback (most recent call last):
  File "/home/yyang/caper/bin/caper", line 13, in <module>
    main()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 675, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 229, in runner
    return subcmd_server(c, args, nonblocking=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 331, in subcmd_server
    thread = caper_runner.server(fileobj_stdout=f, **args_from_cli)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_runner.py", line 519, in server
    custom_backend_conf=custom_backend_conf,
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_backend_conf.py", line 362, in create_file
    hocon_s.merge(s, update=True)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 165, in merge
    d = HOCONString(b).to_dict()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 149, in to_dict
    return json.loads(j)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 29 column 36 (char 820)

Is my specification in gcp.conf incorrect? I was able to run Cromwell directly with it on the same instance:

java -Dconfig.file=gcp.conf -jar cromwell-59.jar run star-solo.wdl -i starsolo_inputs.json

Could you please advise on how to fit it into Caper properly? I've attached my gcp.conf below. Thanks!

include required(classpath("application"))

google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        }
    ]
}

engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "mgh-lilab-archive"
        }
    }
}

backend {
    default = PAPIv2

    providers {
        PAPIv2 {
            actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
            config {
                # Google project
                project = "mgh-lilab-archive"

                # Base bucket for workflow executions
                root = "gs://mgh-lilab-fileshare/cromwell_execution"

                # Make the name of the backend used for call caching purposes insensitive to the PAPI version.
                name-for-call-caching-purposes: PAPI

                # Emit a warning if jobs last longer than this amount of time. This might indicate that something got stuck in PAPI.
                slow-job-warning-time: 24 hours

                # Set this to the lower of the two values "Queries per 100 seconds" and "Queries per 100 seconds per user" for
                # your project.
                #
                # Used to help determine maximum throughput to the Google Genomics API. Setting this value too low will
                # cause a drop in performance. Setting this value too high will cause QPS based locks from Google.
                # 1000 is the default "Queries per 100 seconds per user", 50000 is the default "Queries per 100 seconds"
                # See https://cloud.google.com/genomics/quotas for more information
                genomics-api-queries-per-100-seconds = 1000

                # Polling for completion backs-off gradually for slower-running jobs.
                # This is the maximum polling interval (in seconds):
                maximum-polling-interval = 600

                # Number of workers to assign to PAPI requests
                request-workers = 3

                genomics {
                    # A reference to an auth defined in the `google` stanza at the top.
                    # This auth is used to create pipelines and manipulate auth JSONs.
                    auth = "application-default"

                    # Endpoint for APIs, no reason to change this unless directed by Google.
                    endpoint-url = "https://lifesciences.googleapis.com/"

                    # Currently Cloud Life Sciences API is available only in `us-central1` and `europe-west2` locations.
                    location = "us-central1"

                    # Restrict access to VM metadata. Useful in cases when untrusted containers are running under a service
                    # account not owned by the submitting user
                    restrict-metadata-access = false

                    # Pipelines v2 only: specify the number of times localization and delocalization operations should be attempted
                    # There is no logic to determine if the error was transient or not, everything is retried upon failure
                    # Defaults to 3
                    localization-attempts = 3

                    # Specifies the minimum file size for `gsutil cp` to use parallel composite uploads during delocalization.
                    # Parallel composite uploads can result in a significant improvement in delocalization speed for large files
                    # but may introduce complexities in downloading such files from GCS, please see
                    # https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads for more information.
                    #
                    # If set to 0 parallel composite uploads are turned off. The default Cromwell configuration turns off
                    # parallel composite uploads, this sample configuration turns it on for files of 150M or larger.
                    parallel-composite-upload-threshold="150M"
                }

                filesystems {
                    gcs {
                        # A reference to a potentially different auth for manipulating files via engine functions.
                        auth = "application-default"

                        # Google project which will be billed for the requests
                        project = "mgh-lilab-archive"

                        caching {
                            # When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
                            # Possible values: "copy", "reference". Defaults to "copy"
                            # "copy": Copy the output files
                            # "reference": DO NOT copy the output files but point to the original output files instead.
                            #              Will still make sure than all the original output files exist and are accessible before
                            #              going forward with the cache hit.
                            duplication-strategy = "copy"
                        }
                    }
                }

                default-runtime-attributes {
                    cpu: 1
                    failOnStderr: false
                    continueOnReturnCode: 0
                    memory: "2048 MB"
                    bootDiskSizeGb: 10
                    # Allowed to be a String, or a list of Strings
                    disks: "local-disk 10 SSD"
                    noAddress: false
                    preemptible: 0
                    zones: ["us-central1-a", "us-central1-b"]
                }

                include "papi_v2_reference_image_manifest.conf"
            }
        }
    }
}
leepc12 commented 3 years ago

What is your caper version (check with caper -v)?

I tried to replicate the error with the latest Caper 1.6.3 but got a different one; it got past the JSON parsing error.

2021-07-06 15:03:51,465  INFO  - Reference disks feature for PAPIv2 backend is not configured.
2021-07-06 15:03:51,487  WARN  - Unrecognized configuration key(s) for gcp: localization-attempts
2021-07-06 15:03:51,488  INFO  - Reference disks feature for gcp backend is not configured.
2021-07-06 15:03:51,507  ERROR - Failed to instantiate Cromwell System. Shutting down Cromwell.
common.exception.AggregatedException: :
null
        Google Pipelines API configuration is not valid: Errors:
`google` configuration stanza does not contain an auth named 'service-account'.  Known auth names: application-default
`google` configuration stanza does not contain an auth named 'service-account'.  Known auth names: application-default
        at common.util.TryUtil$.sequenceIterable(TryUtil.scala:29)
        at common.util.TryUtil$.sequenceMap(TryUtil.scala:47)
        at cromwell.engine.backend.CromwellBackends.<init>(CromwellBackends.scala:14)
        at cromwell.engine.backend.CromwellBackends$.initBackends(CromwellBackends.scala:42)
        at cromwell.server.CromwellSystem.$init$(CromwellSystem.scala:68)
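
This second error means the merged google stanza no longer defines an auth named 'service-account': Caper's own gcp backend conf appears to reference that auth name, and your auths array replaces Caper's list on merge. One way to satisfy it, sketched below under the assumption that application-default credentials are acceptable for your setup, is to also define an auth with that name:

google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        },
        {
            # Assumption: reuse application-default credentials under the name
            # the merged backend references; switch to scheme = "service_account"
            # plus a key file if you use a real service account.
            name = "service-account"
            scheme = "application_default"
        }
    ]
}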
yihming commented 3 years ago

Hi @leepc12 ,

I was able to figure out where the error comes from. When Caper parses my conf file, the following line

slow-job-warning-time: 24 hours

is rewritten to

"slow-job-warning-time": relativedelta(days=+1)

by HOCONConverter.to_json(c) at line 147 of hocon_string.py. Since relativedelta(...) is not valid JSON, json.loads() then fails with the error above.
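
This is easy to reproduce with pyhocon alone; a minimal sketch (the exact duration type may vary by pyhocon version):

from pyhocon import ConfigFactory
from pyhocon.converter import HOCONConverter

# Unquoted HOCON durations are parsed into duration objects, not strings.
conf = ConfigFactory.parse_string('slow-job-warning-time: 24 hours')
print(repr(conf['slow-job-warning-time']))  # e.g. relativedelta(days=+1)

# Converting back to JSON emits the object's repr, which is not valid JSON,
# so the json.loads() call downstream raises "Expecting value".
print(HOCONConverter.to_json(conf))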

After adding quotes:

slow-job-warning-time: "24 hours"

the error goes away.

Not sure whether this needs to be fixed, since Cromwell itself can process the unquoted syntax, but it's always good practice to quote string values that contain whitespace.
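
If a Caper-side fix were wanted, one possible direction (purely a sketch, not an actual patch; to_json_safe and _stringify_durations are hypothetical names) would be to dump the parsed tree with a JSON fallback that stringifies duration objects instead of emitting their repr:

import json
from datetime import timedelta

try:
    from dateutil.relativedelta import relativedelta
except ImportError:  # dateutil is an optional dependency of pyhocon
    relativedelta = None

def _stringify_durations(obj):
    # json.dumps() calls this for types it cannot serialize natively.
    # Placeholder rendering: a real fix would have to reconstruct the
    # original duration string (e.g. "24 hours") that pyhocon consumed.
    if isinstance(obj, timedelta):
        return '{} seconds'.format(int(obj.total_seconds()))
    if relativedelta is not None and isinstance(obj, relativedelta):
        return str(obj)
    raise TypeError('Not JSON serializable: {}'.format(type(obj)))

def to_json_safe(config_tree):
    # Hypothetical replacement for the HOCONConverter.to_json() round trip:
    # pyhocon's ConfigTree can be flattened to a plain dict first.
    return json.dumps(config_tree.as_plain_ordered_dict(),
                      default=_stringify_durations)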