ENCODE-DCC / caper

Cromwell/WDL wrapper for Python
MIT License

Can caper server use a custom Cromwell conf file? #136

Closed · yihming closed this issue 3 years ago

yihming commented 3 years ago

To Whom It May Concern,

I'm trying to run caper server with a custom Cromwell conf file that I wrote myself, using the following command on a GCP instance:

caper server --port 8080 --backend-file gcp.conf

However, it gives the following error:

Traceback (most recent call last):
  File "/home/yyang/caper/bin/caper", line 13, in <module>
    main()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 675, in main
    return runner(parsed_args, nonblocking_server=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 229, in runner
    return subcmd_server(c, args, nonblocking=nonblocking_server)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/cli.py", line 331, in subcmd_server
    thread = caper_runner.server(fileobj_stdout=f, **args_from_cli)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_runner.py", line 519, in server
    custom_backend_conf=custom_backend_conf,
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/caper_backend_conf.py", line 362, in create_file
    hocon_s.merge(s, update=True)
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 165, in merge
    d = HOCONString(b).to_dict()
  File "/home/yyang/.local/lib/python3.7/site-packages/caper/hocon_string.py", line 149, in to_dict
    return json.loads(j)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 29 column 36 (char 820)

Is my specification in gcp.conf incorrect? I was able to run Cromwell directly with it on the same instance:

java -Dconfig.file=gcp.conf -jar cromwell-59.jar run star-solo.wdl -i starsolo_inputs.json

Could you please advise on how to fit it into Caper properly? I've attached my gcp.conf below. Thanks!

include required(classpath("application"))

google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        }
    ]
}

engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "mgh-lilab-archive"
        }
    }
}

backend {
    default = PAPIv2

    providers {
        PAPIv2 {
            actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
            config {
                # Google project
                project = "mgh-lilab-archive"

                # Base bucket for workflow executions
                root = "gs://mgh-lilab-fileshare/cromwell_execution"

                # Make the name of the backend used for call caching purposes insensitive to the PAPI version.
                name-for-call-caching-purposes: PAPI

                # Emit a warning if jobs last longer than this amount of time. This might indicate that something got stuck in PAPI.
                slow-job-warning-time: 24 hours

                # Set this to the lower of the two values "Queries per 100 seconds" and "Queries per 100 seconds per user" for
                # your project.
                #
                # Used to help determine maximum throughput to the Google Genomics API. Setting this value too low will
                # cause a drop in performance. Setting this value too high will cause QPS based locks from Google.
                # 1000 is the default "Queries per 100 seconds per user", 50000 is the default "Queries per 100 seconds"
                # See https://cloud.google.com/genomics/quotas for more information
                genomics-api-queries-per-100-seconds = 1000

                # Polling for completion backs-off gradually for slower-running jobs.
                # This is the maximum polling interval (in seconds):
                maximum-polling-interval = 600

                # Number of workers to assign to PAPI requests
                request-workers = 3

                genomics {
                    # A reference to an auth defined in the `google` stanza at the top.
                    # This auth is used to create pipelines and manipulate auth JSONs.
                    auth = "application-default"

                    # Endpoint for APIs, no reason to change this unless directed by Google.
                    endpoint-url = "https://lifesciences.googleapis.com/"

                    # Currently Cloud Life Sciences API is available only in `us-central1` and `europe-west2` locations.
                    location = "us-central1"

                    # Restrict access to VM metadata. Useful in cases when untrusted containers are running under a service
                    # account not owned by the submitting user
                    restrict-metadata-access = false

                    # Pipelines v2 only: specify the number of times localization and delocalization operations should be attempted
                    # There is no logic to determine if the error was transient or not, everything is retried upon failure
                    # Defaults to 3
                    localization-attempts = 3

                    # Specifies the minimum file size for `gsutil cp` to use parallel composite uploads during delocalization.
                    # Parallel composite uploads can result in a significant improvement in delocalization speed for large files
                    # but may introduce complexities in downloading such files from GCS, please see
                    # https://cloud.google.com/storage/docs/gsutil/commands/cp#parallel-composite-uploads for more information.
                    #
                    # If set to 0 parallel composite uploads are turned off. The default Cromwell configuration turns off
                    # parallel composite uploads, this sample configuration turns it on for files of 150M or larger.
                    parallel-composite-upload-threshold="150M"
                }

                filesystems {
                    gcs {
                        # A reference to a potentially different auth for manipulating files via engine functions.
                        auth = "application-default"

                        # Google project which will be billed for the requests
                        project = "mgh-lilab-archive"

                        caching {
                            # When a cache hit is found, the following duplication strategy will be followed to use the cached outputs
                            # Possible values: "copy", "reference". Defaults to "copy"
                            # "copy": Copy the output files
                            # "reference": DO NOT copy the output files but point to the original output files instead.
                            #              Will still make sure than all the original output files exist and are accessible before
                            #              going forward with the cache hit.
                            duplication-strategy = "copy"
                        }
                    }
                }

                default-runtime-attributes {
                    cpu: 1
                    failOnStderr: false
                    continueOnReturnCode: 0
                    memory: "2048 MB"
                    bootDiskSizeGb: 10
                    # Allowed to be a String, or a list of Strings
                    disks: "local-disk 10 SSD"
                    noAddress: false
                    preemptible: 0
                    zones: ["us-central1-a", "us-central1-b"]
                }

                include "papi_v2_reference_image_manifest.conf"
            }
        }
    }
}
leepc12 commented 3 years ago

What is your caper version (check with caper -v)?

I tried to replicate the error with the latest Caper 1.6.3 but got a different one; it got past the JSON parsing error.

2021-07-06 15:03:51,465  INFO  - Reference disks feature for PAPIv2 backend is not configured.
2021-07-06 15:03:51,487  WARN  - Unrecognized configuration key(s) for gcp: localization-attempts
2021-07-06 15:03:51,488  INFO  - Reference disks feature for gcp backend is not configured.
2021-07-06 15:03:51,507  ERROR - Failed to instantiate Cromwell System. Shutting down Cromwell.
common.exception.AggregatedException: :
null
        Google Pipelines API configuration is not valid: Errors:
`google` configuration stanza does not contain an auth named 'service-account'.  Known auth names: application-default
`google` configuration stanza does not contain an auth named 'service-account'.  Known auth names: application-default
        at common.util.TryUtil$.sequenceIterable(TryUtil.scala:29)
        at common.util.TryUtil$.sequenceMap(TryUtil.scala:47)
        at cromwell.engine.backend.CromwellBackends.<init>(CromwellBackends.scala:14)
        at cromwell.engine.backend.CromwellBackends$.initBackends(CromwellBackends.scala:42)
        at cromwell.server.CromwellSystem.$init$(CromwellSystem.scala:68)
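
This second error means the merged google stanza no longer defines an auth named 'service-account': Caper's own gcp backend conf appears to reference that auth name, and your auths array replaces Caper's list on merge. One way to satisfy it, sketched below under the assumption that application-default credentials are acceptable for your setup, is to also define an auth with that name:

google {
    application-name = "cromwell"
    auths = [
        {
            name = "application-default"
            scheme = "application_default"
        },
        {
            # Assumption: reuse application-default credentials under the name
            # the merged backend references; switch to scheme = "service_account"
            # plus a key file if you use a real service account.
            name = "service-account"
            scheme = "application_default"
        }
    ]
}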
yihming commented 3 years ago

Hi @leepc12 ,

I was able to figure out where the error comes from. When Caper parses my conf file, the following line

slow-job-warning-time: 24 hours

is rewritten to

"slow-job-warning-time": relativedelta(days=+1)

by HOCONConverter.to_json(c) at line 147 of hocon_string.py. Since relativedelta(...) is not valid JSON, json.loads() then fails with the error above.
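
This is easy to reproduce with pyhocon alone; a minimal sketch (the exact duration type may vary by pyhocon version):

from pyhocon import ConfigFactory
from pyhocon.converter import HOCONConverter

# Unquoted HOCON durations are parsed into duration objects, not strings.
conf = ConfigFactory.parse_string('slow-job-warning-time: 24 hours')
print(repr(conf['slow-job-warning-time']))  # e.g. relativedelta(days=+1)

# Converting back to JSON emits the object's repr, which is not valid JSON,
# so the json.loads() call downstream raises "Expecting value".
print(HOCONConverter.to_json(conf))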

After adding quotes:

slow-job-warning-time: "24 hours"

the error goes away.

Not sure whether this needs to be fixed, since Cromwell itself can process the unquoted syntax, but it's always good practice to quote string values that contain whitespace.
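
If a Caper-side fix were wanted, one possible direction (purely a sketch, not an actual patch; to_json_safe and _stringify_durations are hypothetical names) would be to dump the parsed tree with a JSON fallback that stringifies duration objects instead of emitting their repr:

import json
from datetime import timedelta

try:
    from dateutil.relativedelta import relativedelta
except ImportError:  # dateutil is an optional dependency of pyhocon
    relativedelta = None

def _stringify_durations(obj):
    # json.dumps() calls this for types it cannot serialize natively.
    # Placeholder rendering: a real fix would have to reconstruct the
    # original duration string (e.g. "24 hours") that pyhocon consumed.
    if isinstance(obj, timedelta):
        return '{} seconds'.format(int(obj.total_seconds()))
    if relativedelta is not None and isinstance(obj, relativedelta):
        return str(obj)
    raise TypeError('Not JSON serializable: {}'.format(type(obj)))

def to_json_safe(config_tree):
    # Hypothetical replacement for the HOCONConverter.to_json() round trip:
    # pyhocon's ConfigTree can be flattened to a plain dict first.
    return json.dumps(config_tree.as_plain_ordered_dict(),
                      default=_stringify_durations)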