ENCODE-DCC / caper

Cromwell/WDL wrapper for Python
MIT License

Cannot build a local path from file link on GCP GS #132

Closed yihming closed 3 years ago

yihming commented 3 years ago

To Whom It May Concern,

I'm running caper v1.6.3 on a GCP instance. My WDL workflow contains the following code:

File ref_index_file = "gs://regev-lab/resources/count_tools/ref_index.tsv"
Map[String, String] ref_index2gsurl = read_map(ref_index_file)

Cromwell gave me the following error:

LinuxFileSystem: Cannot build a local path from gs://regev-lab/resources/count_tools/ref_index.tsv
Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems

and

Evaluating read_map(ref_index_file) failed: Failed to read_map("gs://regev-lab/resources/count_tools/ref_index.tsv") (reason 1 of 1): java.lang.IllegalArgumentException: Could not build the path "gs://regev-lab/resources/count_tools/ref_index.tsv". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: HTTP, LinuxFileSystem.

Since the message points to Cromwell's HPC backend documentation, I suspect that caper treated my instance as an HPC cluster. However, I did run caper init gcp, and caper works properly with my other WDL workflows that do not use the read_map function.

The issue went away after I copied the file from GS to the instance and modified my workflow to refer to the local path.

Do you have any suggestions for fixing this issue? I'm not sure whether it is related to caper or to Cromwell. But since my workflow ran successfully on the Broad Terra platform, which uses Cromwell as the engine and GCP as the backend, I guess there is something I have not configured correctly in caper rather than in Cromwell.

Any help would be appreciated. Thanks!

yihming commented 3 years ago

Below is the detailed error message from the cromwell.out log file:

2021-06-28 18:36:32,745  INFO  - Running with database db.url = jdbc:hsqldb:mem:79610c1b-6b2b-4789-8845-4a0612afbcd4;shutdown=false;hsqldb.tx=mvcc
2021-06-28 18:36:45,295  INFO  - Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
2021-06-28 18:36:45,328  INFO  - [RenameWorkflowOptionsInMetadata] 100%
2021-06-28 18:36:45,579  INFO  - Running with database db.url = jdbc:hsqldb:mem:debf75de-2982-4eb6-b7fc-d7d138d04838;shutdown=false;hsqldb.tx=mvcc
2021-06-28 18:36:46,436  WARN  - Unrecognized configuration key(s) for gcp: localization-attempts
2021-06-28 18:36:46,766  INFO  - Reference disks feature for gcp backend is not configured.
2021-06-28 18:36:47,199  INFO  - Slf4jLogger started
2021-06-28 18:36:47,454 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO  - Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-9531ccf",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "failureShutdownDuration" : "5 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
2021-06-28 18:36:47,624 cromwell-system-akka.dispatchers.service-dispatcher-18 INFO  - Metadata summary refreshing every 1 second.
2021-06-28 18:36:47,706 cromwell-system-akka.dispatchers.engine-dispatcher-29 INFO  - JobStoreWriterActor configured to flush with batch size 1000 and process rate 1 second.
2021-06-28 18:36:47,727 cromwell-system-akka.actor.default-dispatcher-8 INFO  - KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
2021-06-28 18:36:47,735 cromwell-system-akka.dispatchers.engine-dispatcher-28 INFO  - CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
2021-06-28 18:36:47,740 cromwell-system-akka.dispatchers.service-dispatcher-13 INFO  - WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
2021-06-28 18:36:47,743  WARN  - 'docker.hash-lookup.gcr-api-queries-per-100-seconds' is being deprecated, use 'docker.hash-lookup.gcr.throttle' instead (see reference.conf)
2021-06-28 18:36:47,933 cromwell-system-akka.dispatchers.engine-dispatcher-33 INFO  - JobExecutionTokenDispenser - Distribution rate: 1 per 2 seconds.
2021-06-28 18:36:48,030 cromwell-system-akka.dispatchers.backend-dispatcher-35 INFO  - Running with 3 PAPI request workers
2021-06-28 18:36:48,030 cromwell-system-akka.dispatchers.backend-dispatcher-35 INFO  - 'resetAllWorkers()' called to fill vector with 3 new workers
2021-06-28 18:36:48,904 cromwell-system-akka.dispatchers.backend-dispatcher-35 INFO  - Request manager PAPIQueryManager created new PAPI request worker PAPIQueryWorker-e21c92dc-3261-494e-a2d9-503920f516fb with batch interval of 33333 milliseconds
2021-06-28 18:36:48,911 cromwell-system-akka.dispatchers.backend-dispatcher-35 INFO  - Request manager PAPIQueryManager created new PAPI request worker PAPIQueryWorker-580ca6f7-da8d-4037-b094-4168973c135f with batch interval of 33333 milliseconds
2021-06-28 18:36:48,920 cromwell-system-akka.dispatchers.backend-dispatcher-35 INFO  - Request manager PAPIQueryManager created new PAPI request worker PAPIQueryWorker-e6d4bb15-5e91-49e7-aa25-4bdd4a5213fa with batch interval of 33333 milliseconds
2021-06-28 18:36:50,135 cromwell-system-akka.dispatchers.engine-dispatcher-9 INFO  - Cromwell 59 service started on 0:0:0:0:0:0:0:0:8000...
2021-06-28 18:36:52,939 cromwell-system-akka.dispatchers.engine-dispatcher-33 INFO  - Not triggering log of token queue status. Effective log interval = None
2021-06-28 18:37:00,644 cromwell-system-akka.dispatchers.api-dispatcher-42 INFO  - Unspecified type (Unspecified version) workflow c7fb167b-ccb8-4edc-b7ae-823701fab1b2 submitted
2021-06-28 18:37:09,025 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO  - 1 new workflows fetched by cromid-9531ccf: c7fb167b-ccb8-4edc-b7ae-823701fab1b2
2021-06-28 18:37:09,032 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - WorkflowManagerActor: Starting workflow UUID(c7fb167b-ccb8-4edc-b7ae-823701fab1b2)
2021-06-28 18:37:09,042 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - WorkflowManagerActor: Successfully started WorkflowActor-c7fb167b-ccb8-4edc-b7ae-823701fab1b2
2021-06-28 18:37:09,043 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - Retrieved 1 workflows from the WorkflowStoreActor
2021-06-28 18:37:09,067 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO  - WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
2021-06-28 18:37:09,232 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - MaterializeWorkflowDescriptorActor [UUID(c7fb167b)]: Parsing workflow as WDL 1.0
2021-06-28 18:37:11,006 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - MaterializeWorkflowDescriptorActor [UUID(c7fb167b)]: Call-to-Backend assignments: starsolo.generate_count_config -> gcp, starsolo.run_star_solo -> gcp
2021-06-28 18:37:14,849 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - WorkflowExecutionActor-c7fb167b-ccb8-4edc-b7ae-823701fab1b2 [UUID(c7fb167b)]: Starting starsolo.generate_count_config
2021-06-28 18:37:15,956 cromwell-system-akka.dispatchers.engine-dispatcher-28 INFO  - Assigned new job execution tokens to the following groups: c7fb167b: 1
2021-06-28 18:37:17,594 cromwell-system-akka.dispatchers.engine-dispatcher-12 INFO  - c7fb167b-ccb8-4edc-b7ae-823701fab1b2-EngineJobExecutionActor-starsolo.generate_count_config:NA:1 [UUID(c7fb167b)]: Could not copy a suitable cache hit for c7fb167b:starsolo.generate_count_config:-1:1. No copy attempts were made.
2021-06-28 18:37:17,943 cromwell-system-akka.dispatchers.backend-dispatcher-36 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(c7fb167b)starsolo.generate_count_config:NA:1]: `set -e
export TMPDIR=/tmp

python <<CODE
import re
import pandas as pd
from subprocess import check_call

df = pd.read_csv('/cromwell_root/mgh-lilab-fileshare/starsolo_test/starsolo_sheet.tsv', sep = '\t', header = 0, dtype = str, index_col = False)
for c in df.columns:
    df[c] = df[c].str.strip()

regex_pat = re.compile('[^a-zA-Z0-9_-]')
if any(df['Sample'].str.contains(regex_pat)):
    print('Sample must contain only alphanumeric characters, hyphens, and underscores.')
    print('Examples of common characters that are not allowed are the space character and the following: ?()[]/\=+<>:;"\',*^| &')
    sys.exit(1)

with open('sample_ids.txt', 'w') as fo1, open('sample_r1.tsv', 'w') as fo2, open('sample_r2.tsv', 'w') as fo3:
    for idx, row in df.iterrows():
        fo1.write(row['Sample'] + '\n')

        if 'Flowcells' in df.columns: # Fetch R1 and R2 fastqs automatically.
            input_dir_list = list(map(lambda x: x.strip(), row['Flowcells'].split(',')))
            r1_list = []
            r2_list = []
            for directory in input_dir_list:
                directory = re.sub('/+$', '', directory)

                call_args = ['gsutil', 'ls', directory]
                # call_args = ['ls', directory]
                with open('list_dir.txt', 'w') as tmp_fo:
                    check_call(call_args, stdout=tmp_fo)

                with open('list_dir.txt', 'r') as tmp_fin:
                    f_list = tmp_fin.readlines()
                    f_list = list(map(lambda s: s.strip(), f_list))

                r1_files = [f for f in f_list if re.match('.*_R1_.*.fastq.gz', f)]
                r2_files = [f for f in f_list if re.match('.*_R2_.*.fastq.gz', f)]
                r1_files.sort()
                r2_files.sort()
                # r1_files = list(map(lambda s: directory+'/'+s, r1_files))
                # r2_files = list(map(lambda s: directory+'/'+s, r2_files))

                r1_list.extend(r1_files)
                r2_list.extend(r2_files)

        else:  # R1 and R2 fastqs specified in sample sheet.
            r1_list = list(map(lambda s: s.strip(), row['R1'].split(',')))
            r2_list = list(map(lambda s: s.strip(), row['R2'].split(',')))

        fo2.write(row['Sample'] + '\t' + ','.join(r1_list) + '\n')
        fo3.write(row['Sample'] + '\t' + ','.join(r2_list) + '\n')
CODE`
2021-06-28 18:37:19,827 cromwell-system-akka.dispatchers.backend-dispatcher-34 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(c7fb167b)starsolo.generate_count_config:NA:1]: Adjusting boot disk size to 12 GB: 10 GB (runtime attributes) + 1 GB (user command image) + 1 GB (Cromwell support images)
2021-06-28 18:37:27,770 cromwell-system-akka.dispatchers.backend-dispatcher-36 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(c7fb167b)starsolo.generate_count_config:NA:1]: job id: projects/618777040679/locations/us-central1/operations/3605108356742585627
2021-06-28 18:37:55,963 cromwell-system-akka.dispatchers.backend-dispatcher-36 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(c7fb167b)starsolo.generate_count_config:NA:1]: Status change from - to Running
2021-06-28 18:40:43,179 cromwell-system-akka.dispatchers.backend-dispatcher-36 INFO  - PipelinesApiAsyncBackendJobExecutionActor [UUID(c7fb167b)starsolo.generate_count_config:NA:1]: Status change from Running to Success
2021-06-28 18:40:46,285 cromwell-system-akka.dispatchers.engine-dispatcher-33 INFO  - WorkflowManagerActor: Workflow c7fb167b-ccb8-4edc-b7ae-823701fab1b2 failed (during ExecutingWorkflowState): java.lang.RuntimeException: Failed to evaluate 'ref_index2gsurl' (reason 1 of 1): Evaluating read_map(ref_index_file) failed: Failed to read_map("gs://regev-lab/resources/count_tools/ref_index.tsv") (reason 1 of 1): java.lang.IllegalArgumentException: Could not build the path "gs://regev-lab/resources/count_tools/ref_index.tsv". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: HTTP, LinuxFileSystem. Failures: 
HTTP: gs://regev-lab/resources/count_tools/ref_index.tsv does not have an http or https scheme (IllegalArgumentException)
LinuxFileSystem: Cannot build a local path from gs://regev-lab/resources/count_tools/ref_index.tsv (RuntimeException)
 Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
    at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:29)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:564)
    at scala.collection.immutable.List.map(List.scala:290)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:558)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:210)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:208)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
    at akka.actor.FSM.processEvent(FSM.scala:710)
    at akka.actor.FSM.processEvent$(FSM.scala:704)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:54)
    at akka.actor.LoggingFSM.processEvent(FSM.scala:847)
    at akka.actor.LoggingFSM.processEvent$(FSM.scala:829)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:54)
    at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:701)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:695)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$receive$1.applyOrElse(WorkflowExecutionActor.scala:508)
    at akka.actor.Actor.aroundReceive(Actor.scala:539)
    at akka.actor.Actor.aroundReceive$(Actor.scala:537)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:54)
    at akka.actor.Timers.aroundReceive(Timers.scala:51)
    at akka.actor.Timers.aroundReceive$(Timers.scala:40)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:54)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
    at akka.actor.ActorCell.invoke(ActorCell.scala:583)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
    at akka.dispatch.Mailbox.run(Mailbox.scala:229)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

java.lang.RuntimeException: Failed to evaluate 'wl_index2gsurl' (reason 1 of 1): Evaluating read_map(wl_index_file) failed: Failed to read_map("gs://regev-lab/resources/count_tools/whitelist_index.tsv") (reason 1 of 1): java.lang.IllegalArgumentException: Could not build the path "gs://regev-lab/resources/count_tools/whitelist_index.tsv". It may refer to a filesystem not supported by this instance of Cromwell. Supported filesystems are: HTTP, LinuxFileSystem. Failures: 
HTTP: gs://regev-lab/resources/count_tools/whitelist_index.tsv does not have an http or https scheme (IllegalArgumentException)
LinuxFileSystem: Cannot build a local path from gs://regev-lab/resources/count_tools/whitelist_index.tsv (RuntimeException)
 Please refer to the documentation for more information on how to configure filesystems: http://cromwell.readthedocs.io/en/develop/backends/HPC/#filesystems
    at cromwell.engine.workflow.lifecycle.execution.keys.ExpressionKey.processRunnable(ExpressionKey.scala:29)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.$anonfun$startRunnableNodes$7(WorkflowExecutionActor.scala:564)
    at scala.collection.immutable.List.map(List.scala:290)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.cromwell$engine$workflow$lifecycle$execution$WorkflowExecutionActor$$startRunnableNodes(WorkflowExecutionActor.scala:558)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:210)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$5.applyOrElse(WorkflowExecutionActor.scala:208)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:172)
    at akka.actor.FSM.processEvent(FSM.scala:710)
    at akka.actor.FSM.processEvent$(FSM.scala:704)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$LoggingFSM$$super$processEvent(WorkflowExecutionActor.scala:54)
    at akka.actor.LoggingFSM.processEvent(FSM.scala:847)
    at akka.actor.LoggingFSM.processEvent$(FSM.scala:829)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.processEvent(WorkflowExecutionActor.scala:54)
    at akka.actor.FSM.akka$actor$FSM$$processMsg(FSM.scala:701)
    at akka.actor.FSM$$anonfun$receive$1.applyOrElse(FSM.scala:695)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:38)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor$$anonfun$receive$1.applyOrElse(WorkflowExecutionActor.scala:508)
    at akka.actor.Actor.aroundReceive(Actor.scala:539)
    at akka.actor.Actor.aroundReceive$(Actor.scala:537)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.akka$actor$Timers$$super$aroundReceive(WorkflowExecutionActor.scala:54)
    at akka.actor.Timers.aroundReceive(Timers.scala:51)
    at akka.actor.Timers.aroundReceive$(Timers.scala:40)
    at cromwell.engine.workflow.lifecycle.execution.WorkflowExecutionActor.aroundReceive(WorkflowExecutionActor.scala:54)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:614)
    at akka.actor.ActorCell.invoke(ActorCell.scala:583)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:268)
    at akka.dispatch.Mailbox.run(Mailbox.scala:229)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:241)
    at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

2021-06-28 18:40:47,759 cromwell-system-akka.dispatchers.engine-dispatcher-28 INFO  - WorkflowManagerActor: Workflow actor for c7fb167b-ccb8-4edc-b7ae-823701fab1b2 completed with status 'Failed'. The workflow will be removed from the workflow store.

leepc12 commented 3 years ago

Can you remove the gs:// URIs from your WDL and define them in your input JSON instead? Make sure that you use WDL version 1.0.

version 1.0

workflow your_workflow {
    input {
        File ref_index_file
    }
    ...
}
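
For illustration, a matching inputs file could be produced with a short Python sketch like the one below. It assumes Cromwell's usual `<workflow name>.<input name>` key convention and reuses the placeholder workflow name `your_workflow` from the snippet above; the `build_inputs` helper is hypothetical.

```python
import json


def build_inputs(workflow_name, ref_index_uri):
    """Return the inputs mapping for the workflow-level File input."""
    # Input keys are qualified with the workflow name (assumed convention).
    return {f"{workflow_name}.ref_index_file": ref_index_uri}


inputs = build_inputs(
    "your_workflow",  # placeholder name taken from the WDL snippet
    "gs://regev-lab/resources/count_tools/ref_index.tsv",
)

# Print the JSON so it can be redirected into an inputs file.
print(json.dumps(inputs, indent=4))
```

This only shows how the URI would move from the WDL into the inputs file; whether that alone avoids the error is not confirmed in this thread.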

yihming commented 3 years ago

Thank you for your response!

My workflow is indeed written in WDL version 1.0, and I was able to reproduce the error when running Cromwell directly with the PAPIv2 backend. So I'm switching to the Cromwell project to report this issue.

yihming commented 3 years ago

Adding the following lines to the conf file used when executing the workflow resolves the issue:

engine {
    filesystems {
        gcs {
            auth = "application-default"
            project = "<google-billing-project-name>"
        }
    }
}
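
One way to read this fix against the error above: the log lists only HTTP and LinuxFileSystem as supported filesystems, and each one rejected the gs:// path in turn. The toy Python model below mimics that behavior. It is not Cromwell's implementation; the builder functions and `build_path` are hypothetical names, and it assumes the `gcs` entry under `engine.filesystems` simply adds one more path builder for the engine to try.

```python
from urllib.parse import urlparse


# Hypothetical stand-ins for path builders; not Cromwell's real API.
def http_builder(uri):
    if urlparse(uri).scheme not in ("http", "https"):
        raise ValueError(f"{uri} does not have an http or https scheme")
    return ("HTTP", uri)


def local_builder(uri):
    # A URI with any scheme cannot be treated as a local path.
    if urlparse(uri).scheme:
        raise ValueError(f"Cannot build a local path from {uri}")
    return ("LinuxFileSystem", uri)


def gcs_builder(uri):
    if urlparse(uri).scheme != "gs":
        raise ValueError(f"{uri} does not have a gs scheme")
    return ("GCS", uri)


def build_path(uri, builders):
    """Try each builder in order and collect the failures, as the log does."""
    failures = []
    for builder in builders:
        try:
            return builder(uri)
        except ValueError as err:
            failures.append(str(err))
    raise ValueError("Could not build the path. Failures: " + "; ".join(failures))


uri = "gs://regev-lab/resources/count_tools/ref_index.tsv"

# Filesystems reported in the error message: HTTP and LinuxFileSystem.
default_engine = [http_builder, local_builder]
try:
    build_path(uri, default_engine)
    resolved_by_default = True
except ValueError:
    resolved_by_default = False

# Modeling the fix: one more builder that understands gs:// paths.
with_gcs = default_engine + [gcs_builder]
filesystem, _ = build_path(uri, with_gcs)

print(resolved_by_default, filesystem)
```

In this model the gs:// path fails with the two default builders and resolves once the GCS builder is present, which matches the behavior reported before and after the conf change.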