broadinstitute / cromwell

Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments
http://cromwell.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Call-caching excessively slow #6971

Closed LiterallyUniqueLogin closed 1 year ago

LiterallyUniqueLogin commented 1 year ago

Hi there,

I'm new to both WDL and Cromwell and am trying to get an analysis pipeline up and running. I'm using call-caching to speed up development, so that I don't have to repeat multi-hour steps. However, I'm currently seeing ~8-minute delays when processing cache hits. With multiple steps in series, this means that nothing in my pipeline starts running until about 14 minutes after I start the run. Can you help me fix that?

Thank you for the help!

Happy to provide more info beyond what's below if that's helpful.

I'm running Cromwell 84.

Here's the command I'm running:

java -Dconfig.file=workflow/cromwell.conf -jar utilities/cromwell-84.jar run workflow/expanse_workflow.wdl

Here's my configuration (ignore the SLURM part, I'm not using it yet). Potentially important bits:

system {
  abort-jobs-on-terminate = true
  io {
    number-of-requests = 30
    per = 1 second
  }
}

# file based persistent database
database {
  profile = "slick.jdbc.HsqldbProfile$"
  db {
    driver = "org.hsqldb.jdbcDriver"
    url = """
    jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db;
    shutdown=false;
    hsqldb.default_table_type=cached;hsqldb.tx=mvcc;
    hsqldb.result_max_memory_rows=10000;
    hsqldb.large_data=true;
    hsqldb.applog=1;
    hsqldb.lob_compressed=true;
    hsqldb.script_format=3
    """
    connectionTimeout = 120000
    numThreads = 1
  }
}

call-caching {
  enabled = true
}

backend {
  default = "Local"
  providers {
    Local {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 10
        run-in-background = true
        submit = "/usr/bin/env bash ${script}"
        root = "cromwell-executions"
        filesystems {
          local {
            localization: ["soft-link"]
            caching {
              duplication-strategy: ["soft-link"]
              hasing-strategy: ["path+modtime"]
            }
          }
        }
      }
    }
    SLURM {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        concurrent-job-limit = 500
        runtime-attributes = """
        Int threads = 1
        String memory = "2g"
        String dx_timeout
        """
        submit = """
        sbatch --account --partition ind-shared --nodes 1 --job-name=${job_name}-%j
        --output=logs/{job_name}/$j.out
        -o ${out} -e ${err}
        --mail-type FAIL --mail-user <email-address>
        --ntasks-per-node=${threads}
        --mem=${memory}
        --time=${dx_timeout}
        --parsable
        --chdir ${cwd}
        """
        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
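
One detail in the configuration above may be worth checking. The Local backend's caching block spells its second key `hasing-strategy` and gives it a list. The Cromwell call-caching documentation names this option `hashing-strategy` and shows it taking a single string (for example `"md5"`, `"path"` or `"path+modtime"`). If the misspelled key is simply ignored, which is an assumption and not something confirmed in this thread, the backend would fall back to its default of hashing file contents, and that could plausibly account for multi-minute cache lookups on large inputs. A minimal sketch of the fragment with the key spelled as documented:

```
backend {
  providers {
    Local {
      config {
        filesystems {
          local {
            localization: ["soft-link"]
            caching {
              duplication-strategy: ["soft-link"]
              # Assumed fix: "hashing-strategy" (not "hasing-strategy"),
              # given as a single string instead of a list.
              hashing-strategy: "path+modtime"
            }
          }
        }
      }
    }
  }
}
```

Changing the hashing strategy changes the file hashes that call caching compares, so results cached under the previous setting may not produce cache hits afterwards.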


Here's the log printed to the terminal. Notice the jump of nearly 8 minutes from [2022-12-15 21:15:03,84] to [2022-12-15 21:22:59,01]:

$ java -Dconfig.file=workflow/cromwell.conf -jar utilities/cromwell-84.jar run workflow/expanse_workflow.wdl
[2022-12-15 21:14:44,99] [info] Running with database db.url = jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db; shutdown=false; hsqldb.default_table_type=cached;hsqldb.tx=mvcc; hsqldb.result_max_memory_rows=10000; hsqldb.large_data=true; hsqldb.applog=1; hsqldb.lob_compressed=true; hsqldb.script_format=3

[2022-12-15 21:14:45,71] [info] dataFileCache open start [2022-12-15 21:14:45,74] [info] dataFileCache open end [2022-12-15 21:14:46,59] [info] checkpointClose start [2022-12-15 21:14:46,59] [info] checkpointClose synched [2022-12-15 21:14:46,71] [info] checkpointClose script done [2022-12-15 21:14:46,71] [info] dataFileCache commit start [2022-12-15 21:14:47,14] [info] dataFileCache commit end [2022-12-15 21:14:47,20] [info] checkpointClose end [2022-12-15 21:14:47,37] [info] Checkpoint start [2022-12-15 21:14:47,37] [info] checkpointClose start [2022-12-15 21:14:47,37] [info] checkpointClose synched [2022-12-15 21:14:47,44] [info] checkpointClose script done [2022-12-15 21:14:47,44] [info] dataFileCache commit start [2022-12-15 21:14:47,45] [info] dataFileCache commit end [2022-12-15 21:14:47,48] [info] checkpointClose end [2022-12-15 21:14:47,48] [info] Checkpoint end - txts: 101676 [2022-12-15 21:14:47,72] [info] Checkpoint start [2022-12-15 21:14:47,72] [info] checkpointClose start [2022-12-15 21:14:47,72] [info] checkpointClose synched [2022-12-15 21:14:47,78] [info] checkpointClose script done [2022-12-15 21:14:47,78] [info] dataFileCache commit start [2022-12-15 21:14:47,79] [info] dataFileCache commit end [2022-12-15 21:14:47,84] [info] checkpointClose end [2022-12-15 21:14:47,84] [info] Checkpoint end - txts: 101746 [2022-12-15 21:14:47,84] [info] Checkpoint start [2022-12-15 21:14:47,84] [info] checkpointClose start [2022-12-15 21:14:47,84] [info] checkpointClose synched [2022-12-15 21:14:47,89] [info] checkpointClose script done [2022-12-15 21:14:47,89] [info] dataFileCache commit start [2022-12-15 21:14:47,90] [info] dataFileCache commit end [2022-12-15 21:14:47,92] [info] checkpointClose end [2022-12-15 21:14:47,93] [info] Checkpoint end - txts: 101748 [2022-12-15 21:14:49,99] [info] Checkpoint start [2022-12-15 21:14:49,99] [info] checkpointClose start [2022-12-15 21:14:49,99] [info] checkpointClose synched [2022-12-15 21:14:50,05] [info] 
checkpointClose script done [2022-12-15 21:14:50,06] [info] dataFileCache commit start [2022-12-15 21:14:50,06] [info] dataFileCache commit end [2022-12-15 21:14:50,08] [info] checkpointClose end [2022-12-15 21:14:50,09] [info] Checkpoint end - txts: 101803 [2022-12-15 21:14:50,10] [info] Checkpoint start [2022-12-15 21:14:50,10] [info] checkpointClose start [2022-12-15 21:14:50,10] [info] checkpointClose synched [2022-12-15 21:14:50,18] [info] checkpointClose script done [2022-12-15 21:14:50,18] [info] dataFileCache commit start [2022-12-15 21:14:50,18] [info] dataFileCache commit end [2022-12-15 21:14:50,21] [info] checkpointClose end [2022-12-15 21:14:50,21] [info] Checkpoint end - txts: 101866 [2022-12-15 21:14:50,52] [info] Checkpoint start [2022-12-15 21:14:50,52] [info] checkpointClose start [2022-12-15 21:14:50,52] [info] checkpointClose synched [2022-12-15 21:14:50,57] [info] checkpointClose script done [2022-12-15 21:14:50,57] [info] dataFileCache commit start [2022-12-15 21:14:50,57] [info] dataFileCache commit end [2022-12-15 21:14:50,60] [info] checkpointClose end [2022-12-15 21:14:50,60] [info] Checkpoint end - txts: 101868 [2022-12-15 21:14:50,61] [info] Checkpoint start [2022-12-15 21:14:50,61] [info] checkpointClose start [2022-12-15 21:14:50,61] [info] checkpointClose synched [2022-12-15 21:14:50,69] [info] checkpointClose script done [2022-12-15 21:14:50,69] [info] dataFileCache commit start [2022-12-15 21:14:50,70] [info] dataFileCache commit end [2022-12-15 21:14:50,73] [info] checkpointClose end [2022-12-15 21:14:50,74] [info] Checkpoint end - txts: 101875 [2022-12-15 21:14:50,74] [info] Checkpoint start [2022-12-15 21:14:50,74] [info] checkpointClose start [2022-12-15 21:14:50,74] [info] checkpointClose synched [2022-12-15 21:14:50,78] [info] checkpointClose script done [2022-12-15 21:14:50,78] [info] dataFileCache commit start [2022-12-15 21:14:50,78] [info] dataFileCache commit end [2022-12-15 21:14:50,80] [info] checkpointClose end 
[2022-12-15 21:14:50,81] [info] Checkpoint end - txts: 101877 [2022-12-15 21:14:50,81] [info] Checkpoint start [2022-12-15 21:14:50,81] [info] checkpointClose start [2022-12-15 21:14:50,81] [info] checkpointClose synched [2022-12-15 21:14:50,85] [info] checkpointClose script done [2022-12-15 21:14:50,85] [info] dataFileCache commit start [2022-12-15 21:14:50,85] [info] dataFileCache commit end [2022-12-15 21:14:50,87] [info] checkpointClose end [2022-12-15 21:14:50,88] [info] Checkpoint end - txts: 101879 [2022-12-15 21:14:50,89] [info] Running with database db.url = jdbc:hsqldb:file:cromwell-executions/cromwell-db/cromwell-db; shutdown=false; hsqldb.default_table_type=cached;hsqldb.tx=mvcc; hsqldb.result_max_memory_rows=10000; hsqldb.large_data=true; hsqldb.applog=1; hsqldb.lob_compressed=true; hsqldb.script_format=3

[2022-12-15 21:14:50,95] [info] Checkpoint start [2022-12-15 21:14:50,95] [info] checkpointClose start [2022-12-15 21:14:50,95] [info] checkpointClose synched [2022-12-15 21:14:50,98] [info] checkpointClose script done [2022-12-15 21:14:50,98] [info] dataFileCache commit start [2022-12-15 21:14:50,99] [info] dataFileCache commit end [2022-12-15 21:14:51,01] [info] checkpointClose end [2022-12-15 21:14:51,02] [info] Checkpoint end - txts: 101887 [2022-12-15 21:14:51,05] [info] Checkpoint start [2022-12-15 21:14:51,05] [info] checkpointClose start [2022-12-15 21:14:51,06] [info] checkpointClose synched [2022-12-15 21:14:51,08] [info] checkpointClose script done [2022-12-15 21:14:51,08] [info] dataFileCache commit start [2022-12-15 21:14:51,31] [info] dataFileCache commit end [2022-12-15 21:14:51,35] [info] checkpointClose end [2022-12-15 21:14:51,35] [info] Checkpoint end - txts: 101957 [2022-12-15 21:14:51,35] [info] Checkpoint start [2022-12-15 21:14:51,35] [info] checkpointClose start [2022-12-15 21:14:51,35] [info] checkpointClose synched [2022-12-15 21:14:51,38] [info] checkpointClose script done [2022-12-15 21:14:51,38] [info] dataFileCache commit start [2022-12-15 21:14:51,38] [info] dataFileCache commit end [2022-12-15 21:14:51,41] [info] checkpointClose end [2022-12-15 21:14:51,41] [info] Checkpoint end - txts: 101959 [2022-12-15 21:14:51,63] [info] Checkpoint start [2022-12-15 21:14:51,63] [info] checkpointClose start [2022-12-15 21:14:51,63] [info] checkpointClose synched [2022-12-15 21:14:51,67] [info] checkpointClose script done [2022-12-15 21:14:51,67] [info] dataFileCache commit start [2022-12-15 21:14:51,68] [info] dataFileCache commit end [2022-12-15 21:14:51,70] [info] checkpointClose end [2022-12-15 21:14:51,71] [info] Checkpoint end - txts: 102014 [2022-12-15 21:14:51,72] [info] Checkpoint start [2022-12-15 21:14:51,72] [info] checkpointClose start [2022-12-15 21:14:51,72] [info] checkpointClose synched [2022-12-15 21:14:51,76] [info] 
checkpointClose script done [2022-12-15 21:14:51,76] [info] dataFileCache commit start [2022-12-15 21:14:51,76] [info] dataFileCache commit end [2022-12-15 21:14:51,79] [info] checkpointClose end [2022-12-15 21:14:51,79] [info] Checkpoint end - txts: 102077 [2022-12-15 21:14:51,80] [info] Checkpoint start [2022-12-15 21:14:51,80] [info] checkpointClose start [2022-12-15 21:14:51,80] [info] checkpointClose synched [2022-12-15 21:14:51,85] [info] checkpointClose script done [2022-12-15 21:14:51,85] [info] dataFileCache commit start [2022-12-15 21:14:51,85] [info] dataFileCache commit end [2022-12-15 21:14:51,88] [info] checkpointClose end [2022-12-15 21:14:51,88] [info] Checkpoint end - txts: 102079 [2022-12-15 21:14:51,89] [info] Checkpoint start [2022-12-15 21:14:51,89] [info] checkpointClose start [2022-12-15 21:14:51,89] [info] checkpointClose synched [2022-12-15 21:14:51,95] [info] checkpointClose script done [2022-12-15 21:14:51,95] [info] dataFileCache commit start [2022-12-15 21:14:51,96] [info] dataFileCache commit end [2022-12-15 21:14:51,99] [info] checkpointClose end [2022-12-15 21:14:51,99] [info] Checkpoint end - txts: 102086 [2022-12-15 21:14:51,99] [info] Checkpoint start [2022-12-15 21:14:51,99] [info] checkpointClose start [2022-12-15 21:14:51,99] [info] checkpointClose synched [2022-12-15 21:14:52,03] [info] checkpointClose script done [2022-12-15 21:14:52,03] [info] dataFileCache commit start [2022-12-15 21:14:52,04] [info] dataFileCache commit end [2022-12-15 21:14:52,42] [info] checkpointClose end [2022-12-15 21:14:52,43] [info] Checkpoint end - txts: 102088 [2022-12-15 21:14:52,43] [info] Checkpoint start [2022-12-15 21:14:52,43] [info] checkpointClose start [2022-12-15 21:14:52,43] [info] checkpointClose synched [2022-12-15 21:14:52,46] [info] checkpointClose script done [2022-12-15 21:14:52,46] [info] dataFileCache commit start [2022-12-15 21:14:52,46] [info] dataFileCache commit end [2022-12-15 21:14:52,49] [info] checkpointClose end 
[2022-12-15 21:14:52,50] [info] Checkpoint end - txts: 102090 [2022-12-15 21:14:52,81] [info] Slf4jLogger started [2022-12-15 21:14:53,15] [info] Workflow heartbeat configuration: { "cromwellId" : "cromid-b254006", "heartbeatInterval" : "2 minutes", "ttl" : "10 minutes", "failureShutdownDuration" : "5 minutes", "writeBatchSize" : 10000, "writeThreshold" : 10000 } [2022-12-15 21:14:53,38] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds. [2022-12-15 21:14:53,44] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds. [2022-12-15 21:14:53,44] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds. [2022-12-15 21:14:53,44] [info] Metadata summary refreshing every 1 second. [2022-12-15 21:14:53,44] [info] No metadata archiver defined in config [2022-12-15 21:14:53,44] [info] No metadata deleter defined in config [2022-12-15 21:14:53,55] [info] JobRestartCheckTokenDispenser - Distribution rate: 50 per 1 seconds. [2022-12-15 21:14:53,66] [info] JobExecutionTokenDispenser - Distribution rate: 20 per 10 seconds. 
[2022-12-15 21:14:53,78] [info] SingleWorkflowRunnerActor: Version 84
[2022-12-15 21:14:53,82] [info] SingleWorkflowRunnerActor: Submitting workflow
[2022-12-15 21:14:53,93] [info] Unspecified type (Unspecified version) workflow 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff submitted
[2022-12-15 21:14:53,94] [info] SingleWorkflowRunnerActor: Workflow submitted 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff
[2022-12-15 21:14:53,96] [info] 1 new workflows fetched by cromid-b254006: 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff
[2022-12-15 21:14:53,96] [info] WorkflowManagerActor: Starting workflow 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff
[2022-12-15 21:14:53,98] [info] WorkflowManagerActor: Successfully started WorkflowActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff
[2022-12-15 21:14:53,98] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2022-12-15 21:14:53,99] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2022-12-15 21:14:54,14] [info] MaterializeWorkflowDescriptorActor [9e4f5894]: Parsing workflow as WDL 1.0
[2022-12-15 21:14:56,25] [info] MaterializeWorkflowDescriptorActor [9e4f5894]: Call-to-Backend assignments: main.ethnic_sample_lists_task -> Local, main.kinship_count -> Local, main.reported_sex -> Local, main.phenotype -> Local, main.load_continuous_phenotype -> Local, main.pcs -> Local, main.load_binary_phenotype -> Local, main.year_of_birth -> Local, main.all_qced_sample_lists -> Local, main.sex_aneuploidy -> Local, main.white_brits_sample_list -> Local, main.month_of_birth -> Local, main.date_of_death -> Local, main.genetic_sex -> Local, main.sex_aneuploidy_sample_list -> Local, main.white_brits -> Local, main.ethnicity_self_report -> Local, main.assessment_ages -> Local, main.low_genotyping_quality_sample_list -> Local, main.categorical_covariates -> Local, main.sex_mismatch_sample_list -> Local, main.load_shared_covars -> Local
[2022-12-15 21:14:56,51] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are 
not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,51] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,51] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,51] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,51] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. 
[2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,52] [warn] Local [9e4f5894]: Key/s [dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,53] [warn] Local [9e4f5894]: Key/s [shortTask, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:56,53] [warn] Local [9e4f5894]: Key/s [memory, dx_timeout] is/are not supported by backend. Unsupported attributes will not be part of job executions. [2022-12-15 21:14:58,56] [info] Not triggering log of restart checking token queue status. Effective log interval = None [2022-12-15 21:14:58,68] [info] Not triggering log of execution token queue status. 
Effective log interval = None
[2022-12-15 21:14:58,68] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Starting main.kinship_count, main.reported_sex, main.phenotype, main.month_of_birth, main.pcs, main.white_brits, main.year_of_birth, main.sex_aneuploidy, main.date_of_death, main.genetic_sex, main.ethnicity_self_report, main.assessment_ages
[2022-12-15 21:15:00,77] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Starting main.categorical_covariates
[2022-12-15 21:15:03,68] [info] Assigned new job execution tokens to the following groups: 9e4f5894: 10
[2022-12-15 21:15:03,81] [info] BT-322 9e4f5894:main.phenotype:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,81] [info] BT-322 9e4f5894:main.white_brits:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,83] [info] BT-322 9e4f5894:main.month_of_birth:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.date_of_death:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.assessment_ages:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.year_of_birth:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.kinship_count:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.reported_sex:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.genetic_sex:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:15:03,84] [info] BT-322 9e4f5894:main.sex_aneuploidy:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:22:59,01] [warn] 
9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.kinship_count:-1:1-20000000009 [9e4f5894main.kinship_count:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:22:59,03] [info] BT-322 9e4f5894:main.kinship_count:-1:1 cache hit copying success with aggregated hashes: initial = 40DB3965745EAB4613A3E2804F447EFE, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:22:59,03] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.kinship_count:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:22:59,12] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.reported_sex:-1:1-20000000001 [9e4f5894main.reported_sex:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:22:59,12] [info] BT-322 9e4f5894:main.reported_sex:-1:1 cache hit copying success with aggregated hashes: initial = 91C81CABBB083C238800E3CF59AF537D, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:22:59,12] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.reported_sex:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:22:59,16] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.sex_aneuploidy:-1:1-20000000003 [9e4f5894main.sex_aneuploidy:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:22:59,17] [info] BT-322 9e4f5894:main.sex_aneuploidy:-1:1 cache hit copying success with aggregated hashes: initial = 86896541F0DCB2C2B959EEF37F266B30, file = EF056BD27B3A512F77663A400D778CCF. 
[2022-12-15 21:22:59,17] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.sex_aneuploidy:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:22:59,32] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.month_of_birth:-1:1-20000000024 [9e4f5894main.month_of_birth:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:22:59,32] [info] BT-322 9e4f5894:main.month_of_birth:-1:1 cache hit copying success with aggregated hashes: initial = 601F8C709AA96517AA171B340CCA88BF, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:22:59,32] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.month_of_birth:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:22:59,78] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.kinship_count' (scatter index: None, attempt 1) [2022-12-15 21:22:59,78] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.reported_sex' (scatter index: None, attempt 1) [2022-12-15 21:22:59,78] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.sex_aneuploidy' (scatter index: None, attempt 1) [2022-12-15 21:22:59,78] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.month_of_birth' (scatter index: None, attempt 1) [2022-12-15 21:22:59,84] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.year_of_birth:-1:1-20000000028 [9e4f5894main.year_of_birth:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:22:59,84] [info] BT-322 9e4f5894:main.year_of_birth:-1:1 cache hit copying success with aggregated hashes: initial = 
09247459DDA5EA8DF661D5F490C81E8B, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:22:59,84] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.year_of_birth:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:23:00,36] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.phenotype:-1:1-20000000025 [9e4f5894main.phenotype:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:23:00,36] [info] BT-322 9e4f5894:main.phenotype:-1:1 cache hit copying success with aggregated hashes: initial = 018D1BC619E22671C2125EEDE82AB210, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:23:00,36] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.phenotype:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:23:00,37] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.date_of_death:-1:1-20000000026 [9e4f5894main.date_of_death:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:23:00,37] [info] BT-322 9e4f5894:main.date_of_death:-1:1 cache hit copying success with aggregated hashes: initial = 179EA0EE9B87629C24E64D33DEB38610, file = EF056BD27B3A512F77663A400D778CCF. [2022-12-15 21:23:00,37] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.date_of_death:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:23:00,67] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.white_brits:-1:1-20000000000 [9e4f5894main.white_brits:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:23:00,68] [info] BT-322 9e4f5894:main.white_brits:-1:1 cache hit copying success with aggregated hashes: initial = EB2F16A657136E0208581A7B6A7F020F, file = EF056BD27B3A512F77663A400D778CCF. 
[2022-12-15 21:23:00,68] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.white_brits:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully [2022-12-15 21:23:02,52] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.year_of_birth' (scatter index: None, attempt 1) [2022-12-15 21:23:02,52] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.phenotype' (scatter index: None, attempt 1) [2022-12-15 21:23:02,52] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.date_of_death' (scatter index: None, attempt 1) [2022-12-15 21:23:02,52] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.white_brits' (scatter index: None, attempt 1) [2022-12-15 21:23:03,67] [info] Assigned new job execution tokens to the following groups: 9e4f5894: 3 [2022-12-15 21:23:03,69] [info] BT-322 9e4f5894:main.categorical_covariates:0:1 is eligible for call caching with read = true and write = true [2022-12-15 21:23:03,70] [info] BT-322 9e4f5894:main.ethnicity_self_report:-1:1 is eligible for call caching with read = true and write = true [2022-12-15 21:23:03,70] [info] BT-322 9e4f5894:main.pcs:-1:1 is eligible for call caching with read = true and write = true [2022-12-15 21:27:50,35] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.assessment_ages:-1:1-20000000002 [9e4f5894main.assessment_ages:NA:1]: Unrecognized runtime attribute keys: dx_timeout [2022-12-15 21:27:50,35] [info] BT-322 9e4f5894:main.assessment_ages:-1:1 cache hit copying success with aggregated hashes: initial = EEC3507DAE39FE605FDE6F9F6FC0A5A8, file = EF056BD27B3A512F77663A400D778CCF. 
[2022-12-15 21:27:50,35] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.assessment_ages:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:27:50,48] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.assessment_ages' (scatter index: None, attempt 1)
[2022-12-15 21:27:50,82] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.genetic_sex:-1:1-20000000011 [9e4f5894main.genetic_sex:NA:1]: Unrecognized runtime attribute keys: dx_timeout
[2022-12-15 21:27:50,82] [info] BT-322 9e4f5894:main.genetic_sex:-1:1 cache hit copying success with aggregated hashes: initial = FD7DC79B974CF6706FC3376F067965B9, file = EF056BD27B3A512F77663A400D778CCF.
[2022-12-15 21:27:50,82] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.genetic_sex:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:27:54,15] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.genetic_sex' (scatter index: None, attempt 1)
[2022-12-15 21:27:55,79] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.categorical_covariates:0:1-20000000027 [9e4f5894main.categorical_covariates:0:1]: Unrecognized runtime attribute keys: dx_timeout
[2022-12-15 21:27:55,79] [info] BT-322 9e4f5894:main.categorical_covariates:0:1 cache hit copying success with aggregated hashes: initial = C760DC2B9015D0B787EF7BEE7D21AA58, file = EF056BD27B3A512F77663A400D778CCF.
[2022-12-15 21:27:55,79] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.categorical_covariates:0:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:27:55,79] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.pcs:-1:1-20000000010 [9e4f5894main.pcs:NA:1]: Unrecognized runtime attribute keys: dx_timeout
[2022-12-15 21:27:55,79] [info] BT-322 9e4f5894:main.pcs:-1:1 cache hit copying success with aggregated hashes: initial = 58D108557F21E539CF9BE064A9528392, file = EF056BD27B3A512F77663A400D778CCF.
[2022-12-15 21:27:55,79] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.pcs:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:27:56,12] [warn] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-BackendCacheHitCopyingActor-9e4f5894:main.ethnicity_self_report:-1:1-20000000008 [9e4f5894main.ethnicity_self_report:NA:1]: Unrecognized runtime attribute keys: dx_timeout
[2022-12-15 21:27:56,12] [info] BT-322 9e4f5894:main.ethnicity_self_report:-1:1 cache hit copying success with aggregated hashes: initial = A32F403CF4C1AEE5AC6D327D9290D15E, file = EF056BD27B3A512F77663A400D778CCF.
[2022-12-15 21:27:56,12] [info] 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff-EngineJobExecutionActor-main.ethnicity_self_report:NA:1 [9e4f5894]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:27:56,51] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.categorical_covariates' (scatter index: Some(0), attempt 1)
[2022-12-15 21:27:56,51] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.pcs' (scatter index: None, attempt 1)
[2022-12-15 21:27:56,51] [info] WorkflowExecutionActor-9e4f5894-f7e6-4e2f-be4b-f547d6de7fff [9e4f5894]: Job results retrieved (CallCached): 'main.ethnicity_self_report' (scatter index: None, attempt 1)
[2022-12-15 21:28:01,17] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Starting main.white_brits_sample_list, main.sex_aneuploidy_sample_list, main.low_genotyping_quality_sample_list, main.sex_mismatch_sample_list, main.load_shared_covars
[2022-12-15 21:28:03,68] [info] Assigned new job execution tokens to the following groups: 9e4f5894: 5
[2022-12-15 21:28:03,69] [info] BT-322 788d8048:main.low_genotyping_quality_sample_list:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:03,70] [info] BT-322 788d8048:main.sex_aneuploidy_sample_list:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:03,70] [info] BT-322 788d8048:main.sex_mismatch_sample_list:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:03,70] [info] BT-322 788d8048:main.load_shared_covars:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:03,70] [info] BT-322 788d8048:main.white_brits_sample_list:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:03,72] [info] BT-322 788d8048:main.load_shared_covars:-1:1 cache hit copying nomatch: could not find a suitable cache hit.
[2022-12-15 21:28:03,72] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.load_shared_covars:NA:1 [788d8048]: Could not copy a suitable cache hit for 788d8048:main.load_shared_covars:-1:1. No copy attempts were made.
[2022-12-15 21:28:03,88] [warn] BackgroundConfigAsyncJobExecutionActor [788d8048main.load_shared_covars:NA:1]: Unrecognized runtime attribute keys: dx_timeout, memory
[2022-12-15 21:28:04,00] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.sex_aneuploidy_sample_list:-1:1-20000000012 [788d8048main.sex_aneuploidy_sample_list:NA:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:04,00] [info] BT-322 788d8048:main.sex_aneuploidy_sample_list:-1:1 cache hit copying success with aggregated hashes: initial = B2C071CED641A1EB183DE4A4655F45ED, file = DDF9190E939D36D999E513158D534532.
[2022-12-15 21:28:04,00] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.sex_aneuploidy_sample_list:NA:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:04,01] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.white_brits_sample_list:-1:1-20000000013 [788d8048main.white_brits_sample_list:NA:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:04,01] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.low_genotyping_quality_sample_list:-1:1-20000000014 [788d8048main.low_genotyping_quality_sample_list:NA:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:04,01] [info] BT-322 788d8048:main.white_brits_sample_list:-1:1 cache hit copying success with aggregated hashes: initial = B2C071CED641A1EB183DE4A4655F45ED, file = 9675960412B5394D5D0816ED198FB6EB.
[2022-12-15 21:28:04,01] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.white_brits_sample_list:NA:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:04,01] [info] BT-322 788d8048:main.low_genotyping_quality_sample_list:-1:1 cache hit copying success with aggregated hashes: initial = 3C891C9939496580DDF747805F991E06, file = AAFFF98AC7D58B07E7CE25978A906B00.
[2022-12-15 21:28:04,01] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.low_genotyping_quality_sample_list:NA:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:04,02] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.sex_mismatch_sample_list:-1:1-20000000015 [788d8048main.sex_mismatch_sample_list:NA:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:04,02] [info] BT-322 788d8048:main.sex_mismatch_sample_list:-1:1 cache hit copying success with aggregated hashes: initial = 03340ED60152B24B7D0988669F47CF2B, file = EB6A9909BDF3705B7BB543E4096DA08A.
[2022-12-15 21:28:04,02] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.sex_mismatch_sample_list:NA:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully

/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-915037270/load_shared_covars.py . /home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-949496038/ukb46122_cal_chr1_v2_s488176.fam /home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-1401422240/22009.txt /home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-1717412047/21003.txt
[2022-12-15 21:28:04,43] [info] BackgroundConfigAsyncJobExecutionActor [788d8048main.load_shared_covars:NA:1]: executing: /usr/bin/env bash /home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/execution/script
[2022-12-15 21:28:05,80] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.sex_aneuploidy_sample_list' (scatter index: None, attempt 1)
[2022-12-15 21:28:05,80] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.white_brits_sample_list' (scatter index: None, attempt 1)
[2022-12-15 21:28:05,81] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.low_genotyping_quality_sample_list' (scatter index: None, attempt 1)
[2022-12-15 21:28:05,81] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.sex_mismatch_sample_list' (scatter index: None, attempt 1)
[2022-12-15 21:28:07,28] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Starting main.ethnic_sample_lists_task
[2022-12-15 21:28:08,40] [info] BackgroundConfigAsyncJobExecutionActor [788d8048main.load_shared_covars:NA:1]: job id: 1902061
[2022-12-15 21:28:08,43] [info] BackgroundConfigAsyncJobExecutionActor [788d8048main.load_shared_covars:NA:1]: Status change from - to WaitingForReturnCode
[2022-12-15 21:28:13,68] [info] Assigned new job execution tokens to the following groups: 9e4f5894: 1
[2022-12-15 21:28:13,69] [info] BT-322 788d8048:main.ethnic_sample_lists_task:-1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:13,85] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.ethnic_sample_lists_task:-1:1-20000000033 [788d8048main.ethnic_sample_lists_task:NA:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:13,85] [info] BT-322 788d8048:main.ethnic_sample_lists_task:-1:1 cache hit copying success with aggregated hashes: initial = B09218865D7CA13056B00F9F90E94675, file = 66CE4C8C9D1761D150F95616CE84D5F3.
[2022-12-15 21:28:13,85] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.ethnic_sample_lists_task:NA:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:14,50] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.ethnic_sample_lists_task' (scatter index: None, attempt 1)
[2022-12-15 21:28:20,55] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Starting main.all_qced_sample_lists (6 shards)
[2022-12-15 21:28:23,68] [info] Assigned new job execution tokens to the following groups: 9e4f5894: 6
[2022-12-15 21:28:23,69] [info] BT-322 788d8048:main.all_qced_sample_lists:2:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,70] [info] BT-322 788d8048:main.all_qced_sample_lists:1:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,70] [info] BT-322 788d8048:main.all_qced_sample_lists:5:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,71] [info] BT-322 788d8048:main.all_qced_sample_lists:3:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,72] [info] BT-322 788d8048:main.all_qced_sample_lists:4:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,72] [info] BT-322 788d8048:main.all_qced_sample_lists:0:1 is eligible for call caching with read = true and write = true
[2022-12-15 21:28:23,78] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:5:1-20000000037 [788d8048main.all_qced_sample_lists:5:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,78] [info] BT-322 788d8048:main.all_qced_sample_lists:5:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = 0EDB33C059ED489B1F78F3502B7DB8AC.
[2022-12-15 21:28:23,78] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:5:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:23,79] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:1:1-20000000039 [788d8048main.all_qced_sample_lists:1:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,79] [info] BT-322 788d8048:main.all_qced_sample_lists:1:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = DC6FF6846E1CC843B0D79723739936B2.
[2022-12-15 21:28:23,79] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:1:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:23,80] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:3:1-20000000035 [788d8048main.all_qced_sample_lists:3:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,80] [info] BT-322 788d8048:main.all_qced_sample_lists:3:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = 262BA6F0AB83375414FCD228C0CC6E47.
[2022-12-15 21:28:23,80] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:3:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:23,81] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:2:1-20000000034 [788d8048main.all_qced_sample_lists:2:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,81] [info] BT-322 788d8048:main.all_qced_sample_lists:2:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = 110EC9F4C25BC9902A0E5B3B8EAB2725.
[2022-12-15 21:28:23,81] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:2:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:23,81] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:4:1-20000000036 [788d8048main.all_qced_sample_lists:4:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,82] [info] BT-322 788d8048:main.all_qced_sample_lists:4:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = 51C3D11209F9A7985345B2FD76E1C699.
[2022-12-15 21:28:23,82] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:4:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:23,86] [warn] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-BackendCacheHitCopyingActor-788d8048:main.all_qced_sample_lists:0:1-20000000038 [788d8048main.all_qced_sample_lists:0:1]: Unrecognized runtime attribute keys: shortTask, dx_timeout
[2022-12-15 21:28:23,86] [info] BT-322 788d8048:main.all_qced_sample_lists:0:1 cache hit copying success with aggregated hashes: initial = 8BB8C81C27BFD2533FC9743A70F55DB1, file = 801EC388A847FBAB78685AE96643853A.
[2022-12-15 21:28:23,86] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-EngineJobExecutionActor-main.all_qced_sample_lists:0:1 [788d8048]: Call cache hit process had 0 total hit failures before completing successfully
[2022-12-15 21:28:26,54] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(5), attempt 1)
[2022-12-15 21:28:26,54] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(1), attempt 1)
[2022-12-15 21:28:26,54] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(3), attempt 1)
[2022-12-15 21:28:26,54] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(2), attempt 1)
[2022-12-15 21:28:26,55] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(4), attempt 1)
[2022-12-15 21:28:26,55] [info] 788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2-SubWorkflowActor-SubWorkflow-main:-1:1 [788d8048]: Job results retrieved (CallCached): 'main.all_qced_sample_lists' (scatter index: Some(0), attempt 1)
[2022-12-15 21:28:28,51] [info] BackgroundConfigAsyncJobExecutionActor [788d8048main.load_shared_covars:NA:1]: Status change from WaitingForReturnCode to Done
[2022-12-15 21:28:33,81] [info] WorkflowManagerActor: Workflow 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff failed (during ExecutingWorkflowState): Job main.load_shared_covars:NA:1 exited with return code 1 which has not been declared as a valid return code. See 'continueOnReturnCode' runtime attribute for more details.
Check the content of stderr for potential additional information: /home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/execution/stderr.
 [First 3000 bytes]:Traceback (most recent call last):
  File "/home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-915037270/load_shared_covars.py", line 87, in <module>
    load_covars()
  File "/home/cromwell-executions/main/9e4f5894-f7e6-4e2f-be4b-f547d6de7fff/call-main/main/788d8048-ef2b-4d7c-b3cb-6e04b3cbbdc2/call-load_shared_covars/inputs/-915037270/load_shared_covars.py", line 51, in load_covars
    assert not np.any(np.isnan(data))
AssertionError

[2022-12-15 21:28:38,49] [info] WorkflowManagerActor: Workflow actor for 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff completed with status 'Failed'. The workflow will be removed from the workflow store.
[2022-12-15 21:28:52,23] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2022-12-15 21:28:53,46] [info] Workflow polling stopped
[2022-12-15 21:28:53,46] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2022-12-15 21:28:53,46] [info] Aborting all running workflows.
[2022-12-15 21:28:53,46] [info] 0 workflows released by cromid-b254006
[2022-12-15 21:28:53,47] [info] WorkflowStoreActor stopped
[2022-12-15 21:28:53,47] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2022-12-15 21:28:53,47] [info] WorkflowLogCopyRouter stopped
[2022-12-15 21:28:53,47] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2022-12-15 21:28:53,47] [info] JobExecutionTokenDispenser stopped
[2022-12-15 21:28:53,47] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2022-12-15 21:28:53,47] [info] WorkflowManagerActor: All workflows finished
[2022-12-15 21:28:53,47] [info] WorkflowManagerActor stopped
[2022-12-15 21:28:53,71] [info] Connection pools shut down
[2022-12-15 21:28:53,71] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2022-12-15 21:28:53,71] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2022-12-15 21:28:53,72] [info] SubWorkflowStoreActor stopped
[2022-12-15 21:28:53,72] [info] JobStoreActor stopped
[2022-12-15 21:28:53,72] [info] CallCacheWriteActor stopped
[2022-12-15 21:28:53,72] [info] IoProxy stopped
[2022-12-15 21:28:53,74] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-12-15 21:28:53,74] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-12-15 21:28:53,75] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2022-12-15 21:28:53,75] [info] KvWriteActor Shutting down: 0 queued messages to process
[2022-12-15 21:28:53,76] [info] ServiceRegistryActor stopped
[2022-12-15 21:28:53,77] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-12-15 21:28:53,77] [info] DockerHashActor stopped
[2022-12-15 21:28:53,80] [info] Database closed
[2022-12-15 21:28:53,80] [info] Stream materializer shut down
[2022-12-15 21:28:53,80] [info] WDL HTTP import resolver closed
Workflow 9e4f5894-f7e6-4e2f-be4b-f547d6de7fff transitioned to state Failed
$
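To see where the time goes in a log like the one above, one option is to measure the gaps between consecutive timestamps. This is only a rough sketch: it assumes every entry carries a `[YYYY-MM-DD HH:MM:SS,ff]` stamp as in the output above, and it treats the digits after the comma as a decimal fraction of a second, which is my assumption. `parse_stamps` and `largest_gaps` are made-up helper names, not part of Cromwell.

```python
import re
from datetime import datetime, timedelta

# Matches stamps such as "[2022-12-15 21:28:03,68]".
STAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d+)\]")


def parse_stamps(text):
    """Return every timestamp found in the text, in order of appearance."""
    stamps = []
    for match in STAMP.finditer(text):
        base = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # Assumption: the digits after the comma are a decimal fraction of a second.
        fraction = float("0." + match.group(2))
        stamps.append(base + timedelta(seconds=fraction))
    return stamps


def largest_gaps(text, top=3):
    """Return the `top` largest (gap_seconds, start, end) tuples between consecutive stamps."""
    stamps = parse_stamps(text)
    gaps = [((b - a).total_seconds(), a, b) for a, b in zip(stamps, stamps[1:])]
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

Calling `largest_gaps` on the text of a saved log should point at the stretches where nothing was logged, which is where the multi-minute call-caching delays would show up.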

LiterallyUniqueLogin commented 1 year ago

I don't know why it would matter, but here's my workflow in case it does. There are a few runtime attributes that are only relevant to dxCompiler, which I plan to use later; for now I'm running with Cromwell and am aware that it simply ignores them.

expanse_workflow.wdl

version 1.0

import "platform_agnostic_workflow.wdl"

task extract_field {
  input {
    String script_dir
    File script = "~{script_dir}/main_dataset/decompress_trait.py"
    File ukbconv = "~{script_dir}/ukb_utilities/ukbconv"
    File encoding = "~{script_dir}/ukb_utilities/encoding.ukb"
    Array[File]+ fields_files = ["main_dataset/raw_data/fields46781.ukb", "main_dataset/raw_data/fields46782.ukb"]
    Array[File]+ enc_files = ["main_dataset/raw_data/ukb46781.enc_ukb", "main_dataset/raw_data/ukb46782.enc_ukb"]

    Int id # data field id
  }

  output {
    File data = "~{id}.txt"
  }

  command <<<
    ~{script} \
      ~{id} \
      ~{id} \
      ~{ukbconv} \
      ~{encoding} \
      --fields-files ~{sep=" " fields_files} \
      --enc-files ~{sep=" " enc_files}
  >>>

  runtime {
    dx_timeout: "5h"
  }
}

workflow main {

  input {
    String script_dir  = "."

    String phenotype_name = "platelet_count"
    Int phenotype_id = 30080
    Array[String] categorical_covariate_names = ["platelet_count_device_id"]
    Array[Int] categorical_covariate_ids = [30083]
    #String phenotype_name
    #Int phenotype_id
    #Array[String] categorical_covariate_names = []
    #Array[Int] categorical_covariate_ids = []
    Boolean is_binary = false
    Boolean is_zero_one_neg_nan = false # different binary encoding
    String date_of_most_recent_first_occurrence_update = "2021-04-01"

    File fam_file = "microarray/ukb46122_cal_chr1_v2_s488176.fam"
    File withdrawn_sample_list = "sample_qc/common_filters/remove/withdrawn.sample"
  }

  call extract_field as white_brits { input:
    script_dir = script_dir,
    id = 22006
  }

  call extract_field as ethnicity_self_report { input :
    script_dir = script_dir,
    id = 21000
  }

  call extract_field as sex_aneuploidy { input:
    script_dir = script_dir,
    id = 22019
  }

  call extract_field as genetic_sex { input:
    script_dir = script_dir,
    id = 22001
  }

  call extract_field as reported_sex { input:
    script_dir = script_dir,
    id = 31
  }

  call extract_field as kinship_count { input:
    script_dir = script_dir,
    id = 22021
  }

  call extract_field as assessment_ages { input :
    script_dir = script_dir,
    id = 21003
  }

  call extract_field as pcs { input :
    script_dir = script_dir,
    id = 22009
  }

  call extract_field as year_of_birth { input :
    script_dir = script_dir,
    id = 34
  }

  call extract_field as month_of_birth { input :
    script_dir = script_dir,
    id = 52
  }

  call extract_field as date_of_death { input :
    script_dir = script_dir,
    id = 40000
  }

  call extract_field as phenotype { input :
    script_dir = script_dir,
    id = phenotype_id
  }

  scatter (categorical_covariate_id in categorical_covariate_ids) {
    call extract_field as categorical_covariates { input :
      script_dir = script_dir,
      id = categorical_covariate_id
    }
  }

  call platform_agnostic_workflow.main { input:
    script_dir = script_dir,

    phenotype_name = phenotype_name,
    categorical_covariate_names = categorical_covariate_names,
    categorical_covariate_scs = categorical_covariates.data,
    is_binary = is_binary,
    is_zero_one_neg_nan = is_zero_one_neg_nan,
    date_of_most_recent_first_occurrence_update = date_of_most_recent_first_occurrence_update,

    fam_file = fam_file, # Could instead create a task for downloading this with ukbgene
    withdrawn_sample_list = withdrawn_sample_list,

    sc_white_brits = white_brits.data,
    sc_ethnicity_self_report = ethnicity_self_report.data,
    sc_sex_aneuploidy = sex_aneuploidy.data,
    sc_genetic_sex = genetic_sex.data,
    sc_reported_sex = reported_sex.data,
    sc_kinship_count = kinship_count.data,
    sc_assessment_ages = assessment_ages.data,
    sc_pcs = pcs.data,
    sc_year_of_birth = year_of_birth.data,
    sc_month_of_birth = month_of_birth.data,
    sc_date_of_death = date_of_death.data,
    sc_phenotype = phenotype.data
  }

  output {
    Array[File] out_sample_lists = main.out_sample_lists
  }
}

platform_agnostic_workflow.wdl

# platform agnostic workflow

version 1.0

import "tasks.wdl"

workflow main {

  input {
    String script_dir

    String phenotype_name
    Array[String] categorical_covariate_names
    Array[File] categorical_covariate_scs
    Boolean is_binary
    Boolean is_zero_one_neg_nan
    String date_of_most_recent_first_occurrence_update

    File fam_file # task for generating this?
    File withdrawn_sample_list

    # data showcase files
    File sc_white_brits
    File sc_ethnicity_self_report
    File sc_sex_aneuploidy
    File sc_genetic_sex
    File sc_reported_sex
    File sc_kinship_count
    File sc_assessment_ages
    File sc_pcs
    File sc_year_of_birth
    File sc_month_of_birth
    File sc_date_of_death
    File sc_phenotype
  }

  call tasks.write_sample_list as white_brits_sample_list { input:
    script_dir = script_dir,
    sc = sc_white_brits
  }

  call tasks.ethnic_sample_lists as ethnic_sample_lists_task { input: 
    script_dir = script_dir,
    white_brits_sample_list = white_brits_sample_list.data,
    sc_ethnicity_self_report = sc_ethnicity_self_report
  }

  Array[String] ethnicities = ethnic_sample_lists_task.ethnicities
  Array[String] all_ethnicities = flatten([['white_british'], ethnicities])
  Array[File] ethnic_sample_lists = ethnic_sample_lists_task.sample_lists
  Array[File] all_sample_lists = flatten([
    [white_brits_sample_list.data], ethnic_sample_lists
  ])

  call tasks.write_sample_list as sex_aneuploidy_sample_list { input:
    script_dir = script_dir,
    sc = sc_sex_aneuploidy
  }

  call tasks.sex_mismatch_sample_list { input:
    script_dir = script_dir,
    sc_genetic_sex = sc_genetic_sex,
    sc_reported_sex = sc_reported_sex
  }

  call tasks.write_sample_list as low_genotyping_quality_sample_list { input:
    script_dir = script_dir,
    sc = sc_kinship_count,
    value = -1
  }

  scatter (sample_list in all_sample_lists) {
    call tasks.qced_sample_list as all_qced_sample_lists { input:
      script_dir = script_dir,
      unqced_sample_list = sample_list,
      withdrawn_sample_list = withdrawn_sample_list,
      sex_aneuploidy_sample_list = sex_aneuploidy_sample_list.data,
      sex_mismatch_sample_list = sex_mismatch_sample_list.data,
      low_genotyping_quality_sample_list = low_genotyping_quality_sample_list.data
    }
  }

  call tasks.load_shared_covars { input:
    script_dir = script_dir,
    fam_file = fam_file,
    sc_pcs = sc_pcs,
    sc_assessment_ages = sc_assessment_ages
  }

  if (!is_binary) {
    call tasks.load_continuous_phenotype { input :
      script_dir = script_dir,
      sc = sc_phenotype,
      qced_sample_list = all_qced_sample_lists.data[0],
      assessment_ages_npy = load_shared_covars.assessment_ages,
      categorical_covariate_names = categorical_covariate_names,
      categorical_covariate_scs = categorical_covariate_scs
    }
  }
  if (is_binary) {
    call tasks.load_binary_phenotype { input:
      script_dir = script_dir,
      sc = sc_phenotype,
      qced_sample_list = all_qced_sample_lists.data[0],
      sc_year_of_birth = sc_year_of_birth,
      sc_month_of_birth = sc_month_of_birth,
      sc_date_of_death = sc_date_of_death,
      date_of_most_recent_first_occurrence_update = date_of_most_recent_first_occurrence_update,
      is_zero_one_neg_nan = is_zero_one_neg_nan
    }
  }
  # regardless of continuous or binary, get the outputs and move on
  File pheno_data = select_first([load_continuous_phenotype.data, load_binary_phenotype.data])
  File covar_names = select_first([load_continuous_phenotype.covar_names, load_binary_phenotype.covar_names])
  File pheno_readme = select_first([load_continuous_phenotype.README, load_binary_phenotype.README])

  output {
    Array[File] out_sample_lists = all_qced_sample_lists.data
    File assessment_ages = load_shared_covars.assessment_ages
    File shared_covars = load_shared_covars.shared_covars
    File shared_covar_names = load_shared_covars.covar_names
    File pheno_data_out = pheno_data
    File covar_names_out = covar_names
    File pheno_readme_out = pheno_readme
  }
}

tasks.wdl

version 1.0

# any input file with a default relative to the script_dir
# needs to be supplied by the user, it won't be the product of another task
# if input files to tasks can be supplied by another task's output,
# there will be a comment specifying
# task input files without comments need to be supplied by the user
# see the expanse workflow for where those are on expanse
# exception: sc (data showcase) tasks are labeled by data field id
# but do need to be supplied by the user

# output files from tasks will be commented with the location
# they reside on expanse
# this isn't necessary for understanding/running the WDL, just useful notes for myself
# for transitioning from snakemake to WDL

# sample_list file format
# first line is 'ID' (case insensitive)
# every successive line is a sample ID

# TODO: set container for each task

####################### Loading samples and phenotypes ####################

task write_sample_list {
  input {
    String script_dir
    File script = "~{script_dir}/sample_qc/scripts/write_sample_list.py"

    File sc
    Int? value
  }

  output {
    File data = "data.out"
  }

  command <<<
    ~{script} ~{sc} data.out ~{"--value " +  value}
  >>>

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}

task ethnic_sample_lists {
  input {
    String script_dir
    File script = "~{script_dir}/sample_qc/scripts/ethnicity.py"
    File python_array_utils = "~{script_dir}/sample_qc/scripts/python_array_utils.py"

    File white_brits_sample_list # write_sample_list 22006
    File sc_ethnicity_self_report # 21000
  } 

  output {
    # sample_qc/common_filters/ethnicity/{ethnicity}.sample
    Array[String] ethnicities = [
      "black",
      "south_asian",
      "chinese",
      "irish",
      "white_other",
    ]
    # These can be zipped together to form a map if desired
    Array[File] sample_lists = [
      "black.sample",
      "south_asian.sample",
      "chinese.sample",
      "irish.sample",
      "white_other.sample",
    ]
  }

  command <<<
    ~{script} . ~{white_brits_sample_list} ~{sc_ethnicity_self_report}
  >>>

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}

task sex_mismatch_sample_list {
  input {
    String script_dir
    File script = "~{script_dir}/sample_qc/scripts/find_sex_mismatch_list.py"

    File sc_genetic_sex #22001
    File sc_reported_sex #31
  }

  output {
    File data = "out.sample"
  }

  command <<<
    ~{script} ~{sc_genetic_sex} ~{sc_reported_sex} out.sample
  >>>

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}

task qced_sample_list {
  input {
    String script_dir
    File script = "~{script_dir}/sample_qc/scripts/combine.py"

    File unqced_sample_list # white brits = write_sample_list 22006 or output from ethnic_sample_lists
    File withdrawn_sample_list 
    File sex_mismatch_sample_list # task above
    File sex_aneuploidy_sample_list # write_sample_list 22019
    File low_genotyping_quality_sample_list # write_sample_list 22021 -1

    File? subpop_sample_list # TODO move this to expanse workflow sample_qc/subpops/{subpop}.txt
  }

  String outfname = "qced.samples"

  output {
    File data = outfname # sample_qc/(subpop_)?runs/({subpop}/)?{ethnicity}/no_phenotype/combined.sample
  }

  command <<<
    ~{script} \
      ~{outfname} \
      discard \
      ~{unqced_sample_list} \
      ~{withdrawn_sample_list} \
      ~{sex_mismatch_sample_list} \
      ~{sex_aneuploidy_sample_list} \
      ~{low_genotyping_quality_sample_list} \
      ~{"--subpop " + subpop_sample_list}
  >>>  

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}

task load_shared_covars {
  input {
    String script_dir
    File script = "~{script_dir}/traits/load_shared_covars.py"

    File fam_file
    File sc_pcs # 22009
    File sc_assessment_ages
  }

  output {
    # all in traits/shared_covars/
    File shared_covars = "shared_covars.npy" 
    File covar_names = "covar_names.txt"
    File assessment_ages = "assessment_ages.npy"
  }

  command <<<
    ~{script} . ~{fam_file} ~{sc_pcs} ~{sc_assessment_ages}
  >>>

  runtime {
    memory: "10g"

    dx_timeout: "15m"
  }
}

task load_continuous_phenotype {
  input {
    String script_dir
    File script = "~{script_dir}/traits/load_continuous_phenotype_from_main_dataset.py"

    File sc
    File qced_sample_list # from qced_sample_list

    File assessment_ages_npy # from load shared covars
    Array[String] categorical_covariate_names
    Array[File] categorical_covariate_scs
  }

  output {
    File data = "pheno.npy"
    File covar_names = "pheno_covar_names.txt"
    File README = "pheno_README.txt"
  }

  command <<<
    ~{script} \
      ~{sc} \
      '.' \
      ~{qced_sample_list} \
      ~{assessment_ages_npy} \
      --categorical-covariate-names ~{sep=" " categorical_covariate_names} \
      --categorical-covariate-files ~{sep=" " categorical_covariate_scs}
  >>>

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}

task load_binary_phenotype {
  input {
    String script_dir
    File script = "~{script_dir}/traits/load_binary_phenotype_from_main_dataset.py"

    File sc
    File qced_sample_list # from qced_sample_list

    File sc_year_of_birth # 34
    File sc_month_of_birth # 52
    File sc_date_of_death # 40000
    String date_of_most_recent_first_occurrence_update
    Boolean is_zero_one_neg_nan = false
  }

  output {
    File data = "pheno.npy"
    File covar_names = "pheno_covar_names.txt"
    File README = "pheno_README.txt"
  }

  command <<<
    ~{script} \
      ~{sc} \
      '.' \
      ~{qced_sample_list} \
      ~{sc_year_of_birth} \
      ~{sc_month_of_birth} \
      ~{sc_date_of_death} \
      ~{date_of_most_recent_first_occurrence_update} \
      ~{if is_zero_one_neg_nan then "--zero-one-neg-nan" else ""}
  >>>

  runtime {
    shortTask: true
    dx_timeout: "5m"
  }
}
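
The sample_list format described in the comments at the top of tasks.wdl (a first line of 'ID', case insensitive, followed by one sample ID per line) can be checked with a few lines of Python. This is a minimal sketch: `read_sample_list` is a hypothetical helper rather than one of the pipeline scripts, and skipping blank lines is an assumption on my part.

```python
def read_sample_list(lines):
    """Validate the 'ID' header and return the sample IDs that follow it.

    `lines` is any iterable of strings, for example an open text file.
    """
    stripped = [line.strip() for line in lines]
    # The first line must be 'ID' in any capitalization.
    if not stripped or stripped[0].lower() != "id":
        raise ValueError("sample list must start with an 'ID' header line")
    # Assumption: blank lines carry no sample and are skipped.
    return [line for line in stripped[1:] if line]
```

For example, `read_sample_list(open("qced.samples"))` would return the IDs written by the `qced_sample_list` task, assuming its output follows this format.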