cumc / xqtl-protocol

Molecular QTL analysis protocol developed by ADSP Functional Genomics Consortium
https://cumc.github.io/xqtl-protocol/
MIT License

Task parameter not recognized in BiCV_factor #173

Closed: hsun3163 closed this issue 2 years ago

hsun3163 commented 2 years ago

While running the BiCV_factor module, the following error occurs:

```
INFO: Running fake_vcf:
ERROR: [fake_vcf (fake_vcf)]: [fake_vcf]: Failed to execute process
"R(fr"""library("dplyr")\nlibrary("readr")\n## Add fake header\n...output[0]}.stdout')\n"
name 'walltime' is not defined
```

When walltime is replaced with the string "24h", the following error occurs:

```
INFO: Running fake_vcf:
ERROR: [fake_vcf (fake_vcf)]: [fake_vcf]: Failed to execute process
"R(fr"""library("dplyr")\nlibrary("readr")\n## Add fake header\n...output[0]}.stdout')\n"
name 'mem' is not defined
```

However, the task line is copied verbatim, and the parameters mem and walltime are definitely defined in the [global] section, as follows:

```sos
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'

[global]
# The output directory for generated files. MUST BE FULL PATH
parameter: cwd = path
# The molecular phenotype matrix
parameter: phenoFile = path
# The covariate file
parameter: covFile = path("./")
# For cluster jobs, number commands to run per job
parameter: job_size = 1
# Wall clock time expected
parameter: walltime = "5h"
# Memory expected
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 8
# Software container option
parameter: container = ""
parameter: name = ""
# N PEER factors
parameter: N = 30

# Default values from PEER:
## The maximum number of iterations
parameter: iteration = 10
```

This module ran successfully the day before yesterday, and another module, PEER, still runs today with the exact same code.

The module works if the -J, -q, and -c options are removed, so that the task statement is not involved.

hsun3163 commented 2 years ago

Progress: the issue is most likely caused by the output_from("") statement. To be verified.

hsun3163 commented 2 years ago

Verified the following behavior:

Invoking another step via output_from, named_output, or provides causes the task statement to no longer recognize the parameters defined in the [global] section. What confuses me is why it worked previously.

```sos
[global]
# The output directory for generated files. MUST BE FULL PATH
parameter: cwd = path("./")
# For cluster jobs, number commands to run per job
parameter: job_size = 1
# Wall clock time expected
parameter: walltime = "5h"
# Memory expected
parameter: mem = "16G"
# Number of threads
parameter: numThreads = 8

[A]
output: file = f'{cwd:a}/test_file'
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "$[ ]", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout'
    touch $[_output]

[B]
input: output_from("A")
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "$[ ]", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout'
    touch $[_input]
```

hsun3163 commented 2 years ago

For this particular problem, I think the sensible fix is to remove the task statement from the fake_vcf step, since that step does not need significant resources. Longer term, this should be addressed in https://github.com/vatlab/sos/issues/1457
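As a rough illustration, here is that change applied to step B of the reduced test case. The actual fake_vcf step is not reproduced in full in this thread, so this only mirrors the reproduction. With the `task:` line dropped, the step's script runs directly in the workflow process instead of being submitted as a task, so `walltime`, `mem`, and `numThreads` are never looked up for that step:

```sos
[B]
input: output_from("A")
# No task: statement here, so this step is not submitted as a task
# and the resource options (walltime, mem, numThreads) are not needed
bash: expand= "$[ ]", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout'
    touch $[_input]
```

The trade-off is that this step then ignores the queue selected with -q and runs wherever `sos run` itself is running, which is only acceptable for lightweight steps like fake_vcf.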

For the time being, if such an invocation is needed, the problem can also be patched by declaring walltime, mem, etc. as local parameters within the affected steps.
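A minimal sketch of that patch on the reduced test case follows. It assumes that SoS accepts `parameter:` statements inside individual steps, and that declaring the resource parameters there, rather than in [global], sidesteps the lookup failure described above. Neither assumption has been checked against a specific SoS release. The resource parameters are moved out of [global] so that no name is declared both globally and locally:

```sos
[global]
# The output directory for generated files. MUST BE FULL PATH
parameter: cwd = path("./")

[A]
# Assumed workaround: resource parameters declared locally in the step
parameter: walltime = "5h"
parameter: mem = "16G"
parameter: numThreads = 8
output: file = f'{cwd:a}/test_file'
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "$[ ]", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout'
    touch $[_output]

[B]
# Same local declarations, repeated because step-level parameters are per step
parameter: walltime = "5h"
parameter: mem = "16G"
parameter: numThreads = 8
input: output_from("A")
task: trunk_workers = 1, walltime = walltime, mem = mem, cores = numThreads, tags = f'{step_name}_{_output:bn}'
bash: expand= "$[ ]", stderr = f'{_output}.stderr', stdout = f'{_output}.stdout'
    touch $[_input]
```

The cost is duplicated declarations in every step that submits a task, which is why this is only a stopgap until the upstream SoS issue is resolved.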