epi2me-labs / wf-transcriptomes


[Bug]: Process `pipeline:reference_assembly:map_reads (1)` terminated with an error exit status (143) #20

Closed rocanja closed 1 year ago

rocanja commented 1 year ago

What happened?

Trying to run the https://github.com/epi2me-labs/wf-transcriptomes workflow with the given example files ERR6053095_chr20.fastq, hg38_chr20.fa and gencode.v22.annotation.chr20.gtf. The workflow terminates on the assembly step after about 1 h 15 min of runtime with exit status 143, which I believe indicates 'out of memory'. Is this a memory problem? How can I change the memory allocation, and shouldn't the default allocation at least be able to run the example data smoothly? Can you please provide a ballpark figure for how many CPUs and how much memory should be allocated to run a real-life human transcriptome sample successfully?

Operating System

ubuntu 20.04

Workflow Execution

EPI2ME

Workflow Execution - EPI2ME Labs Versions

No response

Workflow Execution - CLI Execution Profile

Singularity

Workflow Version

epi2me-labs/wf-transcriptomes [curious_jones] - revision: 85e85409ef [master]

Relevant log output

| task_id | hash | native_id | name | status | exit | submit | duration | realtime | %cpu | peak_rss | peak_vmem | rchar | wchar |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 72/434de8 | 4248579.pbs | pipeline:getParams | COMPLETED | 0 | 2023-05-17 10:19:33.018 | 2m 34s | 108ms | 26.8% | 0 | 0 | 470.2 KB | 2.6 KB |
| 2 | c7/3f8e6d | 4248580.pbs | fastcat (1) | COMPLETED | 0 | 2023-05-17 10:19:33.159 | 2m 39s | 1.4s | 194.8% | 10.2 MB | 352 MB | 162.8 MB | 124.6 MB |
| 4 | 38/05e1de | 4248581.pbs | pipeline:getVersions | COMPLETED | 0 | 2023-05-17 10:19:33.281 | 2m 39s | 12.9s | 80.5% | 83.6 MB | 141.3 MB | 129.6 MB | 3 KB |
| 1 | 17/98d724 | 4248582.pbs | pipeline:build_minimap_index | COMPLETED | 0 | 2023-05-17 10:19:33.409 | 3m 44s | 1m 38s | 11.7% | 253.1 MB | 891.7 MB | 64.4 MB | 187.8 MB |
| 5 | 02/d7f302 | 4248584.pbs | pipeline:collectFastqIngressResultsInDir (1) | COMPLETED | 0 | 2023-05-17 10:22:12.439 | 3m 25s | 247ms | 16.8% | 2.8 MB | 3.8 MB | 711.6 KB | 304 B |
| 7 | df/5d62c2 | 4248588.pbs | output (1) | COMPLETED | 0 | 2023-05-17 10:25:37.439 | 2m 40s | 15ms | 46.5% | 0 | 0 | 460.5 KB | 220 B |
| 6 | 16/96da87 | 4248585.pbs | pipeline:preprocess_reads (1) | COMPLETED | 0 | 2023-05-17 10:22:12.576 | 6m 35s | 2m 58s | 222.3% | 856.3 MB | 4.7 GB | 1.1 GB | 980.3 MB |
| 8 | ea/673684 | 4249824.pbs | pipeline:reference_assembly:map_reads (1) | FAILED | 143 | 2023-05-17 10:28:47.542 | 1h 4m 12s | 1h 22s | - | - | - | - | - |
sarahjeeeze commented 1 year ago

Hi, are you running it with Docker? If so, it's worth checking any resource limits that have been set; see https://labs.epi2me.io/installation/#resource-limits. The sample dataset shouldn't require more than 8 GB. If you are using a whole human reference, most of the workflow needs around 10 GB; in my experience the maximum it requires is about 15 GB for the minimap index step, depending on the reference file. You could also set a slightly larger window size, e.g. `--minimap_index_opts -w15`, which will reduce the memory required for indexing.
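For anyone following along, the window-size suggestion above can be passed on the command line. This is only a sketch: the file paths come from the example data in this thread, and the parameter names (`--ref_genome`, `--ref_annotation`) are assumed from the workflow's documentation rather than verified against this exact revision.

```shell
# -w15 uses a larger minimizer window when building the minimap2 index,
# which shrinks the index and lowers its peak memory use
nextflow run epi2me-labs/wf-transcriptomes \
    -profile singularity \
    --fastq ERR6053095_chr20.fastq \
    --ref_genome hg38_chr20.fa \
    --ref_annotation gencode.v22.annotation.chr20.gtf \
    --minimap_index_opts '-w15'
```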

rocanja commented 1 year ago

Thanks so much for your reply. I have managed to run the example data, and as you indicated, it didn't actually need more than 8 GB. I got some help from our university's IT support team to solve the problem. Apparently the workflow doesn't specify RAM and CPU requirements for its processes, so when it submits jobs to our HPC, they default to 1 CPU and 1 GB, which is obviously not enough. Even when I specified more CPUs and memory when submitting my job, the workflow didn't utilise what I had given it and still fell back to the default 1 CPU and 1 GB. The solution was to set executor = local for the processes in the config file, so the workflow would actually use the resources allocated to the job. My next step is to try some real-life samples, so it's good to know the ballpark compute requirements. Thanks for your help!
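For readers hitting the same scheduler defaults, the fix described above can be sketched as a small Nextflow config override. This is an illustrative fragment, not the workflow's shipped config: the resource values are assumptions to adjust for your cluster, and the `withName` selector is taken from the process name in the trace output above.

```groovy
// custom.config -- hypothetical override file, applied with:
//   nextflow run epi2me-labs/wf-transcriptomes ... -c custom.config
process {
    // run tasks inside the submitted job's own allocation
    // instead of re-submitting each task to the scheduler
    executor = 'local'

    // optional per-process override; values here are placeholders
    withName: 'pipeline:reference_assembly:map_reads' {
        cpus   = 4
        memory = '16 GB'
    }
}
```

With `executor = 'local'`, Nextflow runs all tasks within whatever CPUs and memory the enclosing HPC job was granted, so the resources requested at submission are actually used.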