d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT
2 stars 0 forks source link

Updated snakemake workflow for copy_number_consensus_call module #557

Closed ewafula closed 1 year ago

ewafula commented 1 year ago

What analysis module should be updated and why?

copy_number_consensus_call

What changes need to be made? Please provide enough detail for another participant to make the update.

What input data should be used? Which data were used in the version being updated?

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

@ewafula

jharenza commented 1 year ago

@migbro @yuankunzhu is anyone on your team familiar with snakemake and can help with this, or better yet - port this consensus CNV workflow to CAVATICA for future releases?

migbro commented 1 year ago

So, I think it's related to this ticket here: https://github.com/d3b-center/bixu-tracker/issues/1862 which I assigned to Alex and Saksham. I had intended to make this a priority for use in cBio, but I think @yuankunzhu wanted to do some benchmarking/understand the method first before committing. I can ask him about it in our meeting today

jharenza commented 1 year ago

OK, thanks. Even if you'd like to do more development, if we can get this in CAVATICA for OPC purposes or have someone work on the snakemake fix for v13, that would be appreciated. We can discuss at the 3 pm meeting. Thanks!

migbro commented 1 year ago

In thinking about it a bit, I think either @sickler-alex or @sakshamphul can help out some time this week. I think they were assessing making it an app so they can either:

  1. Try to finish the job
  2. Do this fix first, then maybe port over later Some initial questions are:
  3. What is to be done? Just a bug fix to ensure neutral calls are captured?
  4. I see listed as input two of our three CNV callers - is that expected?
jharenza commented 1 year ago

Yes @ewafula can you document exactly the error and your temp fix? It's a bug fix that occurred in v12, but not v11. Though I'm not sure who ran v11 and the cnvs from v11 may have been questionable so unclear when this started.

Should be 3 caller input but consensus 2/3.

ewafula commented 1 year ago

@migbro @jharenza,

The copy_number_consensus_call snakemake workflow has been failing with v12 data in the OPC docker container when it gets to the rules of creating the combined collection sample of neutral calls. Below I have attached an example snakemake log file that includes the error message generated when the snakemake workflow fails.

My workaround has been to manually merge neutral calls for all samples (with bash script) and then restart the workflow from where it failed with neutral results (results/cnv_neutral.bed) now available for it to continue (see command below):

snakemake -s Snakefile --configfile ../../scratch/copy_consensus/config_snakemake.yaml -j --restart-times 2 --keep-going 

2023-02-16T115121.523377.snakemake.log.zip

sickler-alex commented 1 year ago

@ewafula can you share the bash script you were using to merge the neutral calls or was it the same commands as the step you linked in your comment?

sickler-alex commented 1 year ago

In the meeting we mentioned briefly that at some point only CNVs called by all 3 callers were included in the final output. Is this still occurring? And, is that expected or does the pipeline need to go back to calls needing to be in only 2/3 of the callers?

ewafula commented 1 year ago

@ewafula can you share the bash script you were using to merge the neutral calls or was it the same commands as the step you linked in your comment?

@sickler-alex, same Unix cat command that fails in the snakemake workflow shown at end of the log file that I attached to merge all sample neutral calls into bed a single bed file.

sickler-alex commented 1 year ago

@ewafula thanks. When you run the cat command, does it give you any warnings? I haven't found anything specific yet in the log file you sent.

ewafula commented 1 year ago

@sickler-alex, the cat command outside of the module of the snakemake workflow is the workaround to create the cnv_neutral.bed required to complete the workflow execution. If you do tail on the log file, you see the error messages generated by the snakemake workflow:

$ tail 2023-02-16T115121.523377.snakemake.log
../../scratch/copy_consensus/endpoints/BS_981A5MQ1.neutral.bed ../../scratch/copy_consensus/endpoints/BS_Q9NHTR7G.neutral.bed ../../scratch/copy_consensus/endpoints/BS_AW1G0X6Y.neutral.bed ../../scratch/copy_consensus/endpoints/BS_VW1TW6VV.neutral.bed ../../scratch/copy_consensus/endpoints/BS_7HD3TZNC.neutral.bed ../../scratch/copy_consensus/endpoints/BS_C1RRDNF5.neutral.bed ../../scratch/copy_consensus/endpoints/BS_H68W3CE8.neutral.bed ../../scratch/copy_consensus/endpoints/BS_JTNTWJMD.neutral.bed ../../scratch/copy_consensus/endpoints/BS_QH30P0AR.neutral.bed ../../scratch/copy_consensus/endpoints/BS_9CMJ3QF6.neutral.bed ../../scratch/copy_consensus/endpoints/BS_1SV5KG4Y.neutral.bed ../../scratch/copy_consensus/endpoints/BS_823V5X6Z.neutral.bed ../../scratch/copy_consensus/endpoints/BS_R407FX2P.neutral.bed ../../scratch/copy_consensus/endpoints/BS_PZEB1PFD.neutral.bed ../../scratch/copy_consensus/endpoints/BS_AH3RVK53.neutral.bed ../../scratch/copy_consensus/endpoints/BS_FVPMPMRJ.neutral.bed ../../scratch/copy_consensus/endpoints/BS_V3SC6NWH.neutral.bed ../../scratch/copy_consensus/endpoints/BS_GPPREVAD.neutral.bed ../../scratch/copy_consensus/endpoints/BS_5RD8KEED.neutral.bed ../../scratch/copy_consensus/endpoints/BS_1FW0H2V1.neutral.bed ../../scratch/copy_consensus/endpoints/BS_97M1BPNE.neutral.bed ../../scratch/copy_consensus/endpoints/BS_BJCTR4C8.neutral.bed ../../scratch/copy_consensus/endpoints/BS_SY6CCX4G.neutral.bed ../../scratch/copy_consensus/endpoints/BS_6GPWAQ5R.neutral.bed ../../scratch/copy_consensus/endpoints/BS_42A66SH4.neutral.bed ../../scratch/copy_consensus/endpoints/BS_2NEBNR18.neutral.bed ../../scratch/copy_consensus/endpoints/BS_233JPDBD.neutral.bed ../../scratch/copy_consensus/endpoints/BS_93B38HPS.neutral.bed ../../scratch/copy_consensus/endpoints/BS_N3PSD3AD.neutral.bed > results/cnv_neutral.bed
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu Feb 16 16:25:19 2023]
Finished job 4.
66907 of 66911 steps (100%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/rstudio/OpenPedCan-analysis/analyses/copy_number_consensus_call/.snakemake/log/2023-02-16T115121.523377.snakemake.log
sickler-alex commented 1 year ago

@ewafula I've been trying to run the pipeline locally and using the OPC docker and I'm getting a different error and when I try to fix it like snakemake recommends, I get a different error. Is this something that's happening to you too?

Building DAG of jobs...
IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
../../scratch/copy_consensus/interim/BS_GCQEB05M.gatk.dup.filtered3.bed
../../scratch/copy_consensus/interim/BS_DH86PF5A.gatk.del.filtered3.bed
(snakemake) ➜ sicklera in ~/repos/OpenPedCan-analysis/analyses/copy_number_consensus_call on dev ● ● λ snakemake --cleanup-metadata ../../scratch/copy_consensus/interim/BS_GCQEB05M.gatk.dup.filtered3.bed
KeyError in line 16 of /Users/sicklera/repos/OpenPedCan-analysis/analyses/copy_number_consensus_call/Snakefile:
'scratch'
  File "/Users/sicklera/repos/OpenPedCan-analysis/analyses/copy_number_consensus_call/Snakefile", line 16, in <module>
ewafula commented 1 year ago

Yes, @sickler-alex. That also happened to me during the v12 data release process. That's why I resorted to running the commands manually outside the snakemake workflow to move forward with the data release. I was planning to spend more time working to either revamp the snakemake workflow or rewrite with snakemake. But priorities have changed with the end of the project. @jharenza thought your group might have some experience with developing in snakemake. @migbro also proposed converting to CWL and running the module on CAVATICA.

migbro commented 1 year ago

I don 't really have much experience with snakemake, and I feel like Komal definitely does. Does the person that authored this still work with us? Can we loop them in for advice? Converting to a cwl workflow, I am uncertain the effort level, especially if you're intent is to do this for v13. Not sure if @sickler-alex has an idea, but I would not have the timne to evaluate that until possibly later this week or the next.

jharenza commented 1 year ago

No, Nhat left for school a few years ago. Komal won't be able to give effort.

jharenza commented 1 year ago

Yes, @sickler-alex. That also happened to me during the v12 data release process. That's why I resorted to running the commands manually outside the snakemake workflow to move forward with the data release. I was planning to spend more time working to either revamp the snakemake workflow or rewrite with snakemake. But priorities have changed with the end of the project. @jharenza thought your group might have some experience with developing in snakemake. @migbro also proposed converting to CWL and running the module on CAVATICA.

@ewafula can you document/create a bash script to run the commands outside of snakemake?

jharenza commented 1 year ago

Alternatively - @jashapiro - I don't know if you have any free time, but we are having some issues with the snakemake workflow when running consensus CNV - if any of these errors jump out to you as "oh this is what the issue is", could you give a comment here? Thank you!

jashapiro commented 1 year ago

I kind of suspect that the cat command is failing because of too many arguments, as I have definitely seen that. My suggestion for better debugging would be to add a log: directive to that rule https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files to capture stderr.

If that is the case, then rewriting the command with a for loop and progressive addition to the combined file (as merge_all does) might work.

This snakemake file is pretty complex in ways that I suspect will defy easy (and automatic) conversion to CWL... If I were starting again (or had written it myself) I would do things differently!

ewafula commented 1 year ago

Yes, @sickler-alex. That also happened to me during the v12 data release process. That's why I resorted to running the commands manually outside the snakemake workflow to move forward with the data release. I was planning to spend more time working to either revamp the snakemake workflow or rewrite with snakemake. But priorities have changed with the end of the project. @jharenza thought your group might have some experience with developing in snakemake. @migbro also proposed converting to CWL and running the module on CAVATICA.

@ewafula can you document/create a bash script to run the commands outside of snakemake? @jharenza, there is no standard bash script to use. But whoever runs the module needs to look through the snakemake log file to find failed commands. Those can then be run manually via bash script or command line. For v12, the sample neutral calls were not being complied with the snakemake workflow. I compiled all sample neutral calls outside of snakemake (as I illustrated above) then restarted the workflow from where it had prematurely stopped. The workaround fix might be different for a subsequent dataset. Initially, I thought it might be a docker issue, but I wasn't successful running the module outside of docker either. Attached is the command I executed the create to merge all the neutral calls for all samples from scratch in results/cnv_neutral.bed, which I don't think helps in fixing the snakemake workflow.

merge_neutrals.sh.zip

migbro commented 1 year ago

I feel like @jashapiro might be on to something if running this requires giving a ton of files as args. Perhaps try with a subset to see if the error persists to confirm? Also in that subset, include the samples that it names as "incomplete" to see if you get the error again with fewer args or not.

sickler-alex commented 1 year ago

The error I'm getting is much earlier in the process then the neutral merge. Something is going on in the gatk_filter. When the snakemake runs, those samples don't have any output from the gatk_filter. When I run the commands of that step manually or when I run it on only 3 samples (the 2 from the error and a random good one), it runs successfully through the whole process. I'm still trying to figure out why it happened.

ewafula commented 1 year ago

I kind of suspect that the cat command is failing because of too many arguments, as I have definitely seen that. My suggestion for better debugging would be to add a log: directive to that rule https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files to capture stderr.

If that is the case, then rewriting the command with a for loop and progressive addition to the combined file (as merge_all does) might work.

This snakemake file is pretty complex in ways that I suspect will defy easy (and automatic) conversion to CWL... If I were starting again (or had written it myself) I would do things differently!

@jashapiro, the explanation makes sense to me now that I am thinking of it. The module runs ok for v11 data and not v12. The only difference is the increased number of samples to cat together for the latter. Unfortunately, the snakemake error is not informative. Would have expected to see the Too many arguments error that is usually thrown by the cat command when there are too many files for the cat command to handle.

jashapiro commented 1 year ago

@jashapiro, the explanation makes sense to me now that I am thinking of it. The module runs ok for v11 data and not v12. The only difference is the increased number of samples to cat together for the latter. Unfortunately, the snakemake error is not informative. Would have expected to see the Too many arguments error that is usually thrown by the cat command when there are too many files for the cat command to handle.

Yes, unfortunately snakemake is not good at passing along errors. A common pattern that I have added is to do something like the following to capture error output:

rule merge_neutral:
    input:
        expand(scratch_loc + "endpoints/{sample}.neutral.bed",
               sample = config["samples"])
    output:
        "results/cnv_neutral.bed"
    log:
       "logs/merge_neutral.log"
    shell:
        "cat {input} > {output} 2> {log}"
sickler-alex commented 1 year ago

In testing this, I added logging to the gatk_filter step and reran the full dataset hoping to find what's actually going wrong and it completed successfully including the merge_neutral step. I'm not sure why however. The only other thing I did was delete my .snakemake file while I was trying to run a smaller number of samples.

sickler-alex commented 1 year ago

@ewafula can you try deleting your .snakemake folder, scratch/copy_consensus, and rerunning the pipeline? That seemed to have fixed it for me. In the OPC meeting Monday (06/05) we discussed that an endpoint for this ticket could be just fixing the error(s) so we can get v13 out.

sickler-alex commented 1 year ago

In the meeting we mentioned briefly that at some point only CNVs called by all 3 callers were included in the final output. Is this still occurring? And, is that expected or does the pipeline need to go back to calls needing to be in only 2/3 of the callers?

@jharenza @ewafula just want to make sure that this question doesn't get lost in the rest of the discussion so it can be properly resolved when the conversion to cwl is made.

ewafula commented 1 year ago

@ewafula can you try deleting your .snakemake folder, scratch/copy_consensus, and rerunning the pipeline? That seemed to have fixed it for me. In the OPC meeting Monday (06/05) we discussed that an endpoint for this ticket could be just fixing the error(s) so we can get v13 out.

@sickler-alex,I tried that and got the same error message in previous v12 docker instance (Debian stretch). Note that the pipeline works ok with the v11 data. So if testing with a smaller dataset you won't be able to reproduce the error. Need to try rerunning with entire v12 dataset which much is much larger to check if cat commands for neutral calls works. @jashapiro, suggested the cat command might be able to merge a very large number of files on some machines. In sure if this is associated with ulimit settings. The optimal solution to avoid encountering this particular error and make module code portable is to implement a loop when merging sample calls as @jashapiro suggested. I'll trying rerunning on the new OPC docker instance now updated to Debian bookworm. Even if it works on new OPC docker instance, it is not a long term solutions if sample number are going to get larger in the future.

sickler-alex commented 1 year ago

@ewafula my last successful rerun was with the current v12 data.

ewafula commented 1 year ago

@sickler-alex, that is encouraging. I am rerunning the modules now in the new OPC docker instance with the v12 data. Hopefully, it can complete successively, which would point to Debian stretch in the previous docker image having been the problem. Likely to do with the ulimit settings. I'll get back to you when the run completes.

jharenza commented 1 year ago

🤞

ewafula commented 1 year ago

@sickler-alex, @jharenza , my module run has successfully completed on the new OPC docker image built on Debian bookworm. This validates issue was related to the previous OPC docker image.

[Wed Jun  7 18:41:43 2023]
Finished job 69531.
69529 of 69530 steps (100.0%) done
Select jobs to execute...

[Wed Jun  7 18:41:43 2023]
localrule all:
    input: results/cnv_consensus.tsv, results/cnv_neutral.bed, results/cnv-consensus.seg.gz
    jobid: 0
    reason: Input files updated by another job: results/cnv_neutral.bed, results/cnv_consensus.tsv, results/cnv-consensus.seg.gz
    resources: tmpdir=/tmp

[Wed Jun  7 18:41:43 2023]
Finished job 0.
69530 of 69530 steps (100%) done
Complete log: .snakemake/log/2023-06-07T175631.281470.snakemake.log
root@891246a72117:/home/rstudio/OpenPedCan-analysis/analyses/copy_number_consensus_call#