CCBR / Pipeliner

An open-source and scalable solution to NGS analysis powered by the NIH's Biowulf cluster.

Active dev #445

Closed: mtandon09 closed this 4 years ago

mtandon09 commented 4 years ago

Summary of all changes:

  1. Added a module load statement to all rules that use Perl (unless they already load it)
  2. Updated the target BED file input in vardict_tumoronly.rl to match vardict.rl (in my testing, vardict_tumoronly.rl was running even before exome_targets.bed was created, probably because it was using the 'ancient' Snakemake directive)
  3. Fixed starfusion.rl to load the default Perl module; it was throwing an error about mismatched Perl binaries, and loading the Perl module after loading STAR-Fusion seems to fix it
  4. Fixed the "missing FREECFASTA" error for mm10 by changing the if statements in "freec_exome_somatic_pass1.rl", "freec_exome_somatic_pass2.rl", and "sequenza.rl"; FREEC/Sequenza will now only run if the genome is "hg19" or "hg38"
  5. Fixed all-exomeseq-somatic.rl: added the missing comma after "QC/decoy" in the input rules for mm10
  6. Fixed fusioninsp_starfus.rl. FusionInspector exits gracefully without output if there are no fusions in the input file, which causes Snakemake to fail; now a dummy file is created when the input file has no fusions. NOTE: the fusion_summary rule does not run, and I haven't fixed that yet, but the fusion pipeline "finishes" without error
  7. Bumped up the memory for FusionInspector to 96g in cluster.json
  8. Replaced 'reformat_bed.pl' with an equivalent Python script, 'reformat_bed.py'. FREEC throws an error if two regions in the BED file have the same chr/start position, so this script skips those duplicates. Also updated 'make_target_files.rl' to call the Python script instead of the Perl script
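Item 6 describes creating a dummy file when FusionInspector has no fusions to inspect, so that Snakemake still finds the declared output. A minimal Python sketch of that guard follows. The helper names `has_fusions` and `write_placeholder`, the assumption that header lines start with '#', and the placeholder text are all hypothetical and are not taken from fusioninsp_starfus.rl.

```python
import os


def has_fusions(tsv_path):
    """Return True if the TSV has at least one non-blank, non-header line.

    Assumption (hypothetical): header/comment lines start with '#'.
    """
    with open(tsv_path) as fh:
        for line in fh:
            if line.strip() and not line.startswith("#"):
                return True
    return False


def write_placeholder(path, message="No fusions detected in input\n"):
    """Create a small placeholder file so the rule's declared output exists."""
    # Create the parent directory if needed; "." covers a bare filename.
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as fh:
        fh.write(message)
```

In a Snakemake rule, a check like this could run in a `run:` block: if `has_fusions(input_tsv)` is False, call `write_placeholder` on each declared output and skip FusionInspector; otherwise run the tool as usual.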

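Item 8 says 'reformat_bed.py' skips BED regions that repeat an earlier chr/start pair, because FREEC rejects them. The sketch below shows only that de-duplication step, under the assumption of tab-separated BED input; it is not the actual 'reformat_bed.py', and the function name `dedupe_bed` is hypothetical. Any other reformatting the real script performs is not reproduced here.

```python
def dedupe_bed(lines):
    """Yield BED lines, skipping any whose (chrom, start) pair was already seen.

    The first occurrence is kept. Comment/'track'/'browser' lines and blank
    lines are passed through unchanged. Assumes tab-separated fields.
    """
    seen = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1])  # (chrom, start)
        if key in seen:
            continue  # duplicate chr/start: FREEC would reject this region
        seen.add(key)
        yield line
```

Keeping the first occurrence is a design choice for this sketch; when two regions share a start but have different ends, a real script might instead keep the longer interval.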
This repo has been tested for tumor-normal and tumor-only calling with hg19, hg38, and mm10.

Two other issues that I have not been able to fix yet:

  - Sequenza fails if coverage is too low, which causes FREEC pass2 to fail. A possible fix is to output a dummy file if Sequenza fails and make ploidy default to 2,3,4,5,6 in the FREEC pass2 config. This is not ideal, because some samples may then be run differently than others.
  - admixture_germline often fails with "Error: could not open temporary file". This is because of the limit on the number of files that can be open at once. Setting 'ulimit -f unlimited' does not fix this, but re-running the pipeline when few or no other jobs are running allows it to finish. The only solution I can find is to use the transposed PED format (--tped) instead, but that would require re-tooling the "admixture_prep.pl" script to handle the new format.
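If the admixture_germline failure does come from the open-file limit, note that `ulimit -f` sets the maximum file size, while `ulimit -n` sets the number of open file descriptors. The sketch below uses Python's standard `resource` module (POSIX only) to report both limits and to raise the soft open-file limit up to the hard limit. The helper names are hypothetical, and whether this resolves the admixture error on Biowulf has not been tested.

```python
import resource


def report_limits():
    """Return (soft, hard) pairs for the open-file and file-size limits.

    RLIMIT_NOFILE corresponds to `ulimit -n` (number of open files);
    RLIMIT_FSIZE corresponds to `ulimit -f` (maximum file size).
    """
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "fsize": resource.getrlimit(resource.RLIMIT_FSIZE),
    }


def raise_nofile_soft_limit():
    """Raise the soft open-file limit to the hard limit for this process.

    Child processes started afterwards inherit the new limit. An
    unprivileged process cannot go above the hard limit, and an
    infinite hard limit is left alone here to avoid platform caps.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

In a shell-based rule, the equivalent would be running `ulimit -n` with a higher value before calling admixture, still capped by the hard limit set on the cluster node.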