bioinform / somaticseq

An ensemble approach to accurately detect somatic mutations using SomaticSeq
http://bioinform.github.io/somaticseq/
BSD 2-Clause "Simplified" License

addsnv.py ERROR/WARNINGS: encountered error in mutation spikein; could not pileup for region #73

Closed: gianfilippo closed 4 years ago

gianfilippo commented 5 years ago

Hi, I am modifying your pipeline to get it to work without Docker or Singularity, because of the other issue I reported.

Right now, things seem to be (possibly) OK until addsnv.py, which generates a very large number of ERROR and WARN messages. Below is an example:

Start at 2019/11/06 16:15:23


ERROR 2019-11-06 16:15:28.826929 encountered error in mutation spikein: ['chr1_76895_76895_0.402845353397_None']


WARN 2019-11-06 16:15:28.952514 haplo_chr1_876440_876441 could not pileup for region: chr1:876404


ERROR 2019-11-06 16:15:33.110163 encountered error in mutation spikein: ['chr1_959852_959852_0.105696809142_None']



ERROR 2019-11-06 16:15:33.162897 encountered error in mutation spikein: ['chr1_1031390_1031390_0.251373895866_None']


WARN 2019-11-06 16:15:33.371460 haplo_chr1_1554292_1554293 could not pileup for region: chr1:1554243

WARN 2019-11-06 16:15:34.074624 haplo_chr1_1554292_1554293 could not pileup for region: chr1:1554259
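To get a sense of how many mutations are affected, the messages can be tallied by level. The following is only a minimal sketch: it assumes the log lines follow the `LEVEL date time message` layout shown in the example above, and the names `LOG_LINE` and `tally_log` are made up for illustration; they are not part of bamsurgeon or SomaticSeq.

```python
import re
from collections import Counter

# Assumed layout, taken from the example lines in this thread:
#   ERROR 2019-11-06 16:15:28.826929 encountered error in mutation spikein: [...]
#   WARN 2019-11-06 16:15:28.952514 haplo_... could not pileup for region: chr1:876404
LOG_LINE = re.compile(
    r"^(ERROR|WARN|INFO)\s+\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+\s+(.*)$"
)


def tally_log(lines):
    """Count log lines per level; lines that do not match the layout are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "ERROR 2019-11-06 16:15:28.826929 encountered error in mutation spikein: ['chr1_76895_76895_0.402845353397_None']",
    "WARN 2019-11-06 16:15:28.952514 haplo_chr1_876440_876441 could not pileup for region: chr1:876404",
    "ERROR 2019-11-06 16:15:33.110163 encountered error in mutation spikein: ['chr1_959852_959852_0.105696809142_None']",
]
print(tally_log(sample))
```

On a real run, the same function could be fed an open log file instead of the `sample` list.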

I do get an "unsorted.snvs.added.bam" at the end, but in the next step, makevcf.py, I seem to be missing the file "addsnv_logs_unsorted.snvs.added.bam". I was unable to identify the script that should create it.

What do you think?

Thanks,
Gianfilippo

litaifang commented 5 years ago

Is the script still running, or has it stopped? A lot of those warnings are due to a mutation not being inserted into the BAM file, for various reasons. But if it stopped running without producing snvs.added.bam (i.e., the file name without the leading "unsorted"), then things have failed.

gianfilippo commented 5 years ago

Hi,

The script did stop, but for a different reason. It looks for the directory addindel_logs_unsorted.snvs.indels.added.bam (initially I thought it was a file) under ${outdir}, but this directory is actually under ${outdir}/logs. You can see this in your bamsurgeon_addsnvs.sh file.

What do you think?

Also, some other output from the error I previously mentioned is below:

INFO 2019-11-13 23:37:11.339289 chr1_237049497_237049500_0.344190988854:INS:CGC creating tmp bam: addindel.tmp/chr1_237049497_237049500_0.344190988854:INS:CGC.tmpbam.5d412b76-fa4b-4638-bdfc-71675b53e8bd.bam
Traceback (most recent call last):
  File "/home/bin/bamsurgeon/bin/addindel.py", line 116, in makemut
    mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, mutpos, mutpos+del_ln+1, mutpos_list, avoid=avoid, mutid_list=[mutid], is_insertion=is_insertion, is_deletion=is_deletion, ins_seq=ins, reffile=reffile, indel_start=start, indel_end=end)
  File "/home/project/.python3/lib/python3.5/site-packages/bamsurgeon-1.0-py3.5.egg/bamsurgeon/mutation.py", line 239, in mutate
    assert maxfrac is not None, "Error: could not pile up over region: %s" % region
AssertionError: Error: could not pile up over region: haplo_chr1_237049497_237049498

INFO 2019-11-13 23:37:11.376720 chr1_238659480_238659486_0.155386519509:INS:GTTTCT creating tmp bam: addindel.tmp/chr1_238659480_238659486_0.155386519509:INS:GTTTCT.tmpbam.c9cf50b4-1b57-4732-9356-0d046a1913d8.bam
Traceback (most recent call last):
  File "/home/bin/bamsurgeon/bin/addindel.py", line 116, in makemut
    mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, mutpos, mutpos+del_ln+1, mutpos_list, avoid=avoid, mutid_list=[mutid], is_insertion=is_insertion, is_deletion=is_deletion, ins_seq=ins, reffile=reffile, indel_start=start, indel_end=end)
  File "/home/project/.python3/lib/python3.5/site-packages/bamsurgeon-1.0-py3.5.egg/bamsurgeon/mutation.py", line 239, in mutate
    assert maxfrac is not None, "Error: could not pile up over region: %s" % region
AssertionError: Error: could not pile up over region: haplo_chr1_238659480_238659481

Thanks,
Gianfilippo

litaifang commented 5 years ago

What was the full command you used? Make sure all the paths are absolute paths (symlinks often cause problems).
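As a rough illustration of the absolute-path advice, a path can be normalized before it is written into a generated command. This sketch uses only the Python standard library; the helper name `resolve_path` and the example file name are hypothetical and not part of SomaticSeq.

```python
import os


def resolve_path(path):
    """Return an absolute path with '~' expanded and symlinks resolved."""
    return os.path.realpath(os.path.expanduser(path))


# Hypothetical relative input; the result depends on the current directory.
print(resolve_path("data/tumor.bam"))
```

Resolving symlinks up front means the command sees the same location regardless of which link was used to reach the file.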

gianfilippo commented 5 years ago

Hi, I actually generated the commands with a modified version of the script in your package. I removed the docker/singularity commands, and I also removed the /mnt in the paths. The addsnv.py has a --outputbam pointing to ${outdir}, but the unsorted.snvs.added.bam gets generated under ${outdir}/logs. The following step, makevcf.py, expects the file under ${outdir}, not under ${outdir}/logs. I am a little lost here, since even with my changes I cannot explain this.

Anyway, in general I am modifying your scripts to run without docker or singularity and making them compatible with the SLURM workload manager. I just created a fork and will start working on that, so it will be easier to refer you to a script.

gianfilippo commented 4 years ago

Hi,

The missing/misplaced file/dir issue comes from the working directory: addsnv.py creates the addsnv_logs_unsorted.snvs.added.bam dir in whatever directory it is run from.

The various errors and warnings reported above are not related to this.
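For anyone scripting this outside Docker, one way to make the log directory land where the next step expects it is to set the working directory explicitly when launching the tool. This is a minimal sketch, not the SomaticSeq wrapper itself: `run_in_dir` is a hypothetical helper, and since the full addsnv.py command is not shown in this thread, a stand-in child process that creates a directory relative to its working directory is used instead.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_dir(cmd, workdir):
    """Run cmd with workdir as its current directory, so relative outputs land there."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    return subprocess.run(cmd, cwd=str(workdir), check=True)


# Stand-in for the real tool (assumption): creates a directory relative to
# its current working directory, the behavior described for addsnv.py above.
stand_in = [sys.executable, "-c", "import os; os.makedirs('logs_demo', exist_ok=True)"]

with tempfile.TemporaryDirectory() as outdir:
    run_in_dir(stand_in, outdir)
    created = (Path(outdir) / "logs_demo").is_dir()

print(created)
```

In a shell or SLURM script, a `cd` into the output directory before the call would have the same effect.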

Thanks