adamewing / bamsurgeon

tools for adding mutations to existing .bam files, used for testing mutation callers
MIT License
232 stars 86 forks source link

ERROR: encountered error in mutation spikein #52

Closed aistrand closed 8 years ago

aistrand commented 8 years ago

bamsurgeon stopped running and I got the following error message:

ERROR 2016-04-13 00:48:07.825074 encountered error in mutation spikein: ['5_104998603_104998603_0.444444444444_None'] Traceback (most recent call last): File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 156, in makemut mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile) File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 153, in mutate if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048): AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'


INFO 2016-04-13 00:48:07.839295 haplo_5_104999315_104999315 creating tmp bam: addsnv.tmp/haplo_5_104999315_104999315.tmpbam.2879ceab-8fe0-498e-8221-7c2bed3bb6a9.bam


ERROR 2016-04-13 00:48:07.857766 encountered error in mutation spikein: ['5_104999315_104999315_0.588235294118_None'] Traceback (most recent call last): File "/ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py", line 156, in makemut mutfail, hasSNP, maxfrac, outreads, mutreads, mutmates = mutation.mutate(args, log, bamfile, bammate, chrom, min(mutpos_list), max(mutpos_list)+1, mutpos_list, avoid=avoid, mutid_list=mutid_list, is_snv=True, mutbase_list=mutbase_list, reffile=reffile) File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 153, in mutate if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048): AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'


INFO 2016-04-13 00:48:07.874683 no succesful mutations

These are only the last few lines in the log file, there are more errors if you keep going back.

These are the inputs: INFO 2016-04-13 00:29:11.330065 starting /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py called with args: /ifs/home/c2b2/mp_lab/ams2432/bamsurgeon-master/bin/addsnv.py -v /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/bed_input/sites.non_cpgs.84.02.trio3.chr5.20000.txt -f /ifs/scratch/c2b2/mp_lab/ams2432/hutt/best_practices/recal_bams_split/recal_Hutt3-chr5_5mb_region.bam -r /ifs/data/c2b2/mp_lab/ams2432/files_needed/human_ref/hs37d5.fa -o /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/spiked_bams/spikedin_20000_Hutt3-chr5_5mb_region.bam --picardjar /ifs/home/c2b2/mp_lab/ams2432/picard-tools-1.141/ --mindepth 11 --maxdepth 48 --minmutreads 3 --tagreads --aligner mem

Not sure what is going here; would you know how to fix this? Is there a problem with my input file?

aistrand commented 8 years ago

The program produced some other files; could you explain what those are?

Finally, I didn't add any options for the aligner (--alignopts), but would like to add -t and -R option values. I got an error when I tried to do --alignopts -t:8 (for example); what is the correct format for writing out the options? Also, I assume if I want to use mem as the aligner, I just write --aligner mem, correct?

aistrand commented 8 years ago

By looking around other reported errors/bugs, I found out that the problem is most likely that my version of Pysam is too old (I have 0.6). I tried to install the newest version using pip install -U pysam, but got the following error message: Command /ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/bin/python -c "import setuptools, tokenize;file='/tmp/pip_build_ams2432/pysam/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-rGTnRx-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_ams2432/pysam Traceback (most recent call last): File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/bin/pip", line 6, in sys.exit(main()) File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/lib/python2.7/site-packages/pip/init.py", line 198, in main return command.main(cmd_args) File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/lib/python2.7/site-packages/pip/basecommand.py", line 161, in main text = '\n'.join(complete_log) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 40: ordinal not in range(128)

I'm not sure, but this might have to do with not having full permissions for /ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/. Is there a way around it?

Thanks, Alva

adamewing commented 8 years ago

The query_position error is due to changes in the pysam API so you're correct that you need to update. I think that might fix some of these other issues, from the look of it. The easiest thing to do might be getting your sysadmin to update pysam for you with pip, let me know if that's an option.

aistrand commented 8 years ago

When I do pip install -U pysam, where does pysam actually get installed?

aistrand commented 8 years ago

I sent the sysadmin a note but they tend to be slow. Is there a quick way to get this up and running?

adamewing commented 8 years ago

Try pip install --user aistrand -U pysam (where 'aistrand' should be replaced with your actual username)

aistrand commented 8 years ago

I tried that and it didn't work:

Downloading/unpacking ams2432 Could not find any downloads that satisfy the requirement ams2432 Cleaning up... No distributions at all found for ams2432 Storing debug log for failure in /ifs/home/c2b2/mp_lab/ams2432/.pip/pip.log

adamewing commented 8 years ago

Oh, sorry I think it doesn't need your username, just:

pip install --U --user pysam
aistrand commented 8 years ago

Unfortunately it gave me the same error message as before...

Command /ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/bin/python -c "import setuptools, tokenize;file='/tmp/pip_build_ams2432/pysam/setup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record /tmp/pip-RhPKDh-record/install-record.txt --single-version-externally-managed --compile --user failed with error code 1 in /tmp/pip_build_ams2432/pysam Traceback (most recent call last): File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/bin/pip", line 6, in sys.exit(main()) File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/lib/python2.7/site-packages/pip/init.py", line 198, in main return command.main(cmd_args) File "/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/lib/python2.7/site-packages/pip/basecommand.py", line 161, in main text = '\n'.join(complete_log) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 40: ordinal not in range(128)

adamewing commented 8 years ago

hm, try:

git clone https://github.com/pysam-developers/pysam
cd pysam
python setup.py build
python setup.py install --user
aistrand commented 8 years ago

Unfortunately that didn't work, it gave me another error:

pysam/chtslib.pxd:108:4: 'off_t' is not a type identifier building 'pysam.libchtslib' extension creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/pysam creating build/temp.linux-x86_64-2.7/htslib creating build/temp.linux-x86_64-2.7/htslib/cram gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipysam -I. -Ihtslib -I/ifs/data/c2b2/gs_lab/shared/software_hpc/anaconda/include/python2.7 -c pysam/chtslib.c -o build/temp.linux-x86_64-2.7/pysam/chtslib.o -Wno-unused -Wno-strict-prototypes -Wno-sign-compare -Wno-error=declaration-after-statement pysam/chtslib.c:1:2: error: #error Do not use this file, it is the result of a failed Cython compilation. error: command 'gcc' failed with exit status 1

I updated cython just to be sure (pip install cython worked).

I will see if I can fix this issue tomorrow with one of the IT people, but if you have any suggestions please let me know. How long does bamsurgeon take to run? I am working with truncated BAM files encompassing 5 mb regions.

adamewing commented 8 years ago

If it's urgent I would maybe look into virtualenv or docker. BS should run pretty quick on 5mb chunks, running time is a function of the number of mutations and the number of cores you can throw at it.

aistrand commented 8 years ago

What would I use these tools for exactly?

aistrand commented 8 years ago

Hi Adam, I was able to upgrade to the newest version of pysam (I now have version 0.9.0), but I am still getting the same error: AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

Not sure why this is happening... Is there anything I can do to test the program further? Do you know if this has been a recurring problem?

adamewing commented 8 years ago

Could you try this:

python -c 'import pysam; print pysam.__version__'

and let me know if it says 0.9.0?

The 'query_position' error is definitely due to an API change between I think pysam 0.6.x (maybe 0.7) and later versions so my guess is that you've upgraded pysam but the specific instance of the python interpreter isn't aware of the module you installed. You might need to add something to your $PYTHON_PATH

aistrand commented 8 years ago

Yes, it says 0.9.0. What should I be adding to the path?

adamewing commented 8 years ago

Hmm you probably don't need to add anything to the path then... does the following also return 0.9.0?

/usr/bin/env python -c 'import pysam; print pysam.__version__'
aistrand commented 8 years ago

Yes.

adamewing commented 8 years ago

Maybe bamsurgeon needs to be re-installed, try:

python setup.py install

From within your bamsurgeon repo directory

aistrand commented 8 years ago

Just to confirm, that is the only command needed to install bamsurgeon, once you have it in your desired directory and do cd bamsurgeon-master, correct?

I think there might have been some problems with the installation, trying to resolve that now.

aistrand commented 8 years ago

I fixed the installation of bamsurgeon, but am still running into the same problem. Going to check whether all the dependencies work as they should, but I don't understand where the problem is coming from...

aistrand commented 8 years ago

To set the aligner to bwamem, we do -aligner mem, correct? Then I can just put bwa on my PATH? (/ifs/home/c2b2/mp_lab/ams2432/bwa-0.7.9a/) Maybe some of my options that I am using are incorrect?

Again, this is the error I am getting:

ERROR 2016-04-22 16:56:32.024224 encountered error in mutation spikein: ['5_104999315_104999315_0.588235294118_None'] Traceback (most recent call last): File "/ifs/data/c2b2/gs_lab/shared/software/anaconda/lib/python2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 156, in makemut File "build/bdist.linux-x86_64/egg/bamsurgeon/mutation.py", line 153, in mutate if pread.query_position is not None and not pread.alignment.is_secondary and bin(pread.alignment.flag & 2048) != bin(2048): AttributeError: 'csamtools.PileupRead' object has no attribute 'query_position'

with the following command line:

/ifs/data/c2b2/gs_lab/shared/software/anaconda/bin/python /ifs/data/c2b2/gs_lab/shared/software/anaconda/bin/addsnv.py -v /ifs/scratch/c2b2/mp_lab/ams2432/hutt/dnm_files/matching/bamsurgeon/bed_input/sites.non_cpgs.${famid}.trio${trionum}.chr${chnum}.${nb_sims}.txt -f /ifs/scratch/c2b2/mp_lab/ams2432/hutt/best_practices/recal_bams_split/recal_Hutt${trionum}-chr${chnum}_5mb_region.bam -r /ifs/data/c2b2/mp_lab/ams2432/files_needed/human_ref/hs37d5.fa -o ${path}test_spikedin_${nb_sims}_Hutt${trionum}-chr${chnum}_5mb_region.bam --picardjar /ifs/home/c2b2/mp_lab/ams2432/picard-tools-1.141/ --mindepth 11 --minmutreads 3 --tagreads --aligner mem

aistrand commented 8 years ago

On a slightly more encouraging note, I managed to get a bit further with a different version of python/installation of bamsurgeon, but got the following error:

Error: Invalid or corrupt jarfile /ifs/home/c2b2/mp_lab/ams2432/picard-tools-1.141/

after these output lines:

INFO 2016-04-22 17:47:58.361082 haplo_1_105014044_105014044 creating tmp bam: addsnv.tmp/haplo_1_105014044_105014044.tmpbam.01549a30-503d-4123-bd85-1de2604ddfaa.bam WARN 2016-04-22 17:48:00.636590 1_105014044_105014044_0.25_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2208:9831:65039 INFO 2016-04-22 17:48:03.112085 haplo_1_105014044_105014044 len(readlist): 28 INFO 2016-04-22 17:48:03.112437 haplo_1_105014044_105014044 selected VAF: 0.25 INFO 2016-04-22 17:48:03.112575 haplo_1_105014044_105014044 picked: 7 INFO 2016-04-22 17:48:03.113094 haplo_1_105014044_105014044 wrote: 28 mutated: 7 INFO 2016-04-22 17:48:03.119591 haplo_1_105014044_105014044 converting addsnv.tmp/haplo_1_105014044_105014044.tmpbam.01549a30-503d-4123-bd85-1de2604ddfaa.bam to fastq INFO 2016-04-22 17:48:03.119679 converting BAM addsnv.tmp/haplo_1_105014044_105014044.tmpbam.01549a30-503d-4123-bd85-1de2604ddfaa.bam to FASTQ

Is there a particular version of picardtools that is supported/not supported by bamsurgeon? I've used this picard.jar file before without problems.

I also noticed this message:

WARN 2016-04-22 17:48:00.636590 1_105014044_105014044_0.25_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2208:9831:65039

Why might this be occurring?

adamewing commented 8 years ago

--picardjar needs to point at the picard .jar file, not just to the directory that contains it

The warnings about mate reads are due to the input BAM containing reads marked as paired but one of the mates is not in the file - most commonly that happens with 'sliced' BAM files (i.e. one created with samtools view from a larger BAM). Usually it isn't a problem unless you're spiking in close to the edge of the slice.

aistrand commented 8 years ago

Hi Adam,

Thanks for the tip, I managed to fix that issue, but then got a new one:

[E::bwa_idx_load] fail to locate the index files

after getting the following:

INFO 2016-04-24 19:14:40.571660 converting BAM addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam to FASTQ

[Sun Apr 24 19:14:42 EDT 2016] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam FASTQ=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json

[Sun Apr 24 19:14:42 EDT 2016] Executing as ams2432@ha4c3n7.hpc on Linux 2.6.32-431.17.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13; Picard version: 1.141(8ece590411350163e7689e9e77aab8efcb622170_1447695087) IntelDeflater

[Sun Apr 24 19:14:42 EDT 2016] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes. Runtime.totalMemory()=503840768

INFO 2016-04-24 19:14:42.741020 haplo_1_105000168_105000168 aligning addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.fastq with bwa mem

I am wondering which index files we are talking about here, and if the error message has anything to do with the fact that I have CREATE_INDEX=false as one of the options that is outputted and specified above... I apologize for bothering you with this program but want to get it running successfully!

And you are right, I am using a truncated BAM file. I could try using a non-truncated one for verification, but it seems that the error message doesn't stem from that issue and that the "no mate" message is not problematic.

The full output is as follows:

INFO 2016-04-24 19:14:39.082199 haplo_1_105000168_105000168 creating tmp bam: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam WARN 2016-04-24 19:14:39.932803 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1201:4661:35955 WARN 2016-04-24 19:14:39.933621 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:287:C2JDBACXX:3:2309:4805:61010 WARN 2016-04-24 19:14:39.934782 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2308:12041:43202 WARN 2016-04-24 19:14:39.939805 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1110:2582:82263 WARN 2016-04-24 19:14:39.940179 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1307:7659:77106 WARN 2016-04-24 19:14:39.941673 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2310:8936:54737 WARN 2016-04-24 19:14:39.942045 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2302:17262:55131 INFO 2016-04-24 19:14:40.565903 haplo_1_105000168_105000168 len(readlist): 21 INFO 2016-04-24 19:14:40.566091 haplo_1_105000168_105000168 selected VAF: 0.565217391304

INFO 2016-04-24 19:14:40.566186 haplo_1_105000168_105000168 picked: 11 INFO 2016-04-24 19:14:40.566794 haplo_1_105000168_105000168 wrote: 21 mutated: 11 INFO 2016-04-24 19:14:40.571381 haplo_1_105000168_105000168 converting addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam to fastq

INFO 2016-04-24 19:14:40.571660 converting BAM addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam to FASTQ [Sun Apr 24 19:14:42 EDT 2016] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam FASTQ=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Sun Apr 24 19:14:42 EDT 2016] Executing as ams2432@ha4c3n7.hpc on Linux 2.6.32-431.17.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13; Picard version: 1.141(8ece590411350163e7689e9e77aab8efcb622170_1447695087) IntelDeflater [Sun Apr 24 19:14:42 EDT 2016] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes. Runtime.totalMemory()=503840768 INFO 2016-04-24 19:14:42.741020 haplo_1_105000168_105000168 aligning addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.fastq with bwa mem

[E::bwa_idx_load] fail to locate the index files INFO 2016-04-24 19:14:42.811336 haplo_1_105000168_105000168 writing addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam.realign.sam to BAM... INFO 2016-04-24 19:14:42.821977 haplo_1_105000168_105000168 deleting SAM: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam.realign.sam INFO 2016-04-24 19:14:42.827789 haplo_1_105000168_105000168 sorting output: samtools sort -@ 1 -m 10000000000 -T addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam.realign.sorted.bam -o addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam.realign.sorted.bam addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam INFO 2016-04-24 19:14:42.837643 haplo_1_105000168_105000168 remove original bam:addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam INFO 2016-04-24 19:14:42.842098 haplo_1_105000168_105000168 rename sorted bam: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam.realign.sorted.bam to original name: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam INFO 2016-04-24 19:14:42.847906 haplo_1_105000168_105000168 indexing: samtools index addsnv.tmp/haplo_1_105000168_105000168.tmpbam.90f11da0-bf53-4e1e-be28-4b3326aad92e.bam


ERROR 2016-04-24 19:14:42.859348 encountered error in mutation spikein: ['1_105000168_105000168_0.565217391304_None'] Traceback (most recent call last): File "/nfs/apps/experimental/Miniconda2/lib/python2.7/site-packages/bamsurgeon-1.0-py2.7.egg/EGG-INFO/scripts/addsnv.py", line 275, in makemut File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 58, in remap_bam remap_bwamem_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired) File "build/bdist.linux-x86_64/egg/bamsurgeon/aligners.py", line 195, in remap_bwamem_bam raise ValueError("ERROR\t" + now() + "\t" + mutid + "\tbam readcount < fastq readcount, alignment sanity check failed!\n") ValueError: ERROR 2016-04-24 19:14:42.859292 haplo_1_105000168_105000168 bam readcount < fastq readcount, alignment sanity check failed!

adamewing commented 8 years ago

This could be a missing index on the reference genome, has your reference genome (the fasta you're feeding to -r/--reference) been indexed with bwa and samtools, i.e.:

samtools faidx reference_hg19.fasta
bwa index reference_hg19.fasta
aistrand commented 8 years ago

I think the program is finally working now! One question: what exactly does dropped for outcover/incover < 0.9 mean?

aistrand commented 8 years ago

I'm a bit confused as to where the final, modified bam files get stored and what their names are. I have bam files created with names like "addsnv.94ecdde3-a528-424a-862b-94abfaaf1d78.muts.bam", as well as the addsnv.tmp folder with temporary bam and fastq files (I assume?). I specified the name of the final bam files using the -o option but they don't seem to be getting created, only bam files with names like what I specified above get outputted. Should I be concerned?

adamewing commented 8 years ago

The outcover/incover is a check that the number of reads before and after mutation and realignment is relatively constant, as controlled by the parameter -d/--coverdiff.

You should be getting output in the file specified by you as -o/--outbam. If it's not being created there is probably something going wrong at the merge step, as it looks like all the temporary files are left behind. Could you send me the output from your run? Pipe into a file like so:

addsnv.py (... parameters ...) > addsnv.log.txt 2>&1

and either email to adam.ewing@gmail.com or I think you can attach files to these messages.

aistrand commented 8 years ago

OK, will do, working on it now. When exactly does the merge step happen (what output message is produced)? I get the following output for a given site:

INFO 2016-04-25 13:55:12.147765 haplo_1_105000168_105000168 creating tmp bam: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam WARN 2016-04-25 13:55:12.973872 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1201:4661:35955 WARN 2016-04-25 13:55:12.974686 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:287:C2JDBACXX:3:2309:4805:61010 WARN 2016-04-25 13:55:12.975819 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2308:12041:43202 WARN 2016-04-25 13:55:12.980904 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1110:2582:82263 WARN 2016-04-25 13:55:12.981284 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:1307:7659:77106 WARN 2016-04-25 13:55:12.982731 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2310:8936:54737 WARN 2016-04-25 13:55:12.983114 1_105000168_105000168_0.565217391304_None warning: no mate for HWI-ST1329:317:C2UWWACXX:3:2302:17262:55131 INFO 2016-04-25 13:55:13.618823 haplo_1_105000168_105000168 len(readlist): 20 INFO 2016-04-25 13:55:13.619019 haplo_1_105000168_105000168 selected VAF: 0.565217391304

INFO 2016-04-25 13:55:13.619128 haplo_1_105000168_105000168 picked: 11 INFO 2016-04-25 13:55:13.619778 haplo_1_105000168_105000168 wrote: 21 mutated: 11 INFO 2016-04-25 13:55:13.624469 haplo_1_105000168_105000168 converting addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam to fastq

INFO 2016-04-25 13:55:13.624717 converting BAM addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam to FASTQ [Mon Apr 25 13:55:15 EDT 2016] picard.sam.SamToFastq INPUT=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam FASTQ=addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT OUTPUT_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json [Mon Apr 25 13:55:15 EDT 2016] Executing as ams2432@ha1c3n6.hpc on Linux 2.6.32-431.17.1.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_51-b13; Picard version: 1.141(8ece590411350163e7689e9e77aab8efcb622170_1447695087) IntelDeflater [Mon Apr 25 13:55:15 EDT 2016] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes. Runtime.totalMemory()=503840768 INFO 2016-04-25 13:55:15.671956 haplo_1_105000168_105000168 aligning addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.fastq with bwa mem

[M::main_mem] read 26 sequences (2600 bp)... [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 11, 0, 0) [M::mem_pestat] skip orientation FF as there are not enough pairs [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat](25, 50, 75) percentile: (255, 293, 323) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (119, 459) [M::mem_pestat] mean and std.dev: (280.27, 69.19) [M::mem_pestat] low and high boundaries for proper pairs: (4, 527) [M::mem_pestat] skip orientation RF as there are not enough pairs [M::mem_pestat] skip orientation RR as there are not enough pairs [M::mem_process_seqs] Processed 26 reads in 0.005 CPU sec, 0.012 real sec [main] Version: 0.7.9a-r786 [main] CMD: bwa mem -t 1 -M -p /ifs/data/c2b2/mp_lab/ams2432/files_needed/human_ref/hs37d5.fa addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.fastq [main] Real time: 42.096 sec; CPU: 16.736 sec INFO 2016-04-25 13:55:57.776645 haplo_1_105000168_105000168 writing addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam.realign.sam to BAM... INFO 2016-04-25 13:55:57.799138 haplo_1_105000168_105000168 deleting SAM: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam.realign.sam INFO 2016-04-25 13:55:57.806648 haplo_1_105000168_105000168 sorting output: samtools sort -@ 1 -m 10000000000 -T addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam.realign.sorted.bam -o addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam.realign.sorted.bam addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam INFO 2016-04-25 13:55:57.815930 haplo_1_105000168_105000168 remove original bam:addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam INFO 2016-04-25 13:55:57.826964 haplo_1_105000168_105000168 rename sorted bam: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam.realign.sorted.bam to original name: addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam INFO 2016-04-25 13:55:57.833135 haplo_1_105000168_105000168 indexing: samtools index addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.bam INFO 2016-04-25 13:55:57.855305 haplo_1_105000168_105000168 removing addsnv.tmp/haplo_1_105000168_105000168.tmpbam.7801781f-7c16-4a98-bfee-e647b16ad270.fastq INFO 2016-04-25 13:55:57.865627 haplo_1_105000168_105000168 avgincover: 20.6666666667 avgoutcover: 14.0 WARN 2016-04-25 13:55:57.874008 haplo_1_105000168_105000168 dropped for outcover/incover < 0.9

aistrand commented 8 years ago

I don't think there actually is a problem with the merge step, it's just that the job is not done and that it is still spiking mutations. I tried the program on a smaller set of sites and it merged the files just fine and created the expected output. Thanks for the help, will let you know if I have any other issues as I inspect the bam files and proceed forward...

adamewing commented 8 years ago

OK, glad to hear it, this kind of feedback is always helpful in thinking about how to improve things. You might have a go at running the tests that come with bamsurgeon.

aistrand commented 8 years ago

I had a couple of questions about some warning messages: what exactly does too few reads in region (6) skipping... mean (region (6) being an example), and what does forced 3 reads (3 being an example) mean? What about could not pileup for region: 5:100015928?

Also, what exactly are the tests you are referring to? In the manual it says:

The script test/test snv.sh may be run with no parameters to display a usage statement: usage: ./test_snv.sh <number of SNPs> <number of threads> <reference indexed with bwa index>

Is this what you were referring to? Is it possible to test this out with actual data?