Closed BioRRW closed 4 years ago
Thank you for reporting this issue. Could you provide the BAM file and reference genome for me to debug? Thank you.
Also, what version of Python are you using?
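One quick way to answer the version question is a small script that reports the running Python version and where each external tool resolves on the PATH. This is only a sketch: the tool names are taken from the version check that mgefinder prints in this thread, and it is an assumption that the executables carry exactly those names in your environment.

```python
import platform
import shutil

# Tool names as they appear in mgefinder's version check in this thread;
# the exact executable names on your system are an assumption.
TOOLS = ["snakemake", "einverted", "bowtie2", "samtools", "cd-hit"]


def environment_report(tools=TOOLS):
    """Return the Python version and the resolved path (or None) of each tool."""
    report = {"python": platform.python_version()}
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH.
        report[tool] = shutil.which(tool)
    return report


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

Run it inside the activated conda environment so that the PATH reflects what mgefinder itself would see.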
I updated the setup.py script. If you uninstall with conda env remove -n mgefinder and reinstall according to the instructions, I think it should work.
Thank you for your responses, @durrantmm!
I have followed your instructions and have re-installed mgefinder.
I think the problem is that my BAM file was generated by aligning the reads back to the generated assembly rather than to a reference genome... perhaps.
My overall goal is to use mgefinder with hybrid assembly data (Illumina paired-end reads and MinION/PacBio long reads).
To do this, I went back and used bwa mem to align the paired-end reads to a reference genome, not the assembly. I also used bwa mem to align the long reads to the same reference genome. After this, I merged, sorted, and indexed the BAM files.
I was then able to pass the merged BAM alignment into mgefinder find.
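For anyone wanting to reproduce the steps just described, here is a minimal sketch that only builds the shell commands without running them. The file names are hypothetical, and the `-x ont2d` long-read preset is an assumption; the exact bwa options used above were not stated.

```python
import shlex


def build_alignment_commands(reference, reads_1, reads_2, long_reads,
                             merged_bam="merged.bam", long_read_preset="ont2d"):
    """Return shell commands for aligning short and long reads to one reference.

    All file names are placeholders. The long-read preset is an assumption
    (bwa mem also offers "pacbio" for PacBio reads).
    """
    q = shlex.quote
    short_bam = "short_reads.sorted.bam"  # hypothetical intermediate file
    long_bam = "long_reads.sorted.bam"    # hypothetical intermediate file
    return [
        # Index the reference once so bwa mem can use it.
        f"bwa index {q(reference)}",
        # Align paired-end reads and write a coordinate-sorted BAM.
        f"bwa mem {q(reference)} {q(reads_1)} {q(reads_2)} | samtools sort -o {short_bam} -",
        # Align long reads to the same reference.
        f"bwa mem -x {q(long_read_preset)} {q(reference)} {q(long_reads)} | samtools sort -o {long_bam} -",
        # Merge the two sorted BAMs, then index the result.
        f"samtools merge {q(merged_bam)} {short_bam} {long_bam}",
        f"samtools index {q(merged_bam)}",
        # Finally, hand the merged BAM to mgefinder.
        f"mgefinder find {q(merged_bam)}",
    ]


if __name__ == "__main__":
    for cmd in build_alignment_commands("reference.fa", "reads_R1.fastq",
                                        "reads_R2.fastq", "minion.fastq",
                                        merged_bam="illumina_MinION_merged.bam"):
        print(cmd)
```

Printing the commands rather than executing them makes it easy to review each step before running it in the mgefinder conda environment.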
It seems I may have hit a "recursion depth limit"...:
'''
$ mgefinder find illumina_MinION_merged.bam
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
command: find
bamfile: illumina_MinION_merged.bam
min_softclip_length: 8
min_softclip_count: 2
min_alignment_quality: 20
min_alignment_inner_length: 21
min_distance_to_mate: 22
min_softclip_ratio: 0.1
max_indel_ratio: 0.03
large_insertion_cutoff: 30
min_count_consensus: 2
sample_id: /home/reedrich/mgefinder_test/illumina_MinION_merged.bam
check_bwa: True
output_file: mgefinder.find.tsv
####################
Parsing softclipped sites from provided BAM file...
After checking 100000 reads, 17167 softclipped sites found...
After checking 200000 reads, 35304 softclipped sites found...
After checking 300000 reads, 53483 softclipped sites found...
After checking 400000 reads, 71436 softclipped sites found...
After checking 500000 reads, 88474 softclipped sites found...
After checking 600000 reads, 106001 softclipped sites found...
After checking 700000 reads, 123983 softclipped sites found...
After checking 800000 reads, 142755 softclipped sites found...
After checking 900000 reads, 160327 softclipped sites found...
After checking 1000000 reads, 178517 softclipped sites found...
After checking 1100000 reads, 196201 softclipped sites found...
After checking 1200000 reads, 214462 softclipped sites found...
After checking 1300000 reads, 231266 softclipped sites found...
After checking 1400000 reads, 249006 softclipped sites found...
After checking 1500000 reads, 253017 softclipped sites found...
After checking 1600000 reads, 253017 softclipped sites found...
After checking 1700000 reads, 253017 softclipped sites found...
After checking 1800000 reads, 253017 softclipped sites found...
After filtering by minimum softclip length of 8, 211806 sites remain
After filtering by minimum softclipped read count of 2, 25086 sites remain
After filtering by minimum nearest mate distance 22, 1999 sites remain
Getting unclipped read information near softclipped sites...
After filtering by minimum softclip ratio of 0.100000 and a maximum indel ratio of 0.030000, 171 sites remain
After filtering by minimum nearest mate distance 22, 119 sites remain
Generating consensus sequences from softclipped termini...
Traceback (most recent call last):
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs
    words = words + self.traverse_seqs(word, child)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs
    words = words + self.traverse_seqs(word, child)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs
    words = words + self.traverse_seqs(word, child)
  [Previous line repeated 996 more times]
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 164, in traverse_seqs
    if len(node.children) == 0:
RecursionError: maximum recursion depth exceeded in comparison
'''
Do you have any thoughts on this 'RecursionError'? Is there possibly a parameter I can use to overcome this?
Unfortunately I cannot attach the BAM file, it is ~1.3GB :(
Thank you for your insight on this @durrantmm !
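A note on the traceback above: CPython's default recursion limit is 1000 frames, which matches the "[Previous line repeated 996 more times]" line. Since traverse_seqs calls itself once per trie level, a plausible reading (an assumption, not confirmed from mgefinder's source) is that each level holds one base of a softclipped sequence, so softclips longer than about 1000 bases, as long reads can produce, would exceed the limit. The sketch below uses a hypothetical toy trie, not mgefinder's actual TerminusTrie class, to show the failure and an iterative traversal that avoids it.

```python
import sys


class TrieNode:
    """Toy trie node; a stand-in, not mgefinder's real data structure."""

    def __init__(self):
        self.children = {}


def insert(root, seq):
    """Add a sequence to the trie, one character per level."""
    node = root
    for ch in seq:
        node = node.children.setdefault(ch, TrieNode())


def traverse_recursive(node, prefix=""):
    """Collect root-to-leaf strings recursively (depth equals sequence length)."""
    if len(node.children) == 0:
        return [prefix]
    words = []
    for ch, child in node.children.items():
        words = words + traverse_recursive(child, prefix + ch)
    return words


def traverse_iterative(root):
    """Collect the same strings with an explicit stack, so depth is unbounded."""
    words = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if not node.children:
            words.append(prefix)
            continue
        for ch, child in node.children.items():
            stack.append((child, prefix + ch))
    return words


if __name__ == "__main__":
    root = TrieNode()
    # One sequence deliberately longer than the current recursion limit.
    insert(root, "T" * (sys.getrecursionlimit() + 100))
    insert(root, "ACGT")
    print(len(traverse_iterative(root)), "sequences recovered iteratively")
    try:
        traverse_recursive(root)
    except RecursionError as err:
        print("recursive traversal failed:", err)
```

As a stopgap that needs no code restructuring, Python's `sys.setrecursionlimit` can raise the limit, but setting it very high risks crashing the interpreter with a stack overflow, and it would have to be called inside mgefinder's own process. Whether mgefinder exposes a parameter for this cannot be told from the traceback alone.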
Sorry about this! Sounds like an interesting application. How long are the paired-end reads that you aligned to the reference genome? Could you maybe upload the BAM file to a Dropbox account or something similar so I can take a look?
Hello! I installed MGEfinder using Method 1 as per the installation instructions and everything went fine. I ran (after activating the conda env):
and was able to obtain the *.tsv with correct-looking output. Following this I ran:
This command was unsuccessful and the output was:
I am wondering what the source of the error is. Is this due to an out-of-date dependency, or perhaps to using a different version of Python than the one required?
Thank you for your advice and time. All the best, -BioRRW