bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License
105 stars 16 forks source link

Question about python version required #7

Closed BioRRW closed 4 years ago

BioRRW commented 4 years ago

Hello! I installed MGEfinder using Method 1 as per the installation instructions and everything went fine. I ran (after activating the conda env):

$ mgefinder find <bam alignment>.bam

and was able to obtain the *.tsv with correct looking output. Following this I ran:

$ mgefinder pair <tsv from mgefinder find>.tsv <bam alignment>.bam <ref genome>.fasta 

This command was unsuccessful and the output was:

#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
#### PARAMETERS ####
command: pair
findfile: mgefinder.find.tsv
bamfile: EC_IC_20_1X_MinION_Hybrid.align.sorted.bam
genome: EC_IC_20_1X_MinION_Hybrid.assembly.fasta
max_direct_repeat_length: 20
min_alignment_quality: 20
min_alignment_inner_length: 21
max_junction_spanning_prop: 0.15
large_insertion_cutoff: 30
output_file: mgefinder.pairs.tsv
####################
Finding all flank pairs within 20 bases of each other ...
Finding all inverted repeats at termini in 8 candidate pairs...
Assigning pairs according to existence of inverted repeats, read count difference, and flank length difference...
Traceback (most recent call last):
  File "/home/reedrich/miniconda3/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 123, in pair
    min_alignment_inner_length, max_junction_spanning_prop, large_insertion_cutoff, output_file)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 40, in _pair
    flank_pairs = flank_pairer.run_pair_flanks()
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 99, in run_pair_flanks
    assigned_pairs = self.assign_pairs(pairs)
  File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/pair.py", line 245, in assign_pairs
    self.get_header_list()].sort_values(['contig', 'pos_5p', 'pos_3p'])
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1761, in __getitem__
    return self._getitem_tuple(key)
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1288, in _getitem_tuple
    retval = getattr(retval, self.name)._getitem_axis(key, axis=i)
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1953, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1594, in _getitem_iterable
    keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1552, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "/home/reedrich/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1654, in _validate_read_indexer
    "Passing list-likes to .loc or [] with any missing labels "
KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'

I am wondering what the source of error is. Is this due to a dependency out of data or perhaps using a different version of python than that which is required?

Thank you for your advice and time. All the best, -BioRRW

durrantmm commented 4 years ago

Thank you for looking reporting this issue. Could you provide the BAM file and reference genome for me to debug? Thank you.

durrantmm commented 4 years ago

Also, what version of python are you using?

durrantmm commented 4 years ago

I updated the setup.py script. If you uninstall with conda env remove -n mgefinder, and reinstall according to the instructions, I think it should work.

BioRRW commented 4 years ago

Thank you for your responses @durrantmm !

I have followed you instructions and have re-installed mgefinder.

I think the problem is by BAM file was generated from aligning the reads back to the generated assembly and not to a reference genome... perhaps.

My overall goal is to use mgefinder with hybrid assembly data (illumina paired-end and MinION/PacBio long-reads)

To do this I went back and used bwa mem to align the paired end reads to a reference genome, not the assembly. Also used bwa mem to align the long-reads to the same reference genome. After this I merged, sorted and indexed the BAM files.

I was then able to pass the merged BAM alignment into mgefinder find.

It seems I may have hit a "recursion depth limit"...:

''' $ mgefinder find illumina_MinION_merged.bam

CHECKING DEPENDENCIES

Current version of snakemake: 3.13.3 Expected version of snakemake: 3.13.3 Current version of einverted: EMBOSS:6.6.0.0 Expected version of einverted: EMBOSS:6.6.0.0 Current version of bowtie2: 2.3.5 Expected version of bowtie2: 2.3.5 Current version of samtools: 1.9 Expected version of samtools: 1.9 Current version of cd-hit: 4.8.1 Expected version of cd-hit: 4.8.1 ###############################

PARAMETERS

command: find bamfile: illumina_MinION_merged.bam min_softclip_length: 8 min_softclip_count: 2 min_alignment_quality: 20 min_alignment_inner_length: 21 min_distance_to_mate: 22 min_softclip_ratio: 0.1 max_indel_ratio: 0.03 large_insertion_cutoff: 30 min_count_consensus: 2 sample_id: /home/reedrich/mgefinder_test/illumina_MinION_merged.bam check_bwa: True output_file: mgefinder.find.tsv #################### Parsing softclipped sites from provided BAM file... After checking 100000 reads, 17167 softclipped sites found... After checking 200000 reads, 35304 softclipped sites found... After checking 300000 reads, 53483 softclipped sites found... After checking 400000 reads, 71436 softclipped sites found... After checking 500000 reads, 88474 softclipped sites found... After checking 600000 reads, 106001 softclipped sites found... After checking 700000 reads, 123983 softclipped sites found... After checking 800000 reads, 142755 softclipped sites found... After checking 900000 reads, 160327 softclipped sites found... After checking 1000000 reads, 178517 softclipped sites found... After checking 1100000 reads, 196201 softclipped sites found... After checking 1200000 reads, 214462 softclipped sites found... After checking 1300000 reads, 231266 softclipped sites found... After checking 1400000 reads, 249006 softclipped sites found... After checking 1500000 reads, 253017 softclipped sites found... After checking 1600000 reads, 253017 softclipped sites found... After checking 1700000 reads, 253017 softclipped sites found... After checking 1800000 reads, 253017 softclipped sites found... After filtering by minimum softclip length of 8, 211806 sites remain After filtering by minimum softclipped read count of 2, 25086 sites remain After filtering by minimum nearest mate distance 22, 1999 sites remain Getting unclipped read information near softclipped sites... After filtering by minimum softclip ratio of 0.100000 and a maximum indel ratio of 0.030000, 171 sites remain
After filtering by minimum nearest mate distance 22, 119 sites remain
Generating consensus sequences from softclipped termini... Traceback (most recent call last): File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs words = words + self.traverse_seqs(word, child) File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs words = words + self.traverse_seqs(word, child) File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 169, in traverse_seqs words = words + self.traverse_seqs(word, child) [Previous line repeated 996 more times] File "/home/reedrich/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/terminustrie.py", line 164, in traverse_seqs if len(node.children) == 0: RecursionError: maximum recursion depth exceeded in comparison

'''

Do you have any thoughts on this 'RecursionError'? Is their possibly a parameter which I can utilize to overcome this?

Unfortunately I cannot attach the BAM file, it is ~1.3GB :(

Thank you for your insight on this @durrantmm !

durrantmm commented 4 years ago

Sorry about this! Sounds like an interesting application. How long are the paired-end reads that you aligned to the reference genome? Could you maybe add the BAM file to a dropbox account or something like that so I can take a look?