daveuu / baga

Bacterial and Archaeal Genome Analyser
GNU General Public License v3.0

Problems with Repeats filter #8

Open embatty opened 8 years ago

embatty commented 8 years ago

Hi, I am having problems running the Repeats filter on some highly repetitive genomes. Reference NC_009488.1 dies early on:

baga_cli.py Repeats -g NC_009488.1 --find
Traceback (most recent call last):
  File "baga/baga_cli.py", line 1832, in <module>
    finder.findRepeats(minimum_percent_identity = args.minimum_percent_identity * 0.01, minimum_repeat_length = args.minimum_repeat_length)
  File "baga/Repeats.py", line 1209, in findRepeats
    self.getHomologousContiguousBlocks()
  File "Repeats.py", line 260, in getHomologousContiguousBlocks
    thisORFhit_nearestWithHits = self.getAdjacentORFsWithHit(thisORFhit, direction = 1, getall = True)
  File "baga/Repeats.py", line 156, in getAdjacentORFsWithHit
    next_hit_ORF = self.ORFs_with_hits_ordered[thisORFn_hits + n * direction]

NC_010793.1 runs for longer then fails with a different error:

baga_cli.py Repeats -g NC_010793.1 --find
Traceback (most recent call last):
  File "baga/baga_cli.py", line 1832, in <module>
    finder.findRepeats(minimum_percent_identity = args.minimum_percent_identity * 0.01, minimum_repeat_length = args.minimum_repeat_length)
  File "baga/Repeats.py", line 1217, in findRepeats
    self.align_blocks(min_pID = 0.85, max_extensions = 15)
  File "baga/Repeats.py", line 547, in align_blocks
    postORFB_start, postORFB_end = loci_ranges_use_B[(i+1)*2:(i+1)*2+2]
ValueError: need more than 0 values to unpack

Any help would be appreciated - I can run the Repeats module on other genomes, so I think something about the repetitive nature of my references is causing a problem. Thanks.

daveuu commented 8 years ago

Spectacular genomes! I'm working on a fix. These are ideal test cases. Thanks for your interest.
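
For anyone reading the second traceback: the ValueError is consistent with a slice that runs past the end of a list. Python slices never raise for out-of-range indices; they return an empty list, and unpacking that into two names fails. The sketch below reproduces this in isolation. The list contents and both function names are hypothetical stand-ins, not baga's code, and whether returning None is the right handling inside align_blocks is an assumption.

```python
# Hypothetical stand-in for loci_ranges_use_B: a flat list of
# start/end coordinates, i.e. two ranges (100, 250) and (400, 520).
loci_ranges = [100, 250, 400, 520]


def unguarded_next_range(ranges, i):
    """Mimic the slice-and-unpack pattern shown in the traceback."""
    # An out-of-range slice yields [], so the unpacking raises ValueError.
    start, end = ranges[(i + 1) * 2:(i + 1) * 2 + 2]
    return start, end


def guarded_next_range(ranges, i):
    """Return the (start, end) pair after pair i, or None if there is none."""
    pair = ranges[(i + 1) * 2:(i + 1) * 2 + 2]
    if len(pair) != 2:
        # No following range (or an incomplete one): signal it explicitly.
        return None
    start, end = pair
    return start, end


print(guarded_next_range(loci_ranges, 0))  # the range after the first pair
print(guarded_next_range(loci_ranges, 1))  # no range follows the last pair

try:
    unguarded_next_range(loci_ranges, 1)
except ValueError as err:
    # The wording differs between Python 2 and Python 3.
    print("ValueError:", err)
```

On Python 2 the message matches the one in the report ("need more than 0 values to unpack"); Python 3 words it differently. The guard only shows where the failure comes from; it does not say why a highly repetitive genome leaves the list shorter than the loop expects.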