LeeBergstrand / BackBLAST_Reciprocal_BLAST

This repository contains a reciprocal BLAST program for filtering down BLAST results to best bidirectional hits. It also contains a toolkit for finding and visualizing BLAST hits for gene clusters within multiple bacterial genomes.
MIT License

Add sanity check to ensure that the query sequences are contained in the query genome proteins (search.py) #60

Open LeeBergstrand opened 2 years ago

LeeBergstrand commented 2 years ago

Problem Description

If a user collects their pathway proteins and their query organism's proteins from different sources, for example UniProt and GenBank, then BackBLAST will give blank results because the two files use different accession systems. The query pathway file and the query organism's protein file have to use the same accessions.

Problem Solution

  1. Scan the query organism proteins for the pathway proteins by accession and display an error message if they are not found.

OR

  2. Replace the usage of the pathway query file with a file containing a list of accessions from the query organism protein file. Automatically use this accession list to extract a pathway query file from the query organism protein file as a temporary file.

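
The sanity check in the first option could be sketched roughly as follows. This is a minimal illustration and not the code in search.py: the function names are hypothetical, and it assumes that the accession is the first whitespace-delimited token of each FASTA header line.

```python
import sys


def fasta_ids(lines):
    """Return the set of record IDs found in FASTA text lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def missing_query_accessions(query_lines, proteome_lines):
    """Return, sorted, the query IDs absent from the query organism's proteins."""
    return sorted(fasta_ids(query_lines) - fasta_ids(proteome_lines))


def check_or_exit(query_path, proteome_path):
    """Exit with an error message if any query accession is missing (hypothetical helper)."""
    with open(query_path) as query, open(proteome_path) as proteome:
        missing = missing_query_accessions(query, proteome)
    if missing:
        sys.exit(
            "Error: query accessions not found in the query organism proteins: "
            + ", ".join(missing)
        )
```

Running a check like this before the BLAST step would turn a silent blank result into an explicit error that names the offending accessions.
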
jmtsuji commented 2 years ago

@LeeBergstrand I personally like the idea of option #2 -- it seems simpler to me. What do you think?

LeeBergstrand commented 2 years ago

@jmtsuji #2 is probably a good optimization. Let's go with that.
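
For reference, a rough sketch of what option #2 might look like. It is only an illustration under stated assumptions, not the eventual implementation: the function names are hypothetical, the accession list file is assumed to hold one accession per line, and the accession is assumed to be the first whitespace-delimited token of each FASTA header.

```python
import tempfile


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_query_records(accessions, proteome_lines):
    """Return (header, sequence) pairs for the requested accessions.

    Raises ValueError if any accession is absent from the proteome.
    """
    wanted = {a.strip() for a in accessions if a.strip()}
    found = {}
    for header, seq in read_fasta(proteome_lines):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            found[tokens[0]] = (header, seq)
    missing = sorted(wanted - set(found))
    if missing:
        raise ValueError(
            "Accessions not found in query organism proteins: " + ", ".join(missing)
        )
    return list(found.values())


def write_temp_query_fasta(records):
    """Write the records to a temporary FASTA file and return its path."""
    tmp = tempfile.NamedTemporaryFile("w", suffix=".faa", delete=False)
    with tmp:
        for header, seq in records:
            tmp.write(f">{header}\n{seq}\n")
    return tmp.name
```

Because the query sequences are pulled straight from the query organism's protein file, the two inputs cannot disagree on accession system, and a missing accession still surfaces as an error instead of a blank result. The caller would be responsible for deleting the temporary file afterwards.
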