Running parsexmlblast - GitHub issues

gtsiamis commented 11 years ago

Hi,

I am trying to get the taxonomy after a BLAST search and I came across your Biopython scripts, which look very promising.

I have tried them on a Mac (OS X 10.8.3) with Python and Biopython installed, and on Biolinux 7 (which has Biopython installed). Every time I try to run the parsexmlblast.py script I get the following error:

    seqnr hitginr hitname evalue bitscore similarity score
    Traceback (most recent call last):
      File "././parsexmlblast.py", line 76, in <module>
        main()
      File "././parsexmlblast.py", line 69, in main
        blastresults2tabformat(blastresultsfile)
      File "././parsexmlblast.py", line 52, in blastresults2tabformat
        topalGI, topalName = parseGIandNameFromDefLine(topalignment.title)
      File "././parsexmlblast.py", line 26, in parseGIandNameFromDefLine
        return [lineElements[1], lineElements[4]]
    IndexError: list index out of range

Any suggestions will be great.

All the best

George Tsiamis

bartaelterman commented 11 years ago

Hi George,

Thanks for your interest. I suspect there is a problem in the header line that the script gets from BLAST. The script expects a line with several fields, separated by a pipe symbol ("|"). It splits the line on this symbol and tries to return fields 2 and 5, which should be the GI number and the alignment name. In your case the line has fewer fields than that, so the lookup fails with the IndexError you see.
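To illustrate the failure, here is a minimal sketch of that split with a guard added. It assumes Python 3; the function name `parse_gi_and_name` and the choice to return `None` are made up for illustration and are not the code in parsexmlblast.py.

```python
def parse_gi_and_name(defline):
    """Split a BLAST definition line on "|" and return (gi, name).

    Returns None when the line has fewer than five pipe-separated fields,
    which is the situation that raises IndexError in an unguarded version.
    """
    fields = defline.split("|")
    if len(fields) < 5:
        return None
    return fields[1], fields[4].strip()


# A header with GI fields and a header without any pipe symbols.
good = "gi|175550325|gb|ABNJ01122116.1| Mosquito metagenome 6512906, whole genome shotgun sequence"
bad = "Concatenated_sequences_1   1 to 5000 Concatenation of 79 sequences"

print(parse_gi_and_name(good))
print(parse_gi_and_name(bad))
```

With a guard like this, a header without GI fields can be reported or skipped instead of stopping the whole run.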

I don't know how familiar you are with python. You can try inserting additional print statements to drill down where the error comes from. If you're not too comfortable with that, I want to help you out, but then I need more info. Ideally, you would make a testfile including results from Blast in xml that causes this error and paste the xml here so that I can reproduce the error.

Cheers,

Bart

gtsiamis commented 11 years ago

Hi Bart,

I appreciate your e-mail and your willingness to help me. Assigning taxonomy to BLAST output files has been troubling me for some time now and I have not found a neat solution yet.

Unfortunately, I am not very familiar with Python, so I am attaching a Dropbox link to download the XML file that is causing the trouble (31 MB):

https://www.dropbox.com/s/6y4hovgzq83wehz/HGE_out.xml

Thanks in advance

George

bartaelterman commented 11 years ago

Hi George,

There are indeed no GI identifiers in your BLAST output. That is what my script uses to fetch the taxonomy. I have no idea how that happened. I noticed you used nr while I used nt, but I think it's unlikely that that would explain the absence of GI identifiers. Anyway, my whole solution is based on these GI numbers, because NCBI released a gi_taxid_nucl.dmp file that maps these GI identifiers on taxon IDs, which can then be used to fetch the taxonomy of the sequence. So I cannot fetch any taxonomic information without that GI number.

I suggest you check out why you don't get GI identifiers in your output.

In my output, I have lines like this:

  <BlastOutput_query-def>gi|175550325|gb|ABNJ01122116.1| Mosquito metagenome 6512906, whole genome shotgun sequence</BlastOutput_query-def>

While you have:

<BlastOutput_query-def>Concatenated_sequences_1   1 to 5000 Concatenation of 79 sequences</BlastOutput_query-def>

See? No GI. If you did nothing special and simply took the most recent version of nt/nr and the most recent version of BLAST (and Biopython), it could be that an update to one of them broke my code. In that case, I should check whether I can fix it. If you find a way to make BLAST return GI numbers in its output, you're back on track.
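A quick way to check this on your side is to count how many identifiers in the XML report start with `gi|`. The sketch below assumes Python 3 and the usual BLAST XML tag names `BlastOutput_query-def` and `Hit_id`; the function name `count_gi` and the tiny sample report are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def count_gi(xml_source, tag):
    """Count elements named `tag` in a BLAST XML report.

    xml_source may be a file name or a binary file-like object.
    Returns (with_gi, without_gi): how many element texts do or do not
    start with "gi|".
    """
    with_gi = 0
    without_gi = 0
    # iterparse streams the report instead of loading it in one go.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == tag:
            if (elem.text or "").startswith("gi|"):
                with_gi += 1
            else:
                without_gi += 1
        elif elem.tag == "Hit":
            elem.clear()  # drop the finished hit to keep memory use low
    return with_gi, without_gi


# Tiny hand-written sample, not real BLAST output.
sample = b"""<BlastOutput>
<BlastOutput_query-def>Concatenated_sequences_1</BlastOutput_query-def>
<Hit><Hit_id>gi|15604717|ref|NC_000117.1|</Hit_id></Hit>
<Hit><Hit_id>gnl|BL_ORD_ID|0</Hit_id></Hit>
</BlastOutput>"""

print(count_gi(io.BytesIO(sample), "Hit_id"))
print(count_gi(io.BytesIO(sample), "BlastOutput_query-def"))
```

If the second number dominates for `Hit_id`, the hits in the report carry no GI numbers and the taxonomy lookup has nothing to work with.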

Greetings

gtsiamis commented 11 years ago

Hi Bart,

Thanks for the information. The blast search that I was doing was not recording the GI number. I have included the appropriate parameter and the blast output is now including the GI number. This time the error that I am getting when I am running parsexmlblast.py is as follows:

      File "./parsexmlblast.py", line 9
        for description in blastRecord.descriptions:
          ^
    IndentationError: expected an indented block

I am also attaching a blastout xml file from a test blast that I have performed.

Looking forward to your reply

All the best

George

bartaelterman commented 11 years ago

Hmmm... That's weird. I can't reproduce that problem.

That is a classic Python error, indicating a syntax error in the script. Python expects indented blocks of code. Unlike most other programming languages, Python requires you to indent your code, and to do so consistently. With this error, Python is saying that is not the case.

However, if I run the script, I don't encounter that problem. Here, everything works fine. Did you open the parsexmlblast.py file in some editor? It could be that it transforms tabs into spaces or the other way round which might confuse python.
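If you want to check whether your copy of the script mixes the two, a small helper like the following can list which lines start with a tab and which start with a space. It is only a sketch in Python 3 with a made-up function name, `indentation_report`; it does not try to judge which style is correct.

```python
def indentation_report(source_text):
    """Return (tab_lines, space_lines) as lists of 1-based line numbers.

    tab_lines holds lines whose indentation starts with a tab,
    space_lines holds lines whose indentation starts with a space.
    """
    tab_lines, space_lines = [], []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if line.startswith("\t"):
            tab_lines.append(number)
        elif line.startswith(" "):
            space_lines.append(number)
    return tab_lines, space_lines


# Line 2 is indented with a tab, line 3 with spaces.
example = "def main():\n\tfirst = 1\n        second = 2\n"
tabs, spaces = indentation_report(example)
print("tab-indented lines:", tabs)
print("space-indented lines:", spaces)
```

If both lists are non-empty for parsexmlblast.py, the editor has probably converted part of the file. Python 2 can also report this itself when started with the `-tt` option.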

Also, which python version are you using? You can check this with the command

python --version

I am using 2.7.2

gtsiamis commented 11 years ago

Hi Bart,

Thanks for your input.

I have now downloaded the script again and used the standard text editor that comes with Biolinux 7. This distro comes with Python 2.7.3.

I ran your script again and now the error that I get is as follows:

    seqnr hitginr hitname evalue bitscore similarity score
    Traceback (most recent call last):
      File "./parsexmlblast.py", line 76, in <module>
        main()
      File "./parsexmlblast.py", line 69, in main
        blastresults2tabformat(blastresultsfile)
      File "./parsexmlblast.py", line 65, in blastresults2tabformat
        print "\t".join([parseMetagenomeSeqNumber(blastrecord.query), topalGI, topalName, expect, bits, similarity, score])
      File "./parsexmlblast.py", line 22, in parseMetagenomeSeqNumber
        return lineElements[3]
    IndexError: list index out of range

Your input will be much appreciated.

All the best

George

bartaelterman commented 11 years ago

Looks like we're getting close!

What my script does there is the following: it fetches the definition line from the query sequence (so the one you started the BLAST search with, not the one BLAST returns). In my case, that line contains a couple of fields, separated by a pipe symbol. The fourth element on this line contains some kind of ID, which I used in the output file so I don't have to write the entire definition line to the output. However, it is actually bad practice on my part to rely on this in the script, because it puts constraints on the input file you're allowed to use when performing your BLAST search, and that is not very clean.

I can imagine others might suffer from this too, so I'm going to fix this in the code. Currently, the output of the parsexmlblast.py is a tab separated values file, with the first values being:

<input sequence identifier> <hit sequence identifier> <hit sequence name>

So it's the first field that is causing the current trouble. Would it be all right if I replaced it with the entire definition line of the input sequence?
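As a rough sketch of what that change amounts to (Python 3, with a hypothetical helper name `format_row` and made-up example values, not the actual updated script):

```python
def format_row(query_defline, hit_gi, hit_name, *scores):
    """Build one tab-separated output row.

    The full query definition line is used as the input sequence
    identifier, so nothing is assumed about its internal layout.
    """
    # A tab inside the definition line would shift the columns.
    safe_query = query_defline.replace("\t", " ")
    columns = [safe_query, hit_gi, hit_name] + [str(score) for score in scores]
    return "\t".join(columns)


# Illustrative values only; the hit fields here are placeholders.
row = format_row(
    "Concatenated_sequences_1   1 to 5000 Concatenation of 79 sequences",
    "12345",
    "example hit",
    0.5,
    46.0,
)
print(row)
```

The trade-off is a wider first column in exchange for accepting any query header.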

gtsiamis commented 11 years ago

OK. That is clear to me now.

Replacing it with the entire definition line will be fine for me.

Thanks

George

bartaelterman commented 11 years ago

I updated the script. Can you try again and let me know?

gtsiamis commented 11 years ago

The parsexmlblast script ran smoothly without a problem. Thanks for all the help.

Then I ran addTaxonomyToBlastOutput.py. Before running the script I did the following:

I downloaded the gi_taxid_nucl.dmp file from ncbi.

I modified the blasthittaxonomy.py script in the following way:

- I included my e-mail address
- I replaced "knowntaxonomy.txt" with the gi_taxid_nucl.dmp path

Now the error that I am getting when I run the addTaxonomyToBlastOutput.py is

    python(49649) malloc: *** mmap(size=140454840422400) failed (error code=12)
    *** error: can't allocate region
    *** set a breakpoint in malloc_error_break to debug
    Traceback (most recent call last):
      File "./add_Taxonomy.py", line 39, in <module>
        main()
      File "./add_Taxonomy.py", line 14, in main
        taxfetcher = blasthittaxonomy.TaxonomyFetcher()
      File "/Volumes/Macintosh_HD2/Dropbox/GT_Files/Projects/Anastasis_analysis/blasthittaxonomy.py", line 31, in __init__
        self.getKnownTaxa()
      File "/Volumes/Macintosh_HD2/Dropbox/GT_Files/Projects/Anastasis_analysis/blasthittaxonomy.py", line 25, in getKnownTaxa
        self.knownTaxa = pickle.loads(infile.read())
    MemoryError

Does that mean that the gi_taxid_nucl.dmp file is too big? (My computer has 16 GB of RAM.)

Looking forward to your input

George

bartaelterman commented 11 years ago

No, you shouldn't replace knowntaxonomy.txt. That is a temporary file that the script creates. Here is what happens: the script fetches the GI number. Based on this number and gi_taxid_nucl.dmp, it looks up the taxon ID, and with this taxon ID it queries NCBI's servers to fetch taxonomic information. The knowntaxonomy.txt file acts as a cache for NCBI's information: given a taxon ID, the script first checks whether it already has the taxon information in this file before querying NCBI, to reduce the number of slow calls you have to perform. The first time you run the script the cache is obviously still empty, but the run will at least create the file. So you don't have to touch that file.
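The first step of that chain, going from GI number to taxon ID, can be sketched as below. This is Python 3 and assumes gi_taxid_nucl.dmp holds two tab-separated columns per line (GI number, then taxon ID); the function name `lookup_taxids` and the sample numbers are made up. It is not the code in blasthittaxonomy.py.

```python
def lookup_taxids(dmp_lines, wanted_gis):
    """Scan GI/taxon ID pairs and return {gi: taxid} for the requested GIs.

    dmp_lines is any iterable of text lines, for example an open file
    handle. Only the requested entries are kept, so the full mapping
    never has to be held in memory.
    """
    wanted = set(str(gi) for gi in wanted_gis)
    found = {}
    for line in dmp_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = parts
        if gi in wanted:
            found[gi] = taxid
            if len(found) == len(wanted):
                break  # every requested GI has been resolved
    return found


# Made-up sample lines in the assumed two-column layout.
sample_lines = ["100\t1111\n", "200\t2222\n", "\n"]
print(lookup_taxids(sample_lines, ["200", "300"]))
```

Reading the file line by line matters here because the real mapping file is very large.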

If you open the blasthittaxonomy.py script, you should look for the file name gi_taxid_nucl.dmp. You'll see that the script expects this file to be in the current working directory, which is probably not the case on your machine. You can solve this in one of two ways:

- Put the file (or better: a link to the file) in the current directory.
- Edit the script and hard-code the full path to that file.

Cheers.

gtsiamis commented 11 years ago

OK. I went back to the original blasthittaxonomy.py file. This time I only added my e-mail address.

I placed the gi_taxid_nucl.dmp file in the same directory as the other scripts.

I ran the addTaxonomyToBlastOutput.py script and now I am getting the following:

    Traceback (most recent call last):
      File "./addTaxonomyToBlastOutput.py", line 39, in <module>
        main()
      File "./addTaxonomyToBlastOutput.py", line 14, in main
        taxfetcher = blasthittaxonomy.TaxonomyFetcher()
      File "/Users/gtsiamis/Blast/Taxonomy/blasthittaxonomy.py", line 31, in __init__
        self.getKnownTaxa()
      File "/Users/gtsiamis/Blast/Taxonomy/blasthittaxonomy.py", line 24, in getKnownTaxa
        infile = open(self.knownTaxaFile)
    IOError: [Errno 2] No such file or directory: 'knowntaxonomy.txt'

I am sorry for all the trouble.

Looking forward to your reply.

George

bartaelterman commented 11 years ago

Aha! Now we're down to the real problem: that's an issue in my code (so don't worry about bothering me about it). I have now included a couple of lines in the script that check for the existence of knowntaxonomy.txt. If the file does not exist, you start with an empty cache; if it exists, it is used to reduce the number of service calls to NCBI.
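The idea behind that check looks roughly like this. The sketch is Python 3 and uses `pickle` because the traceback above shows the cache being read with it; the function names `load_known_taxa` and `save_known_taxa` are assumptions for illustration, not the names used in blasthittaxonomy.py.

```python
import os
import pickle


def load_known_taxa(cache_path):
    """Return the cached taxonomy dictionary, or an empty one if no cache file exists yet."""
    if not os.path.exists(cache_path):
        return {}
    with open(cache_path, "rb") as infile:
        return pickle.load(infile)


def save_known_taxa(cache_path, known_taxa):
    """Write the taxonomy dictionary to disk so the next run can reuse it."""
    with open(cache_path, "wb") as outfile:
        pickle.dump(known_taxa, outfile)
```

On a first run `load_known_taxa("knowntaxonomy.txt")` simply returns an empty dictionary, and the file appears once `save_known_taxa` has been called.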

You can fetch a new version of the blasthittaxonomy.py file. Don't forget to replace the email address again.

How does that go?

gtsiamis commented 11 years ago

Hey.

Thanks ever so much. It worked like a charm.

A final question. How can I cite your scripts?

Again thanks for your input and help.

George

bartaelterman commented 11 years ago

You're very welcome.

It's very kind of you to cite my scripts. Citing software is a bit different from citing a paper, and I am not sure of general practices. Because software can change, you probably want to refer to the current version.

So I created a "release" of this repository: v0.3. I suggest you use the URL that links to that version in your citation. When a user follows that link, they can download the repository or browse the code there as it is today. In terms of reproducibility, that should suffice. GitHub does not provide DOIs for repositories, so I think this is the most stable identifier I can offer. You can use my name as author and the repository's name or description as title. I'll add this information to the README of the repository.

Also, I added a LICENSE file that officially grants any user the right to use this software.

Good luck!

gtsiamis commented 11 years ago

Hi Bart,

I am sorry to bother you again, but I have hit another hiccup with the scripts.

I just created a custom database using FASTA files from NCBI. I performed a BLAST search and got the XML output. When I ran the parsexmlblast script, the new file that I got contained the phrase BL_ORD_ID instead of the GI number (see below).

    Concatenated_sequences_60 265501 to 270500 Concatenation of 25215 sequences BL_ORD_ID ref 0.61135 46.087 1.0 23.0
    Concatenated_sequences_61 270001 to 275000 Concatenation of 25215 sequences BL_ORD_ID ref 0.61135 46.087 0.942857142857 23.0
    Concatenated_sequences_62 274501 to 279500 Concatenation of 25215 sequences BL_ORD_ID ref 0.61135 46.087 1.0 23.0
    Concatenated_sequences_63 279001 to 284000 Concatenation of 25215 sequences BL_ORD_ID ref 0.154718 48.0694 0.964285714286 24.0
    Concatenated_sequences_64 283501 to 288500 Concatenation of 25215 sequences BL_ORD_ID ref 0.61135 46.087 1.0 23.0
    Concatenated_sequences_65 288001 to 293000 Concatenation of 25215 sequences BL_ORD_ID ref 0.61135 46.087 1.0 23.0
    Concatenated_sequences_66 292501 to 297500 Concatenation of 25215 sequences BL_ORD_ID ref 0.154718 48.0694 0.964285714286 24.0

Why do I get BL_ORD_ID? Is it the way that I made the blast database?

Any suggestions?

Thanks in advance

George

bartaelterman commented 11 years ago

Hi George,

I am not sure whether that will work. Are you sure the fasta sequences you downloaded from NCBI contain gi numbers? Remember, my script expects every sequence to have a header line with fields separated by "|" symbols of which field 2 is the gi number and field 5 the alignment name (as I mentioned in my first reply).

So is this the case for the fasta sequences you created the Blast database with?

If so, it is possible that you lost this information during the creation of the Blast database. But let's not move too fast. First check whether you have gi numbers in your sequences anyway.
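One possibility worth checking, offered as an assumption rather than a confirmed diagnosis: when a BLAST database is built without parsing the sequence IDs (as far as I know, `-parse_seqids` for makeblastdb or `-o T` for the older formatdb controls this), hits are reported under a generic `gnl|BL_ORD_ID|N` identifier and the original FASTA header follows it in the hit title. A fixed-position split on "|" then picks up `BL_ORD_ID` and `ref`, which matches the two columns in the output above. The Python 3 sketch below shows this with a hypothetical title and a made-up helper, `find_gi`, that searches for the GI pattern instead of counting fields.

```python
import re

# Matches "gi|<digits>|" anywhere in a string and captures the digits.
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def find_gi(title):
    """Return the first GI number found anywhere in a hit title, or None."""
    match = GI_PATTERN.search(title)
    return match.group(1) if match else None


# Hypothetical hit title: a BL_ORD_ID identifier followed by an original FASTA header.
title = (
    "gnl|BL_ORD_ID|0 gi|15604717|ref|NC_000117.1| "
    "Chlamydia trachomatis D/UW-3/CX chromosome, complete genome"
)

fields = title.split("|")
print(fields[1], fields[4])  # what a fixed-position split picks up
print(find_gi(title))        # what a pattern search picks up
```

A pattern search like this tolerates the extra prefix, although rebuilding the database so that the IDs are parsed is probably the cleaner route.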

Cheers

gtsiamis commented 11 years ago

Hi Bart,

I am attaching to this e-mail all the headers of the fasta file that I have used to make the custom blast database.

Also, when I used the -m 8 option (tabular output) I could see the GI numbers.

Thanks for your help.

Kind regards

George

gi|15604717|ref|NC_000117.1| Chlamydia trachomatis D/UW-3/CX chromosome, complete genome
gi|15642775|ref|NC_000853.1| Thermotoga maritima MSB8 chromosome, complete genome
gi|118430835|ref|NC_000854.2| Aeropyrum pernix K1, complete genome
gi|14518450|ref|NC_000868.1| Pyrococcus abyssi GE5 chromosome, complete genome
gi|16271976|ref|NC_000907.1| Haemophilus influenzae Rd KW20 chromosome, complete genome
gi|108885074|ref|NC_000908.2| Mycoplasma genitalium G37, complete genome
gi|15668172|ref|NC_000909.1| Methanocaldococcus jannaschii DSM 2661 chromosome, complete genome
gi|16329170|ref|NC_000911.1| Synechocystis sp. PCC 6803 chromosome, complete genome
gi|13507739|ref|NC_000912.1| Mycoplasma pneumoniae M129 chromosome, complete genome
gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome
gi|255767010|ref|NC_000914.2| Sinorhizobium fredii NGR234 plasmid pNGR234a, complete sequence
gi|15644634|ref|NC_000915.1| Helicobacter pylori 26695 chromosome, complete genome
gi|15678031|ref|NC_000916.1| Methanothermobacter thermautotrophicus str. Delta H chromosome, complete genome
gi|11497621|ref|NC_000917.1| Archaeoglobus fulgidus DSM 4304, complete genome
gi|15282445|ref|NC_000918.1| Aquifex aeolicus VF5, complete genome
gi|15638995|ref|NC_000919.1| Treponema pallidum subsp. pallidum str. Nichols chromosome, complete genome
gi|15611071|ref|NC_000921.1| Helicobacter pylori J99 chromosome, complete genome
gi|15617929|ref|NC_000922.1| Chlamydophila pneumoniae CWL029 chromosome, complete genome
gi|11497060|ref|NC_000948.1| Borrelia burgdorferi B31 plasmid cp32-1, complete sequence
gi|11497103|ref|NC_000949.1| Borrelia burgdorferi B31 plasmid cp32-3, complete sequence
gi|11497149|ref|NC_000950.1| Borrelia burgdorferi B31 plasmid cp32-4, complete sequence
gi|11497193|ref|NC_000951.1| Borrelia burgdorferi B31 plasmid cp32-6, complete sequence
gi|11497236|ref|NC_000952.1| Borrelia burgdorferi B31 plasmid cp32-7, complete sequence
gi|11497281|ref|NC_000953.1| Borrelia burgdorferi B31 plasmid cp32-8, complete sequence
gi|11497325|ref|NC_000954.1| Borrelia burgdorferi B31 plasmid cp32-9, complete sequence
gi|365823329|ref|NC_000955.2| Borrelia burgdorferi B31 plasmid lp21, complete sequence
gi|11497372|ref|NC_000956.1| Borrelia burgdorferi B31 plasmid lp56, complete sequence
gi|11497445|ref|NC_000957.1| Borrelia burgdorferi B31 plasmid lp5, complete sequence
gi|10957398|ref|NC_000958.1| Deinococcus radiodurans R1 plasmid MP1, complete sequence
gi|10957530|ref|NC_000959.1| Deinococcus radiodurans R1 plasmid CP1, complete sequence
gi|14589963|ref|NC_000961.1| Pyrococcus horikoshii OT3 chromosome, complete genome
gi|448814763|ref|NC_000962.3| Mycobacterium tuberculosis H37Rv complete genome
gi|15603881|ref|NC_000963.1| Rickettsia prowazekii str. Madrid E chromosome, complete genome
gi|255767013|ref|NC_000964.3| Bacillus subtilis subsp. subtilis str. 168 chromosome, complete genome
gi|15805042|ref|NC_001263.1| Deinococcus radiodurans R1 chromosome 1, complete sequence
gi|15807672|ref|NC_001264.1| Deinococcus radiodurans R1 chromosome 2, complete sequence
gi|15594346|ref|NC_001318.1| Borrelia burgdorferi B31 chromosome, complete genome
gi|10954629|ref|NC_001399.1| Ralstonia solanacearum M4S plasmid pJTPS1, complete sequence
gi|10954488|ref|NC_001732.1| Methanocaldococcus jannaschii DSM 2661 plasmid large ECE, complete sequence
gi|10954532|ref|NC_001733.1| Methanocaldococcus jannaschii DSM 2661 plasmid small ECE, complete sequence
gi|10954552|ref|NC_001773.1| Pyrococcus abyssi GE5 plasmid pGT5, complete sequence
gi|365823332|ref|NC_001849.2| Borrelia burgdorferi B31 plasmid lp17, complete sequence
gi|11496607|ref|NC_001850.1| Borrelia burgdorferi B31 plasmid lp25, complete sequence
gi|365823337|ref|NC_001851.2| Borrelia burgdorferi B31 plasmid lp28-1, complete sequence
gi|11496664|ref|NC_001852.1| Borrelia burgdorferi B31 plasmid lp28-2, complete sequence
gi|11496697|ref|NC_001853.1| Borrelia burgdorferi B31 plasmid lp28-3, complete sequence
gi|11496735|ref|NC_001854.1| Borrelia burgdorferi B31 plasmid lp28-4, complete sequence
gi|11496779|ref|NC_001855.1| Borrelia burgdorferi B31 plasmid lp36, complete sequence
gi|11496831|ref|NC_001856.1| Borrelia burgdorferi B31 plasmid lp38, complete sequence
gi|365823346|ref|NC_001857.2| Borrelia burgdorferi B31 plasmid lp54, complete sequence
gi|10803547|ref|NC_001869.1| Halobacterium sp. NRC-1 plasmid pNRC100, complete sequence
gi|10957041|ref|NC_001880.1| Aquifex aeolicus VF5 plasmid ece1, complete sequence
gi|11497007|ref|NC_001903.1| Borrelia burgdorferi B31 plasmid cp26, complete sequence
gi|11497048|ref|NC_001904.1| Borrelia burgdorferi B31 plasmid cp9, complete sequence
gi|15004705|ref|NC_001988.2| Clostridium acetobutylicum ATCC 824 plasmid pSOL1, complete sequence
gi|10955262|ref|NC_002127.1| Escherichia coli O157:H7 str. Sakai plasmid pOSAK1, complete sequence
gi|10955266|ref|NC_002128.1| Escherichia coli O157:H7 str. Sakai plasmid pO157, complete sequence
gi|13357558|ref|NC_002162.1| Ureaplasma parvum serovar 3 str. ATCC 700970 chromosome, complete genome
gi|15791399|ref|NC_002163.1| Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 chromosome, complete genome
gi|58021288|ref|NC_002179.2| Chlamydophila pneumoniae AR39, complete genome
gi|9791176|ref|NC_002180.1| Chlamydia phage CPAR39, complete genome
gi|10957566|ref|NC_002182.1| Chlamydia muridarum Nigg plasmid pMoPn, complete sequence
gi|10957099|ref|NC_002252.1| Buchnera aphidicola str. APS (Acyrthosiphon pisum) plasmid pTrp, complete sequence
gi|10957103|ref|NC_002253.1| Buchnera aphidicola str. APS (Acyrthosiphon pisum) plasmid pLeu, complete sequence
gi|57014152|ref|NC_002488.3| Xylella fastidiosa 9a5c chromosome, complete genome
gi|56968324|ref|NC_002489.3| Xylella fastidiosa 9a5c plasmid pXF1.3, complete sequence
gi|10956711|ref|NC_002490.1| Xylella fastidiosa 9a5c plasmid pXF51, complete sequence
gi|15835535|ref|NC_002491.1| Chlamydophila pneumoniae J138 chromosome, complete genome
gi|15640032|ref|NC_002505.1| Vibrio cholerae O1 biovar El Tor str. N16961 chromosome I, complete sequence
gi|15600771|ref|NC_002506.1| Vibrio cholerae O1 biovar El Tor str. N16961 chromosome II, complete sequence
gi|110645304|ref|NC_002516.2| Pseudomonas aeruginosa PAO1 chromosome, complete genome
gi|15616630|ref|NC_002528.1| Buchnera aphidicola str. APS (Acyrthosiphon pisum) chromosome, complete genome
gi|57596592|ref|NC_002570.2| Bacillus halodurans C-125 chromosome, complete genome
gi|16081186|ref|NC_002578.1| Thermoplasma acidophilum DSM 1728 chromosome, complete genome
gi|15789340|ref|NC_002607.1| Halobacterium sp. NRC-1 chromosome, complete genome
gi|16119979|ref|NC_002608.1| Halobacterium sp. NRC-1 plasmid pNRC200, complete sequence
gi|29337300|ref|NC_002620.2| Chlamydia muridarum Nigg, complete genome
gi|16445223|ref|NC_002655.2| Escherichia coli O157:H7 str. EDL933 chromosome, complete genome
gi|15671982|ref|NC_002662.1| Lactococcus lactis subsp. lactis Il1403 chromosome, complete genome
gi|15601865|ref|NC_002663.1| Pasteurella multocida subsp. multocida str. Pm70 chromosome, complete genome
gi|15826865|ref|NC_002677.1| Mycobacterium leprae TN chromosome, complete genome
gi|57165207|ref|NC_002678.2| Mesorhizobium loti MAFF303099 chromosome, complete genome
gi|13488050|ref|NC_002679.1| Mesorhizobium loti MAFF303099 plasmid pMLa, complete sequence
gi|13195732|ref|NC_002682.1| Mesorhizobium loti MAFF303099 plasmid pMLb, complete sequence
gi|13540831|ref|NC_002689.2| Thermoplasma volcanium GSS1 chromosome, complete genome
gi|15829254|ref|NC_002695.1| Escherichia coli O157:H7 str. Sakai chromosome, complete genome
gi|16124256|ref|NC_002696.2| Caulobacter crescentus CB15 chromosome, complete genome
gi|15674250|ref|NC_002737.1| Streptococcus pyogenes SF370 chromosome, complete genome
gi|29165615|ref|NC_002745.2| Staphylococcus aureus subsp. aureus N315 chromosome, complete genome
gi|15896971|ref|NC_002754.1| Sulfolobus solfataricus P2 chromosome, complete genome
gi|50953765|ref|NC_002755.2| Mycobacterium tuberculosis CDC1551 chromosome, complete genome
gi|57634611|ref|NC_002758.2| Staphylococcus aureus subsp. aureus Mu50 chromosome, complete genome
gi|15828471|ref|NC_002771.1| Mycoplasma pulmonis UAB CTIP, complete genome
gi|14141823|ref|NC_002774.1| Staphylococcus aureus subsp. aureus Mu50 plasmid VRSAp, complete sequence
gi|33598993|ref|NC_002927.3| Bordetella bronchiseptica RB50 chromosome, complete genome
gi|33594723|ref|NC_002928.3| Bordetella parapertussis 12822 chromosome, complete genome
gi|33591275|ref|NC_002929.2| Bordetella pertussis Tohama I chromosome, complete genome
gi|21672841|ref|NC_002932.3| Chlorobium tepidum TLS chromosome, complete genome
gi|38232642|ref|NC_002935.2| Corynebacterium diphtheriae NCTC 13129 chromosome, complete genome
gi|57233530|ref|NC_002936.3| Dehalococcoides ethenogenes 195, complete genome
gi|46562128|ref|NC_002937.3| Desulfovibrio vulgaris str. Hildenborough chromosome, complete genome
gi|400756305|ref|NC_002939.5| Geobacter sulfurreducens PCA chromosome, complete genome
gi|33151282|ref|NC_002940.2| Haemophilus ducreyi 35000HP chromosome, complete genome
gi|52840256|ref|NC_002942.5| Legionella pneumophila subsp. pneumophila str. Philadelphia 1 chromosome, complete genome
gi|41406098|ref|NC_002944.2| Mycobacterium avium subsp. paratuberculosis K-10, complete genome
gi|31791177|ref|NC_002945.3| Mycobacterium bovis AF2122/97 chromosome, complete genome
gi|59800473|ref|NC_002946.2| Neisseria gonorrhoeae FA 1090 chromosome, complete genome
gi|26986745|ref|NC_002947.3| Pseudomonas putida KT2440 chromosome, complete genome
gi|34539880|ref|NC_002950.2| Porphyromonas gingivalis W83 chromosome, complete genome
gi|57650036|ref|NC_002951.2| Staphylococcus aureus subsp. aureus COL chromosome, complete genome
gi|49482253|ref|NC_002952.2| Staphylococcus aureus subsp. aureus MRSA252 chromosome, complete genome
gi|49484912|ref|NC_002953.3| Staphylococcus aureus subsp. aureus MSSA476 chromosome, complete genome
gi|42516522|ref|NC_002967.9| Treponema denticola ATCC 35405 chromosome, complete genome
gi|77358712|ref|NC_002971.3| Coxiella burnetii RSA 493 chromosome, complete genome
gi|85700163|ref|NC_002973.6| Listeria monocytogenes serotype 4b str. F2365 chromosome, complete genome
gi|57865352|ref|NC_002976.3| Staphylococcus epidermidis RP62A, complete genome
gi|77128441|ref|NC_002977.6| Methylococcus capsulatus str. Bath chromosome, complete genome
gi|42519920|ref|NC_002978.6| Wolbachia endosymbiont of Drosophila melanogaster, complete genome
gi|194172857|ref|NC_003028.3| Streptococcus pneumoniae TIGR4 chromosome, complete genome
gi|15893298|ref|NC_003030.1| Clostridium acetobutylicum ATCC 824 chromosome, complete genome
gi|16262453|ref|NC_003037.1| Sinorhizobium meliloti 1021 plasmid pSymA, complete sequence
gi|15081479|ref|NC_003042.1| Clostridium perfringens str. 13 plasmid pCP13, complete sequence
gi|15963753|ref|NC_003047.1| Sinorhizobium meliloti 1021 chromosome, complete genome
gi|159184118|ref|NC_003062.2| Agrobacterium fabrum str. C58 chromosome circular, complete sequence
gi|159185562|ref|NC_003063.2| Agrobacterium fabrum str. C58 chromosome linear, complete sequence
gi|159186452|ref|NC_003064.2| Agrobacterium fabrum str. C58 plasmid At, complete sequence
gi|159161952|ref|NC_003065.3| Agrobacterium fabrum str. C58 plasmid Ti, complete sequence
gi|16263748|ref|NC_003078.1| Sinorhizobium meliloti 1021 plasmid pSymB, complete sequence
gi|15320554|ref|NC_003080.1| Corynebacterium jeikeium K411 plasmid pKW4, complete sequence
gi|15902044|ref|NC_003098.1| Streptococcus pneumoniae R6, complete genome
gi|15891923|ref|NC_003103.1| Rickettsia conorii str. Malish 7, complete genome
gi|24473558|ref|NC_003106.2| Sulfolobus tokodaii str. 7 chromosome, complete genome
gi|77358697|ref|NC_003112.2| Neisseria meningitidis MC58 chromosome, complete genome
gi|15793034|ref|NC_003116.1| Neisseria meningitidis Z2491 chromosome, complete genome
gi|16082691|ref|NC_003131.1| Yersinia pestis CO92 plasmid pCD1, complete sequence
gi|16082679|ref|NC_003132.1| Yersinia pestis CO92 plasmid pPCP1, complete sequence
gi|16082781|ref|NC_003134.1| Yersinia pestis CO92 plasmid pMT1, complete sequence
gi|16119200|ref|NC_003140.1| Staphylococcus aureus subsp. aureus N315 plasmid pN315, complete sequence
gi|16120353|ref|NC_003143.1| Yersinia pestis CO92 chromosome, complete genome
gi|162960844|ref|NC_003155.4| Streptomyces avermitilis MA-4680, complete genome
gi|16763390|ref|NC_003197.1| Salmonella enterica subsp. enterica serovar Typhimurium str. LT2 chromosome, complete genome
gi|16758993|ref|NC_003198.1| Salmonella enterica subsp. enterica serovar Typhi str. CT18, complete chromosome
gi|16802048|ref|NC_003210.1| Listeria monocytogenes EGD-e chromosome, complete genome
gi|16799079|ref|NC_003212.1| Listeria innocua Clip11262, complete genome
gi|60679597|ref|NC_003228.3| Bacteroides fragilis NCTC 9343 chromosome, complete genome
gi|17158637|ref|NC_003240.1| Nostoc sp. PCC 7120 plasmid pCC7120beta, complete sequence
gi|17158061|ref|NC_003241.1| Nostoc sp. PCC 7120 plasmid pCC7120zeta, complete sequence
gi|17227374|ref|NC_003267.1| Nostoc sp. PCC 7120 plasmid pCC7120gamma, complete sequence
gi|17227465|ref|NC_003270.1| Nostoc sp. PCC 7120 plasmid pCC7120epsilon, complete sequence
gi|17227497|ref|NC_003272.1| Nostoc sp. PCC 7120 chromosome, complete genome
gi|17232874|ref|NC_003273.1| Nostoc sp. PCC 7120 plasmid pCC7120delta, complete sequence
gi|17233017|ref|NC_003276.1| Nostoc sp. PCC 7120 plasmid pCC7120alpha, complete sequence
gi|17233403|ref|NC_003277.1| Salmonella enterica subsp. enterica serovar Typhimurium str. 
LT2 plasmid pSLT, complete sequence gi|17544719|ref|NC_003295.1| Ralstonia solanacearum GMI1000 chromosome, complete genome gi|17548221|ref|NC_003296.1| Ralstonia solanacearum GMI1000 plasmid pGMI1000MP, complete sequence gi|17986284|ref|NC_003317.1| Brucella melitensis bv. 1 str. 16M chromosome I, complete sequence gi|17988344|ref|NC_003318.1| Brucella melitensis bv. 1 str. 16M chromosome II, complete sequence gi|29839769|ref|NC_003361.3| Chlamydophila caviae GPIC chromosome, complete genome gi|18311643|ref|NC_003364.1| Pyrobaculum aerophilum str. IM2 chromosome, complete genome gi|18308982|ref|NC_003366.1| Clostridium perfringens str. 13 chromosome, complete genome gi|18450286|ref|NC_003383.1| Listeria innocua Clip11262 plasmid pLI100, complete sequence gi|18466424|ref|NC_003384.1| Salmonella enterica subsp. enterica serovar Typhi str. CT18 plasmid pHCM1, complete sequence gi|18466665|ref|NC_003385.1| Salmonella enterica subsp. enterica serovar Typhi str. CT18 plasmid pHCM2, complete sequence gi|18976372|ref|NC_003413.1| Pyrococcus furiosus DSM 3638 chromosome, complete genome gi|19225058|ref|NC_003425.1| Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis plasmid pWb1, complete sequence gi|58036263|ref|NC_003450.3| Corynebacterium glutamicum ATCC 13032, complete genome gi|19703352|ref|NC_003454.1| Fusobacterium nucleatum subsp. nucleatum ATCC 25586 chromosome, complete genome gi|19745201|ref|NC_003485.1| Streptococcus pyogenes MGAS8232 chromosome, complete genome gi|20093440|ref|NC_003551.1| Methanopyrus kandleri AV19, complete genome gi|20088899|ref|NC_003552.1| Methanosarcina acetivorans C2A chromosome, complete genome gi|20806542|ref|NC_003869.1| Thermoanaerobacter tengcongensis MB4 chromosome, complete genome gi|32141095|ref|NC_003888.3| Streptomyces coelicolor A3(2) chromosome, complete genome gi|21226102|ref|NC_003901.1| Methanosarcina mazei Go1 chromosome, complete genome gi|21229478|ref|NC_003902.1| Xanthomonas campestris pv. campestris str. 
ATCC 33913 chromosome, complete genome gi|21233999|ref|NC_003903.1| Streptomyces coelicolor A3(2) plasmid SCP1, complete sequence gi|21233964|ref|NC_003904.1| Streptomyces coelicolor A3(2) plasmid SCP2, complete sequence gi|42779081|ref|NC_003909.8| Bacillus cereus ATCC 10987, complete genome gi|71277742|ref|NC_003910.7| Colwellia psychrerythraea 34H chromosome, complete genome gi|56694928|ref|NC_003911.11| Ruegeria pomeroyi DSS-3 chromosome, complete genome gi|57236892|ref|NC_003912.7| Campylobacter jejuni RM1221, complete genome gi|21240774|ref|NC_003919.1| Xanthomonas axonopodis pv. citri str. 306 chromosome, complete genome gi|58033143|ref|NC_003921.3| Xanthomonas axonopodis pv. citri str. 306 plasmid pXAC33, complete sequence gi|21264228|ref|NC_003922.1| Xanthomonas axonopodis pv. citri str. 306 plasmid pXAC64, complete sequence gi|21281729|ref|NC_003923.1| Staphylococcus aureus subsp. aureus MW2, complete genome gi|21392688|ref|NC_003980.1| Bacillus anthracis str. A2012 plasmid pXO1, complete sequence gi|21392893|ref|NC_003981.1| Bacillus anthracis str. A2012 plasmid pXO2, complete sequence gi|30260195|ref|NC_003997.3| Bacillus anthracis str. Ames chromosome, complete genome gi|89255298|ref|NC_004041.2| Rhizobium etli CFN 42 symbiotic plasmid p42d, complete sequence gi|21672294|ref|NC_004061.1| Buchnera aphidicola str. 
Sg (Schizaphis graminum) chromosome, complete genome gi|21909536|ref|NC_004070.1| Streptococcus pyogenes MGAS315 chromosome, complete genome gi|22123922|ref|NC_004088.1| Yersinia pestis KIM 10 chromosome, complete genome gi|22297544|ref|NC_004113.1| Thermosynechococcus elongatus BP-1 chromosome, complete genome gi|22536185|ref|NC_004116.1| Streptococcus agalactiae 2603V/R chromosome, complete genome gi|70728250|ref|NC_004129.6| Pseudomonas protegens Pf-5 chromosome, complete genome gi|23097455|ref|NC_004193.1| Oceanobacillus iheyensis HTE831 chromosome, complete genome gi|23307853|ref|NC_004252.1| Bifidobacterium longum DJO10A plasmid pDOJH10L, complete sequence gi|23307864|ref|NC_004253.1| Bifidobacterium longum DJO10A plasmid pDOJH10S, complete sequence gi|58036264|ref|NC_004307.2| Bifidobacterium longum NCC2705 chromosome, complete genome gi|56968325|ref|NC_004310.3| Brucella suis 1330 chromosome I, complete sequence gi|56968493|ref|NC_004311.2| Brucella suis 1330 chromosome II, complete sequence gi|23577970|ref|NC_004319.1| Corynebacterium efficiens YS-314 plasmid pCE2, complete sequence gi|23577986|ref|NC_004320.1| Corynebacterium efficiens YS-314 plasmid pCE3, complete sequence gi|344915202|ref|NC_004337.2| Shigella flexneri 2a str. 301 chromosome, complete genome gi|294827553|ref|NC_004342.2| Leptospira interrogans serovar Lai str. 56601 chromosome I, complete sequence gi|294653513|ref|NC_004343.2| Leptospira interrogans serovar Lai str. 
56601 chromosome II, complete sequence gi|32490749|ref|NC_004344.2| Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis chromosome, complete genome gi|414561716|ref|NC_004347.2| Shewanella oneidensis MR-1 chromosome, complete genome gi|24376231|ref|NC_004349.1| Shewanella oneidensis MR-1 plasmid megaplasmid, complete sequence gi|347750429|ref|NC_004350.2| Streptococcus mutans UA159 chromosome, complete genome gi|25010075|ref|NC_004368.1| Streptococcus agalactiae NEM316, complete genome gi|25026556|ref|NC_004369.1| Corynebacterium efficiens YS-314 chromosome, complete genome gi|26245917|ref|NC_004431.1| Escherichia coli CFT073 chromosome, complete genome gi|26553452|ref|NC_004432.1| Mycoplasma penetrans HF-2, complete genome gi|326423644|ref|NC_004459.3| Vibrio vulnificus CMCP6 chromosome I, complete sequence gi|326424156|ref|NC_004460.2| Vibrio vulnificus CMCP6 chromosome II, complete sequence gi|27466918|ref|NC_004461.1| Staphylococcus epidermidis ATCC 12228 chromosome, complete genome gi|27375111|ref|NC_004463.1| Bradyrhizobium japonicum USDA 110 chromosome, complete genome gi|27904513|ref|NC_004545.1| Buchnera aphidicola str. Bp (Baizongia pistaciae) chromosome, complete genome gi|50118965|ref|NC_004547.2| Pectobacterium atrosepticum SCRI1043 chromosome, complete genome gi|28572175|ref|NC_004551.1| Tropheryma whipplei TW08/27, complete genome gi|62184647|ref|NC_004552.2| Chlamydophila abortus S26/3, complete genome gi|28191370|ref|NC_004554.1| Xylella fastidiosa Temecula1 plasmid pXFPD1.3, complete sequence gi|28191365|ref|NC_004555.1| Buchnera aphidicola str. 
Bp (Baizongia pistaciae) plasmid pBBp1, complete sequence gi|28197945|ref|NC_004556.1| Xylella fastidiosa Temecula1 chromosome, complete genome gi|28209834|ref|NC_004557.1| Clostridium tetani E88 chromosome, complete genome gi|28373131|ref|NC_004565.1| Clostridium tetani E88 plasmid pE88, complete sequence gi|380031102|ref|NC_004567.2| Lactobacillus plantarum WCFS1, complete genome gi|32447382|ref|NC_004572.3| Tropheryma whipplei str. Twist, complete genome gi|28867243|ref|NC_004578.1| Pseudomonas syringae pv. tomato str. DC3000 chromosome, complete genome gi|28896774|ref|NC_004603.1| Vibrio parahaemolyticus RIMD 2210633 chromosome 1, complete sequence gi|294663009|ref|NC_004604.2| Bacillus megaterium QM B1551 plasmid pBM400, complete sequence gi|28899855|ref|NC_004605.1| Vibrio parahaemolyticus RIMD 2210633 chromosome 2, complete sequence gi|28894912|ref|NC_004606.1| Streptococcus pyogenes SSI-1 chromosome, complete genome gi|29140543|ref|NC_004631.1| Salmonella enterica subsp. enterica serovar Typhi str. Ty2 chromosome, complete genome gi|29171546|ref|NC_004632.1| Pseudomonas syringae pv. tomato str. DC3000 plasmid pDC3000B, complete sequence gi|29171478|ref|NC_004633.1| Pseudomonas syringae pv. tomato str. 
DC3000 plasmid pDC3000A, complete sequence gi|29345410|ref|NC_004663.1| Bacteroides thetaiotaomicron VPI-5482 chromosome, complete genome gi|29374661|ref|NC_004668.1| Enterococcus faecalis V583 chromosome, complete genome gi|29377803|ref|NC_004669.1| Enterococcus faecalis V583 plasmid pTEF1, complete sequence gi|29377876|ref|NC_004670.1| Enterococcus faecalis V583 plasmid pTEF3, complete sequence gi|29377895|ref|NC_004671.1| Enterococcus faecalis V583 plasmid pTEF2, complete sequence gi|29611500|ref|NC_004703.1| Bacteroides thetaiotaomicron VPI-5482 plasmid p5482, complete sequence gi|29648114|ref|NC_004704.1| Coxiella burnetii RSA 493 plasmid pQpH1, complete sequence gi|29826443|ref|NC_004719.1| Streptomyces avermitilis MA-4680 plasmid SAP1, complete sequence gi|29839220|ref|NC_004720.1| Chlamydophila caviae GPIC plasmid pCpGP1, complete sequence gi|56973315|ref|NC_004721.2| Bacillus cereus ATCC 14579 plasmid pBClin15, complete sequence gi|30018278|ref|NC_004722.1| Bacillus cereus ATCC 14579, complete genome gi|30061571|ref|NC_004741.1| Shigella flexneri 2a str. 2457T, complete genome gi|30248031|ref|NC_004757.1| Nitrosomonas europaea ATCC 19718 chromosome, complete genome gi|294660180|ref|NC_004829.2| Mycoplasma gallisepticum str. R(low) chromosome, complete genome gi|31795333|ref|NC_004838.1| Yersinia pestis KIM plasmid pMT-1, complete sequence gi|56416370|ref|NC_004842.2| Anaplasma marginale str. St. Maries chromosome, complete genome gi|31983523|ref|NC_004851.1| Shigella flexneri 2a str. 
301 plasmid pCP301, complete sequence gi|32265499|ref|NC_004917.1| Helicobacter hepaticus ATCC 51449 chromosome, complete genome gi|32453760|ref|NC_004923.1| Aeromonas salmonicida salmonicida A449 plasmid pAsa1, complete sequence gi|32453769|ref|NC_004924.1| Aeromonas salmonicida salmonicida A449 plasmid pAsa3, complete sequence gi|32453780|ref|NC_004925.1| Aeromonas salmonicida salmonicida A449 plasmid pAsa2, complete sequence gi|32456045|ref|NC_004943.1| Bifidobacterium longum NCC2705 plasmid pBLO1, complete sequence gi|32470520|ref|NC_005003.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-06, complete sequence gi|32470532|ref|NC_005004.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-05, complete sequence gi|32470555|ref|NC_005005.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-04, complete sequence gi|32470572|ref|NC_005006.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-03, complete sequence gi|32470581|ref|NC_005007.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-02, complete sequence gi|32470588|ref|NC_005008.1| Staphylococcus epidermidis ATCC 12228 plasmid pSE-12228-01, complete sequence gi|32470666|ref|NC_005027.1| Rhodopirellula baltica SH 1 chromosome, complete genome gi|33239452|ref|NC_005042.1| Prochlorococcus marinus subsp. marinus str. CCMP1375 chromosome, complete genome gi|33241335|ref|NC_005043.1| Chlamydophila pneumoniae TW-183, complete genome gi|33519483|ref|NC_005061.1| Candidatus Blochmannia floridanus chromosome, complete genome gi|33864539|ref|NC_005070.1| Synechococcus sp. WH 8102, complete genome gi|33862273|ref|NC_005071.1| Prochlorococcus marinus str. MIT 9313 chromosome, complete genome gi|33860560|ref|NC_005072.1| Prochlorococcus marinus subsp. pastoris str. 
CCMP1986 chromosome, complete genome gi|34495455|ref|NC_005085.1| Chromobacterium violaceum ATCC 12472 chromosome, complete genome gi|34556458|ref|NC_005090.1| Wolinella succinogenes DSM 1740 chromosome, complete genome gi|37519569|ref|NC_005125.1| Gloeobacter violaceus PCC 7421 chromosome, complete genome gi|37524032|ref|NC_005126.1| Photorhabdus luminescens subsp. laumondii TTO1, complete genome gi|37595821|ref|NC_005128.1| Vibrio vulnificus YJ016 plasmid pYJ016, complete sequence gi|37678184|ref|NC_005139.1| Vibrio vulnificus YJ016 chromosome I, complete sequence gi|37675660|ref|NC_005140.1| Vibrio vulnificus YJ016 chromosome II, complete sequence gi|38349555|ref|NC_005213.1| Nanoarchaeum equitans Kin4-M chromosome, complete genome gi|38505535|ref|NC_005229.1| Synechocystis sp. PCC 6803 plasmid pSYSM, complete sequence gi|38505668|ref|NC_005230.1| Synechocystis sp. PCC 6803 plasmid pSYSA, complete sequence gi|38505775|ref|NC_005231.1| Synechocystis sp. PCC 6803 plasmid pSYSG, complete sequence gi|38505825|ref|NC_005232.1| Synechocystis sp. PCC 6803 plasmid pSYSX, complete sequence gi|38637668|ref|NC_005241.1| Ralstonia eutropha H16 megaplasmid pHG1, complete sequence gi|42632299|ref|NC_005244.2| Pseudomonas sp. ND6 plasmid pND6-1, complete sequence gi|57238731|ref|NC_005295.2| Ehrlichia ruminantium str. Welgevonden chromosome, complete genome gi|39933080|ref|NC_005296.1| Rhodopseudomonas palustris CGA009 chromosome, complete genome gi|39840937|ref|NC_005297.1| Rhodopseudomonas palustris CGA009 plasmid pRPA, complete sequence gi|255961248|ref|NC_005303.2| Onion yellows phytoplasma OY-M, complete genome gi|42518084|ref|NC_005362.1| Lactobacillus johnsonii NCC 533, complete genome gi|42521650|ref|NC_005363.1| Bdellovibrio bacteriovorus HD100, complete genome gi|127763381|ref|NC_005364.2| Mycoplasma mycoides subsp. mycoides SC str. 
PG1 chromosome, complete genome gi|44004339|ref|NC_005707.1| Bacillus cereus ATCC 10987 plasmid pBc10987, complete sequence gi|71733195|ref|NC_005773.3| Pseudomonas syringae pv. phaseolicola 1448A chromosome, complete genome gi|45357563|ref|NC_005791.1| Methanococcus maripaludis S2 chromosome, complete genome gi|45439865|ref|NC_005810.1| Yersinia pestis biovar Microtus str. 91001 chromosome, complete genome gi|45478502|ref|NC_005813.1| Yersinia pestis biovar Microtus str. 91001 plasmid pCD1, complete sequence gi|45476499|ref|NC_005814.1| Yersinia pestis biovar Microtus str. 91001 plasmid pCRY, complete sequence gi|45478588|ref|NC_005815.1| Yersinia pestis biovar Microtus str. 91001 plasmid pMT1, complete sequence gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence gi|45655914|ref|NC_005823.1| Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130 chromosome I, complete sequence gi|45655585|ref|NC_005824.1| Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130 chromosome II, complete sequence gi|46198308|ref|NC_005835.1| Thermus thermophilus HB27, complete genome gi|46255071|ref|NC_005838.1| Thermus thermophilus HB27 plasmid pTT27, complete sequence gi|46445634|ref|NC_005861.1| Candidatus Protochlamydia amoebophila UWE25 chromosome, complete genome gi|46562129|ref|NC_005863.1| Desulfovibrio vulgaris str. Hildenborough plasmid pDV, complete sequence gi|47104025|ref|NC_005871.1| Photobacterium profundum SS9 plasmid pPBPR1, complete sequence gi|48477072|ref|NC_005877.1| Picrophilus torridus DSM 9790 chromosome, complete genome gi|49146084|ref|NC_005916.1| Mycobacterium ulcerans AGY99 plasmid pMUM001, complete sequence gi|49183039|ref|NC_005945.1| Bacillus anthracis str. Sterne chromosome, complete genome gi|49398098|ref|NC_005951.1| Staphylococcus aureus subsp. aureus MSSA476 plasmid pSAS, complete sequence gi|49473688|ref|NC_005955.1| Bartonella quintana str. 
Toulouse, complete genome gi|49474831|ref|NC_005956.1| Bartonella henselae str. Houston-1 chromosome, complete genome gi|49476684|ref|NC_005957.1| Bacillus thuringiensis serovar konkukian str. 97-27 chromosome, complete genome gi|50083297|ref|NC_005966.1| Acinetobacter sp. ADP1 chromosome, complete genome gi|50364815|ref|NC_006055.1| Mesoplasma florum L1 chromosome, complete genome gi|50841496|ref|NC_006085.1| Propionibacterium acnes KPA171202 chromosome, complete genome gi|50913346|ref|NC_006086.1| Streptococcus pyogenes MGAS10394 chromosome, complete genome gi|50953925|ref|NC_006087.1| Leifsonia xyli subsp. xyli str. CTCB07 chromosome, complete genome gi|51038597|ref|NC_006128.1| Borrelia garinii PBi plasmid cp26, complete sequence gi|51038624|ref|NC_006129.1| Borrelia garinii PBi plasmid lp54, complete sequence gi|51243852|ref|NC_006138.1| Desulfotalea psychrophila LSv54, complete genome gi|51246971|ref|NC_006139.1| Desulfotalea psychrophila LSv54 plasmid large, complete sequence gi|51247073|ref|NC_006140.1| Desulfotalea psychrophila LSv54 plasmid small, complete sequence gi|51473215|ref|NC_006142.1| Rickettsia typhi str. 
Wilmington, complete genome gi|113911685|ref|NC_006153.2| Yersinia pseudotuberculosis IP 32953 plasmid pYV, complete sequence gi|51593942|ref|NC_006154.1| Yersinia pseudotuberculosis IP 32953 plasmid pYptb32953, complete sequence gi|51594359|ref|NC_006155.1| Yersinia pseudotuberculosis IP 32953 chromosome, complete genome gi|51598263|ref|NC_006156.1| Borrelia garinii PBi chromosome linear, complete sequence gi|51891138|ref|NC_006177.1| Symbiobacterium thermophilum IAM 14863 chromosome, complete genome gi|163119169|ref|NC_006270.3| Bacillus licheniformis ATCC 14580 chromosome, complete genome gi|52140164|ref|NC_006274.1| Bacillus cereus E33L chromosome, complete genome gi|52421214|ref|NC_006297.1| Bacteroides fragilis YCH46 plasmid pBFY46, complete sequence gi|52421262|ref|NC_006298.1| Haemophilus somnus 129PT plasmid pHS129, complete sequence gi|52424055|ref|NC_006300.1| Mannheimia succiniciproducens MBEL55E chromosome, complete genome gi|52783855|ref|NC_006322.1| Bacillus licheniformis DSM 13 = ATCC 14580 chromosome, complete genome gi|53711291|ref|NC_006347.1| Bacteroides fragilis YCH46 chromosome, complete genome gi|53723370|ref|NC_006348.1| Burkholderia mallei ATCC 23344 chromosome 1, complete sequence gi|77358719|ref|NC_006349.2| Burkholderia mallei ATCC 23344 chromosome 2, complete sequence gi|53717639|ref|NC_006350.1| Burkholderia pseudomallei K96243 chromosome 1, complete sequence gi|53721039|ref|NC_006351.1| Burkholderia pseudomallei K96243 chromosome 2, complete sequence gi|54019969|ref|NC_006360.1| Mycoplasma hyopneumoniae 232 chromosome, complete genome gi|54021964|ref|NC_006361.1| Nocardia farcinica IFM 10152 chromosome, complete genome gi|54027648|ref|NC_006362.1| Nocardia farcinica IFM 10152 plasmid pNF1, complete sequence gi|54027809|ref|NC_006363.1| Nocardia farcinica IFM 10152 plasmid pNF2, complete sequence gi|54295843|ref|NC_006365.1| Legionella pneumophila str. 
Paris plasmid pLPP, complete sequence gi|54292907|ref|NC_006366.1| Legionella pneumophila str. Lens plasmid pLPL, complete sequence gi|54295983|ref|NC_006368.1| Legionella pneumophila str. Paris, complete genome gi|54292964|ref|NC_006369.1| Legionella pneumophila str. Lens, complete genome gi|54307237|ref|NC_006370.1| Photobacterium profundum SS9 chromosome 1, complete genome gi|54301680|ref|NC_006371.1| Photobacterium profundum SS9 chromosome 2, complete sequence gi|54307144|ref|NC_006373.1| Bacteroides uniformis mobilizable transposon NBU1, complete sequence gi|54307228|ref|NC_006375.1| Lactobacillus plantarum WCFS1 plasmid pWCFS101, complete sequence gi|54307232|ref|NC_006376.1| Lactobacillus plantarum WCFS1 plasmid pWCFS102, complete sequence gi|54307184|ref|NC_006377.1| Lactobacillus plantarum WCFS1 plasmid pWCFS103, complete sequence gi|55376107|ref|NC_006389.1| Haloarcula marismortui ATCC 43049 plasmid pNG100, complete sequence gi|55376144|ref|NC_006390.1| Haloarcula marismortui ATCC 43049 plasmid pNG200, complete sequence gi|55376187|ref|NC_006391.1| Haloarcula marismortui ATCC 43049 plasmid pNG300, complete sequence gi|55376228|ref|NC_006392.1| Haloarcula marismortui ATCC 43049 plasmid pNG400, complete sequence gi|55376280|ref|NC_006393.1| Haloarcula marismortui ATCC 43049 plasmid pNG500, complete sequence gi|55376412|ref|NC_006394.1| Haloarcula marismortui ATCC 43049 plasmid pNG600, complete sequence gi|55376579|ref|NC_006395.1| Haloarcula marismortui ATCC 43049 plasmid pNG700, complete sequence gi|55376942|ref|NC_006396.1| Haloarcula marismortui ATCC 43049 chromosome I, complete sequence gi|55380074|ref|NC_006397.1| Haloarcula marismortui ATCC 43049 chromosome II, complete sequence gi|55820103|ref|NC_006448.1| Streptococcus thermophilus LMG 18311 chromosome, complete genome gi|55821993|ref|NC_006449.1| Streptococcus thermophilus CNRZ1066 chromosome, complete genome gi|55979969|ref|NC_006461.1| Thermus thermophilus HB8 chromosome, complete genome 
gi|55978183|ref|NC_006462.1| Thermus thermophilus HB8 plasmid pTT27, complete sequence gi|55978435|ref|NC_006463.1| Thermus thermophilus HB8 plasmid pTT8, complete sequence gi|56410437|ref|NC_006509.1| Geobacillus kaustophilus HTA426 plasmid pHTA426, complete sequence gi|56418535|ref|NC_006510.1| Geobacillus kaustophilus HTA426 chromosome, complete genome gi|56412276|ref|NC_006511.1| Salmonella enterica subsp. enterica serovar Paratyphi A str. ATCC 9150 chromosome, complete genome gi|56459112|ref|NC_006512.1| Idiomarina loihiensis L2TR chromosome, complete genome gi|56475432|ref|NC_006513.1| Aromatoleum aromaticum EbN1 chromosome, complete genome gi|283856168|ref|NC_006526.2| Zymomonas mobilis subsp. mobilis ZM4 chromosome, complete genome gi|56707107|ref|NC_006529.1| Lactobacillus salivarius UCC118 plasmid pSF118-20, complete sequence gi|56707135|ref|NC_006530.1| Lactobacillus salivarius UCC118 plasmid pSF118-44, complete sequence gi|56708791|ref|NC_006569.1| Ruegeria pomeroyi DSS-3 megaplasmid, complete sequence gi|255961454|ref|NC_006570.2| Francisella tularensis subsp. tularensis SCHU S4 chromosome, complete genome gi|56750010|ref|NC_006576.1| Synechococcus elongatus PCC 6301 chromosome, complete genome gi|56899872|ref|NC_006578.1| Bacillus thuringiensis serovar konkukian str. 97-27 plasmid pBT9727, complete sequence gi|56961782|ref|NC_006582.1| Bacillus clausii KSM-K16, complete genome gi|57639935|ref|NC_006624.1| Thermococcus kodakarensis KOD1 chromosome, complete genome gi|57639934|ref|NC_006625.1| Klebsiella pneumoniae NTUH-K2044 plasmid pK2044, complete sequence gi|77102894|ref|NC_006629.2| Staphylococcus aureus subsp. 
aureus COL plasmid pT181, complete sequence gi|57854744|ref|NC_006663.1| Staphylococcus epidermidis RP62A plasmid pSERP, complete sequence gi|58038254|ref|NC_006672.1| Gluconobacter oxydans 621H plasmid pGOX1, complete sequence gi|58038418|ref|NC_006673.1| Gluconobacter oxydans 621H plasmid pGOX2, complete sequence gi|58038448|ref|NC_006674.1| Gluconobacter oxydans 621H plasmid pGOX3, complete sequence gi|58038467|ref|NC_006675.1| Gluconobacter oxydans 621H plasmid pGOX4, complete sequence gi|58038486|ref|NC_006676.1| Gluconobacter oxydans 621H plasmid pGOX5, complete sequence gi|58038491|ref|NC_006677.1| Gluconobacter oxydans 621H chromosome, complete genome gi|159162017|ref|NC_006814.3| Lactobacillus acidophilus NCFM chromosome, complete genome gi|58616149|ref|NC_006823.1| Azoarcus sp. EbN1 plasmid 1, complete sequence gi|58616422|ref|NC_006824.1| Azoarcus sp. EbN1 plasmid 2, complete sequence gi|58616727|ref|NC_006831.1| Ehrlichia ruminantium str. Gardel, complete genome gi|58578664|ref|NC_006832.1| Ehrlichia ruminantium str. Welgevonden, complete genome gi|58584261|ref|NC_006833.1| Wolbachia endosymbiont strain TRS of Brugia malayi, complete genome gi|58579623|ref|NC_006834.1| Xanthomonas oryzae pv. oryzae KACC 10331 chromosome, complete genome gi|172087630|ref|NC_006840.2| Vibrio fischeri ES114 chromosome I, complete sequence gi|172087787|ref|NC_006841.2| Vibrio fischeri ES114 chromosome II, complete sequence gi|59714356|ref|NC_006842.1| Vibrio fischeri ES114 plasmid pES100, complete sequence gi|60115462|ref|NC_006855.1| Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 plasmid pSCV50, complete sequence gi|60115514|ref|NC_006856.1| Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67 plasmid pSC138, complete sequence gi|60650141|ref|NC_006873.1| Bacteroides fragilis NCTC 9343 plasmid pBF9343, complete sequence gi|62178570|ref|NC_006905.1| Salmonella enterica subsp. enterica serovar Choleraesuis str. 
SC-B67 chromosome, complete genome gi|47458835|ref|NC_006908.1| Mycoplasma mobile 163K, complete genome gi|62288991|ref|NC_006932.1| Brucella abortus bv. 1 str. 9-941 chromosome I, complete sequence gi|62316961|ref|NC_006933.1| Brucella abortus bv. 1 str. 9-941 chromosome II, complete sequence gi|62388892|ref|NC_006958.1| Corynebacterium glutamicum ATCC 13032, complete genome gi|226315872|ref|NC_006969.2| Rhodococcus opacus B4 plasmid pKNR01, complete sequence gi|226316799|ref|NC_006970.2| Rhodococcus opacus B4 plasmid pKNR02, complete sequence gi|66043271|ref|NC_007005.1| Pseudomonas syringae pv. syringae B728a chromosome, complete genome gi|66766352|ref|NC_007086.1| Xanthomonas campestris pv. campestris str. 8004 chromosome, complete genome gi|67077889|ref|NC_007103.1| Bacillus cereus E33L plasmid pE33L466, complete sequence gi|67078320|ref|NC_007104.1| Bacillus cereus E33L plasmid pE33L5, complete sequence gi|67078326|ref|NC_007105.1| Bacillus cereus E33L plasmid pE33L54, complete sequence gi|67078381|ref|NC_007106.1| Bacillus cereus E33L plasmid pE33L8, complete sequence gi|67078390|ref|NC_007107.1| Bacillus cereus E33L plasmid pE33L9, complete sequence gi|67458392|ref|NC_007109.1| Rickettsia felis URRWXCal2 chromosome, complete genome gi|67459793|ref|NC_007110.1| Rickettsia felis URRWXCal2 plasmid pRF, complete sequence gi|67459862|ref|NC_007111.1| Rickettsia felis URRWXCal2 plasmid pRFdelta, complete sequence gi|162960935|ref|NC_007146.2| Haemophilus influenzae 86-028NP chromosome, complete genome gi|68535062|ref|NC_007164.1| Corynebacterium jeikeium K411 chromosome, complete genome gi|70725001|ref|NC_007168.1| Staphylococcus haemolyticus JCSC1435 chromosome, complete genome gi|68535043|ref|NC_007169.1| Staphylococcus haemolyticus JCSC1435 plasmid pSHaeA, complete sequence gi|68535047|ref|NC_007170.1| Staphylococcus haemolyticus JCSC1435 plasmid pSHaeB, complete sequence gi|68535050|ref|NC_007171.1| Staphylococcus haemolyticus JCSC1435 plasmid pSHaeC, 
complete sequence gi|70605853|ref|NC_007181.1| Sulfolobus acidocaldarius DSM 639 chromosome, complete genome gi|71064581|ref|NC_007204.1| Psychrobacter arcticus 273-4 chromosome, complete genome gi|71082709|ref|NC_007205.1| Candidatus Pelagibacter ubique HTCC1062 chromosome, complete genome gi|71725141|ref|NC_007274.1| Pseudomonas syringae pv. phaseolicola 1448A large plasmid, complete sequence gi|71725269|ref|NC_007275.1| Pseudomonas syringae pv. phaseolicola 1448A small plasmid, complete sequence gi|71891793|ref|NC_007292.1| Candidatus Blochmannia pennsylvanicus str. BPEN chromosome, complete genome gi|71894025|ref|NC_007294.1| Mycoplasma synoviae 53, complete genome gi|71893359|ref|NC_007295.1| Mycoplasma hyopneumoniae J chromosome, complete genome gi|71902667|ref|NC_007296.1| Streptococcus pyogenes MGAS6180 chromosome, complete genome gi|71909814|ref|NC_007297.1| Streptococcus pyogenes MGAS5005 chromosome, complete genome gi|71905642|ref|NC_007298.1| Dechloromonas aromatica RCB, complete genome gi|47566322|ref|NC_007322.2| Bacillus anthracis str. 'Ames Ancestor' plasmid pXO1, complete sequence gi|50163691|ref|NC_007323.3| Bacillus anthracis str. 'Ames Ancestor' plasmid pXO2, complete sequence gi|72080342|ref|NC_007332.1| Mycoplasma hyopneumoniae 7448 chromosome, complete genome gi|72160406|ref|NC_007333.1| Thermobifida fusca YX chromosome, complete genome gi|162958048|ref|NC_007335.2| Prochlorococcus marinus str. NATL2A chromosome, complete genome gi|72383731|ref|NC_007336.1| Ralstonia eutropha JMP134 megaplasmid, complete sequence gi|72384244|ref|NC_007337.1| Ralstonia eutropha JMP134 plasmid 1, complete sequence gi|73539706|ref|NC_007347.1| Ralstonia eutropha JMP134 chromosome 1, complete sequence gi|73537298|ref|NC_007348.1| Ralstonia eutropha JMP134 chromosome 2, complete sequence gi|73663826|ref|NC_007349.1| Methanosarcina barkeri str. fusaro plasmid 1, complete sequence gi|73661309|ref|NC_007350.1| Staphylococcus saprophyticus subsp. 
saprophyticus ATCC 15305, complete genome gi|73663756|ref|NC_007351.1| Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 plasmid pSSP1, complete sequence gi|73663802|ref|NC_007352.1| Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 plasmid pSSP2, complete sequence gi|73666633|ref|NC_007354.1| Ehrlichia canis str. Jake chromosome, complete genome gi|73667559|ref|NC_007355.1| Methanosarcina barkeri str. Fusaro, complete genome gi|73747956|ref|NC_007356.1| Dehalococcoides sp. CBDB1 chromosome, complete genome gi|74310614|ref|NC_007384.1| Shigella sonnei Ss046 chromosome, complete genome gi|74314838|ref|NC_007385.1| Shigella sonnei Ss046 plasmid pSS_046, complete sequence gi|74316018|ref|NC_007404.1| Thiobacillus denitrificans ATCC 25259 chromosome, complete genome gi|75674199|ref|NC_007406.1| Nitrobacter winogradskyi Nb-255 chromosome, complete genome gi|75812284|ref|NC_007410.1| Anabaena variabilis ATCC 29413 plasmid A, complete sequence gi|75812629|ref|NC_007411.1| Anabaena variabilis ATCC 29413 plasmid B, complete sequence gi|75812661|ref|NC_007412.1| Anabaena variabilis ATCC 29413 plasmid C, complete sequence gi|75906225|ref|NC_007413.1| Anabaena variabilis ATCC 29413 chromosome, complete genome gi|75994447|ref|NC_007414.1| Escherichia coli O157:H7 EDL933 plasmid pO157, complete sequence gi|76800655|ref|NC_007426.1| Natronomonas pharaonis DSM 2160, complete genome gi|76803367|ref|NC_007427.1| Natronomonas pharaonis DSM 2160 plasmid PL131, complete sequence gi|76803317|ref|NC_007428.1| Natronomonas pharaonis DSM 2160 plasmid PL23, complete sequence gi|76788711|ref|NC_007429.1| Chlamydia trachomatis A/HAR-13, complete genome gi|76789623|ref|NC_007430.1| Chlamydia trachomatis A/HAR-13 plasmid pCTA, complete sequence gi|76786714|ref|NC_007432.1| Streptococcus agalactiae A909 chromosome, complete genome gi|76808520|ref|NC_007434.1| Burkholderia pseudomallei 1710b chromosome I, complete sequence gi|76817237|ref|NC_007435.1| Burkholderia 
pseudomallei 1710b chromosome II, complete sequence gi|77358982|ref|NC_007481.1| Pseudoalteromonas haloplanktis TAC125 chromosome I, complete sequence gi|77361923|ref|NC_007482.1| Pseudoalteromonas haloplanktis TAC125 chromosome II, complete sequence gi|77163517|ref|NC_007483.1| Nitrosococcus oceani ATCC 19707 plasmid A, complete sequence gi|77163561|ref|NC_007484.1| Nitrosococcus oceani ATCC 19707 chromosome, complete genome gi|77404485|ref|NC_007486.1| Rhodococcus erythropolis PR4 plasmid pREC1, complete sequence gi|77404588|ref|NC_007487.1| Rhodococcus erythropolis PR4 plasmid pREC2, complete sequence gi|77404592|ref|NC_007488.1| Rhodobacter sphaeroides 2.4.1 plasmid B, complete sequence gi|77404693|ref|NC_007489.1| Rhodobacter sphaeroides 2.4.1 plasmid C, complete sequence gi|77404776|ref|NC_007490.1| Rhodobacter sphaeroides 2.4.1 plasmid D, complete sequence gi|77454567|ref|NC_007491.1| Rhodococcus erythropolis PR4 plasmid pREL1, complete sequence gi|255961261|ref|NC_007492.2| Pseudomonas fluorescens Pf0-1 chromosome, complete genome gi|77461965|ref|NC_007493.1| Rhodobacter sphaeroides 2.4.1 chromosome 1, complete sequence gi|77464988|ref|NC_007494.1| Rhodobacter sphaeroides 2.4.1 chromosome 2, complete sequence gi|90960985|ref|NC_007498.2| Pelobacter carbinolicus DSM 2380 chromosome, complete genome gi|78042616|ref|NC_007503.1| Carboxydothermus hydrogenoformans Z-2901 chromosome, complete genome gi|78045239|ref|NC_007504.1| Xanthomonas campestris pv. vesicatoria str. 85-10 plasmid pXCV2, complete sequence gi|78045242|ref|NC_007505.1| Xanthomonas campestris pv. vesicatoria str. 85-10 plasmid pXCV19, complete sequence gi|78045265|ref|NC_007506.1| Xanthomonas campestris pv. vesicatoria str. 85-10 plasmid pXCV38, complete sequence gi|78045309|ref|NC_007507.1| Xanthomonas campestris pv. vesicatoria str. 85-10 plasmid pXCV183, complete sequence gi|78045556|ref|NC_007508.1| Xanthomonas campestris pv. vesicatoria str. 
85-10 chromosome, complete genome gi|78059643|ref|NC_007509.1| Burkholderia sp. 383 chromosome 3, complete sequence gi|78064658|ref|NC_007510.1| Burkholderia sp. 383 chromosome 1, complete sequence gi|78060853|ref|NC_007511.1| Burkholderia sp. 383 chromosome 2, complete sequence gi|78185892|ref|NC_007512.1| Chlorobium luteolum DSM 273 chromosome, complete genome gi|78183584|ref|NC_007513.1| Synechococcus sp. CC9902 chromosome, complete genome gi|78187984|ref|NC_007514.1| Chlorobium chlorochromatii CaD3 chromosome, complete genome gi|78214253|ref|NC_007515.1| Geobacter metallireducens GS-15 plasmid unnamed, complete sequence gi|78211558|ref|NC_007516.1| Synechococcus sp. CC9605, complete genome gi|78221228|ref|NC_007517.1| Geobacter metallireducens GS-15 chromosome, complete genome gi|78355047|ref|NC_007519.1| Desulfovibrio alaskensis G20 chromosome, complete genome gi|118139508|ref|NC_007520.2| Thiomicrospira crunogena XCL-2 chromosome, complete genome gi|50196905|ref|NC_007530.2| Bacillus anthracis str. 'Ames Ancestor' chromosome, complete genome gi|78776201|ref|NC_007575.1| Sulfurimonas denitrificans DSM 1251 chromosome, complete genome gi|81427616|ref|NC_007576.1| Lactobacillus sakei subsp. sakei 23K chromosome, complete genome gi|78778385|ref|NC_007577.1| Prochlorococcus marinus str. 
MIT 9312, complete genome gi|81230333|ref|NC_007595.1| Synechococcus elongatus PCC 7942 plasmid 1, complete sequence gi|81298811|ref|NC_007604.1| Synechococcus elongatus PCC 7942 chromosome, complete genome gi|82775382|ref|NC_007606.1| Shigella dysenteriae Sd197, complete genome gi|82524407|ref|NC_007607.1| Shigella dysenteriae Sd197 plasmid pSD1_197, complete sequence gi|82524664|ref|NC_007608.1| Shigella boydii Sb227 plasmid pSB4_227, complete sequence gi|82542618|ref|NC_007613.1| Shigella boydii Sb227 chromosome, complete genome gi|82701135|ref|NC_007614.1| Nitrosospira multiformis ATCC 25196 chromosome, complete genome gi|82703893|ref|NC_007615.1| Nitrosospira multiformis ATCC 25196 plasmid 1, complete sequence gi|82703911|ref|NC_007616.1| Nitrosospira multiformis ATCC 25196 plasmid 2, complete sequence gi|82703928|ref|NC_007617.1| Nitrosospira multiformis ATCC 25196 plasmid 3, complete sequence gi|82698932|ref|NC_007618.1| Brucella melitensis biovar Abortus 2308 chromosome I, complete sequence gi|82749777|ref|NC_007622.1| Staphylococcus aureus RF122, complete genome gi|83268957|ref|NC_007624.1| Brucella melitensis biovar Abortus 2308 chromosome II, complete sequence gi|83309099|ref|NC_007626.1| Magnetospirillum magneticum AMB-1 chromosome, complete genome gi|83319253|ref|NC_007633.1| Mycoplasma capricolum subsp. 
capricolum ATCC 27343 chromosome, complete genome gi|83582730|ref|NC_007641.1| Rhodospirillum rubrum ATCC 11170 plasmid unnamed, complete sequence gi|83591340|ref|NC_007643.1| Rhodospirillum rubrum ATCC 11170 chromosome, complete genome gi|83588874|ref|NC_007644.1| Moorella thermoacetica ATCC 39073 chromosome, complete genome gi|83642913|ref|NC_007645.1| Hahella chejuensis KCTC 2396 chromosome, complete genome gi|83716035|ref|NC_007650.1| Burkholderia thailandensis E264 chromosome II, complete sequence gi|83718394|ref|NC_007651.1| Burkholderia thailandensis E264 chromosome I, complete sequence gi|83814055|ref|NC_007677.1| Salinibacter ruber DSM 13855 chromosome, complete genome gi|83816857|ref|NC_007678.1| Salinibacter ruber DSM 13855 plasmid pSR35, complete sequence gi|84488831|ref|NC_007681.1| Methanosphaera stadtmanae DSM 3091 chromosome, complete genome gi|84621657|ref|NC_007705.1| Xanthomonas oryzae pv. oryzae MAFF 311018 chromosome, complete genome gi|85057978|ref|NC_007712.1| Sodalis glossinidius str. 'morsitans' chromosome, complete genome gi|85060411|ref|NC_007713.1| Sodalis glossinidius str. 'morsitans' plasmid pSG1, complete sequence gi|85060466|ref|NC_007714.1| Sodalis glossinidius str. 'morsitans' plasmid pSG2, complete sequence gi|85060490|ref|NC_007715.1| Sodalis glossinidius str. 
'morsitans' plasmid pSG3, complete sequence gi|85057280|ref|NC_007716.1| Aster yellows witches'-broom phytoplasma AYWB, complete genome gi|85057952|ref|NC_007717.1| Aster yellows witches'-broom phytoplasma AYWB plasmid pAYWB-I, complete sequence gi|85057958|ref|NC_007718.1| Aster yellows witches'-broom phytoplasma AYWB plasmid pAYWB-II, complete sequence gi|85057963|ref|NC_007719.1| Aster yellows witches'-broom phytoplasma AYWB plasmid pAYWB-III, complete sequence gi|85057971|ref|NC_007720.1| Aster yellows witches'-broom phytoplasma AYWB plasmid pAYWB-IV, complete sequence gi|85372828|ref|NC_007722.1| Erythrobacter litoralis HTCC2594 chromosome, complete genome gi|85857845|ref|NC_007759.1| Syntrophus aciditrophicus SB chromosome, complete genome gi|86156430|ref|NC_007760.1| Anaeromyxobacter dehalogenans 2CP-C chromosome, complete genome gi|86355669|ref|NC_007761.1| Rhizobium etli CFN 42 chromosome, complete genome gi|86359705|ref|NC_007762.1| Rhizobium etli CFN 42 plasmid p42a, complete sequence gi|86359881|ref|NC_007763.1| Rhizobium etli CFN 42 plasmid p42b, complete sequence gi|86360045|ref|NC_007764.1| Rhizobium etli CFN 42 plasmid p42c, complete sequence gi|86360278|ref|NC_007765.1| Rhizobium etli CFN 42 plasmid p42e, complete sequence gi|86360734|ref|NC_007766.1| Rhizobium etli CFN 42 plasmid p42f, complete sequence gi|86604733|ref|NC_007775.1| Synechococcus sp. JA-3-3Ab chromosome, complete genome gi|86607503|ref|NC_007776.1| Synechococcus sp. JA-2-3B'a(2-13) chromosome, complete genome gi|86738724|ref|NC_007777.1| Frankia sp. CcI3 chromosome, complete genome gi|86747127|ref|NC_007778.1| Rhodopseudomonas palustris HaA2 chromosome, complete genome gi|388476123|ref|NC_007779.1| Escherichia coli str. K-12 substr. W3110, complete genome gi|87159837|ref|NC_007790.1| Staphylococcus aureus subsp. aureus USA300_FPR3757 plasmid pUSA01, complete sequence gi|87159843|ref|NC_007791.1| Staphylococcus aureus subsp. 
aureus USA300_FPR3757 plasmid pUSA02, complete sequence gi|87159847|ref|NC_007792.1| Staphylococcus aureus subsp. aureus USA300_FPR3757 plasmid pUSA03, complete sequence gi|87159884|ref|NC_007793.1| Staphylococcus aureus subsp. aureus USA300_FPR3757 chromosome, complete genome gi|87198026|ref|NC_007794.1| Novosphingobium aromaticivorans DSM 12444 chromosome, complete genome gi|88193823|ref|NC_007795.1| Staphylococcus aureus subsp. aureus NCTC 8325 chromosome, complete genome gi|88601322|ref|NC_007796.1| Methanospirillum hungatei JF-1 chromosome, complete genome gi|88606690|ref|NC_007797.1| Anaplasma phagocytophilum HZ, complete genome gi|88607955|ref|NC_007798.1| Neorickettsia sennetsu str. Miyayama chromosome, complete genome gi|88657561|ref|NC_007799.1| Ehrlichia chaffeensis str. Arkansas, complete genome gi|89057699|ref|NC_007801.1| Jannaschia sp. CCS1 plasmid1, complete sequence gi|89052491|ref|NC_007802.1| Jannaschia sp. CCS1 chromosome, complete genome gi|89255449|ref|NC_007880.1| Francisella tularensis subsp. 
holarctica LVS chromosome, complete genome gi|89897807|ref|NC_007899.1| Chlamydophila felis Fe/C-56, complete genome gi|89898813|ref|NC_007900.1| Chlamydophila felis Fe/C-56 plasmid pCfe1, complete sequence gi|89885732|ref|NC_007901.1| Rhodoferax ferrireducens T118 plasmid1, complete sequence gi|89892746|ref|NC_007907.1| Desulfitobacterium hafniense Y51 chromosome, complete genome gi|89898822|ref|NC_007908.1| Rhodoferax ferrireducens T118 chromosome, complete genome gi|90019649|ref|NC_007912.1| Saccharophagus degradans 2-40 chromosome, complete genome gi|90421528|ref|NC_007925.1| Rhodopseudomonas palustris BisB18 chromosome, complete genome gi|90960990|ref|NC_007929.1| Lactobacillus salivarius UCC118 chromosome, complete genome gi|90962708|ref|NC_007930.1| Lactobacillus salivarius UCC118 plasmid pMP118, complete sequence gi|91204815|ref|NC_007940.1| Rickettsia bellii RML369-C chromosome, complete genome gi|91206245|ref|NC_007941.1| Escherichia coli UTI89 plasmid pUTI89, complete sequence gi|91209055|ref|NC_007946.1| Escherichia coli UTI89 chromosome, complete genome gi|91774356|ref|NC_007947.1| Methylobacillus flagellatus KT, complete genome gi|91785913|ref|NC_007948.1| Polaromonas sp. JS666 chromosome, complete genome gi|91790731|ref|NC_007949.1| Polaromonas sp. JS666 plasmid 1, complete sequence gi|91791058|ref|NC_007950.1| Polaromonas sp. 
JS666 plasmid 2, complete sequence gi|91781384|ref|NC_007951.1| Burkholderia xenovorans LB400 chromosome 1, complete sequence gi|91777110|ref|NC_007952.1| Burkholderia xenovorans LB400 chromosome 2, complete sequence gi|91780071|ref|NC_007953.1| Burkholderia xenovorans LB400 chromosome 3, complete genome gi|91791369|ref|NC_007954.1| Shewanella denitrificans OS217, complete genome gi|91772082|ref|NC_007955.1| Methanococcoides burtonii DSM 6242, complete genome gi|91974482|ref|NC_007958.1| Rhodopseudomonas palustris BisB5 chromosome, complete genome gi|92109250|ref|NC_007959.1| Nitrobacter hamburgensis X14 plasmid 1, complete sequence gi|92109490|ref|NC_007960.1| Nitrobacter hamburgensis X14 plasmid 2, complete sequence gi|92109663|ref|NC_007961.1| Nitrobacter hamburgensis X14 plasmid 3, complete sequence gi|92112136|ref|NC_007963.1| Chromohalobacter salexigens DSM 3043 chromosome, complete genome gi|92115633|ref|NC_007964.1| Nitrobacter hamburgensis X14 chromosome, complete genome gi|93004786|ref|NC_007968.1| Psychrobacter cryohalolentis K5 plasmid 1, complete sequence gi|93004831|ref|NC_007969.1| Psychrobacter cryohalolentis K5 chromosome, complete genome gi|291464753|ref|NC_007971.2| Cupriavidus metallidurans CH34 plasmid pMOL30, complete sequence gi|291434858|ref|NC_007972.2| Cupriavidus metallidurans CH34 plasmid pMOL28, complete sequence gi|94308945|ref|NC_007973.1| Cupriavidus metallidurans CH34 chromosome, complete genome gi|291481467|ref|NC_007974.2| Cupriavidus metallidurans CH34 megaplasmid, complete sequence gi|94676460|ref|NC_007984.1| Baumannia cicadellinicola str. 
Hc (Homalodisca coagulata), complete genome gi|94967031|ref|NC_008009.1| Candidatus Koribacter versatilis Ellin345 chromosome, complete genome gi|113706807|ref|NC_008010.2| Deinococcus geothermalis DSM 11300 plasmid pDGEO01, complete sequence gi|94986445|ref|NC_008011.1| Lawsonia intracellularis PHE/MN1-00 chromosome, complete genome gi|94972343|ref|NC_008012.1| Lawsonia intracellularis PHE/MN1-00 plasmid 1, complete sequence gi|94972373|ref|NC_008013.1| Lawsonia intracellularis PHE/MN1-00 plasmid 2, complete sequence gi|94972398|ref|NC_008014.1| Lawsonia intracellularis PHE/MN1-00 plasmid 3, complete sequence gi|94987631|ref|NC_008021.1| Streptococcus pyogenes MGAS9429 chromosome, complete genome gi|94989509|ref|NC_008022.1| Streptococcus pyogenes MGAS10270 chromosome, complete genome gi|94991497|ref|NC_008023.1| Streptococcus pyogenes MGAS2096 chromosome, complete genome gi|94993396|ref|NC_008024.1| Streptococcus pyogenes MGAS10750 chromosome, complete genome gi|94984109|ref|NC_008025.1| Deinococcus geothermalis DSM 11300, complete genome gi|104779316|ref|NC_008027.1| Pseudomonas entomophila L48 chromosome, complete genome gi|98152871|ref|NC_008036.1| Sphingopyxis alaskensis RB2256 F plasmid, complete sequence gi|99077902|ref|NC_008042.1| Ruegeria sp. TM1040 plasmid unnamed, complete sequence gi|99078009|ref|NC_008043.1| Ruegeria sp. TM1040 mega plasmid, complete sequence gi|99079841|ref|NC_008044.1| Ruegeria sp. TM1040 chromosome, complete genome gi|103485498|ref|NC_008048.1| Sphingopyxis alaskensis RB2256 chromosome, complete genome gi|104773257|ref|NC_008054.1| Lactobacillus delbrueckii subsp. 
bulgaricus ATCC 11842 chromosome, complete genome gi|107021562|ref|NC_008060.1| Burkholderia cenocepacia AU 1054 chromosome 1, complete sequence gi|107025343|ref|NC_008061.1| Burkholderia cenocepacia AU 1054 chromosome 2, complete sequence gi|107028231|ref|NC_008062.1| Burkholderia cenocepacia AU 1054 chromosome 3, complete sequence gi|108562424|ref|NC_008086.1| Helicobacter pylori HPAG1 chromosome, complete genome gi|108564598|ref|NC_008087.1| Helicobacter pylori HPAG1 plasmid pHPAG1, complete sequence gi|108756767|ref|NC_008095.1| Myxococcus xanthus DK 1622 chromosome, complete genome gi|108793732|ref|NC_008118.1| Yersinia pestis Nepal516 plasmid pMT, complete sequence gi|108793837|ref|NC_008119.1| Yersinia pestis Nepal516 plasmid pPCP, complete sequence gi|108793532|ref|NC_008120.1| Yersinia pestis Antiqua plasmid pMT, complete sequence gi|108793632|ref|NC_008121.1| Yersinia pestis Antiqua plasmid pPCP, complete sequence gi|108793642|ref|NC_008122.1| Yersinia pestis Antiqua plasmid pCD, complete sequence gi|108796981|ref|NC_008146.1| Mycobacterium sp. MCS chromosome, complete genome gi|108802373|ref|NC_008147.1| Mycobacterium sp. MCS plasmid1, complete sequence gi|108802856|ref|NC_008148.1| Rubrobacter xylanophilus DSM 9941 chromosome, complete genome gi|108810166|ref|NC_008149.1| Yersinia pestis Nepal516 chromosome, complete genome gi|108805998|ref|NC_008150.1| Yersinia pestis Antiqua chromosome, complete genome gi|110677421|ref|NC_008209.1| Roseobacter denitrificans OCh 114 chromosome, complete genome gi|110666976|ref|NC_008212.1| Haloquadratum walsbyi DSM 16790, complete genome gi|109644367|ref|NC_008213.1| Haloquadratum walsbyi DSM 16790 plasmid PL47, complete sequence gi|110666922|ref|NC_008226.1| Clostridium difficile 630 plasmid pCD630, complete sequence gi|109896332|ref|NC_008228.1| Pseudoalteromonas atlantica T6c chromosome, complete genome gi|109946640|ref|NC_008229.1| Helicobacter acinonychis str. 
Sheeba chromosome, complete genome gi|109948253|ref|NC_008230.1| Helicobacter acinonychis str. Sheeba plasmid pHac1, complete sequence gi|110346917|ref|NC_008242.1| Mesorhizobium sp. BNC1 plasmid 1, complete sequence gi|110347235|ref|NC_008243.1| Mesorhizobium sp. BNC1 plasmid 2, complete sequence gi|110347349|ref|NC_008244.1| Chelativorans sp. BNC1 plasmid 3, complete sequence gi|110669657|ref|NC_008245.1| Francisella tularensis subsp. tularensis FSC198 chromosome, complete genome gi|110640213|ref|NC_008253.1| Escherichia coli 536, complete genome gi|110632362|ref|NC_008254.1| Chelativorans sp. BNC1 chromosome, complete genome gi|110636427|ref|NC_008255.1| Cytophaga hutchinsonii ATCC 33406 chromosome, complete genome gi|110804074|ref|NC_008258.1| Shigella flexneri 5 str. 8401 chromosome, complete genome gi|110832861|ref|NC_008260.1| Alcanivorax borkumensis SK2 chromosome, complete genome gi|110798562|ref|NC_008261.1| Clostridium perfringens ATCC 13124 chromosome, complete genome gi|110801439|ref|NC_008262.1| Clostridium perfringens SM101 chromosome, complete genome gi|110803998|ref|NC_008263.1| Clostridium perfringens SM101 plasmid pSM101A, complete sequence gi|110804009|ref|NC_008264.1| Clostridium perfringens SM101 plasmid pSM101B, complete sequence gi|110804020|ref|NC_008265.1| Clostridium phage phiSM101 chromosome, complete genome gi|111017022|ref|NC_008268.1| Rhodococcus jostii RHA1 chromosome, complete genome gi|111024785|ref|NC_008269.1| Rhodococcus jostii RHA1 plasmid pRHL1, complete sequence gi|111026068|ref|NC_008270.1| Rhodococcus jostii RHA1 plasmid pRHL2, complete sequence gi|111026827|ref|NC_008271.1| Rhodococcus jostii RHA1 plasmid pRHL3, complete sequence gi|111074074|ref|NC_008273.1| Borrelia afzelii PKo plasmid cp30, complete sequence gi|111074118|ref|NC_008274.1| Borrelia afzelii PKo plasmid cp27, complete sequence gi|111114823|ref|NC_008277.1| Borrelia afzelii PKo, complete genome gi|111219505|ref|NC_008278.1| Frankia alni ACN14a chromosome, 
complete genome gi|113460149|ref|NC_008309.1| Haemophilus somnus 129PT chromosome, complete genome gi|113473942|ref|NC_008312.1| Trichodesmium erythraeum IMS101 chromosome, complete genome gi|113866031|ref|NC_008313.1| Ralstonia eutropha H16 chromosome 1, complete genome gi|116693960|ref|NC_008314.1| Ralstonia eutropha H16 chromosome 2, complete sequence gi|113952711|ref|NC_008319.1| Synechococcus sp. CC9311, complete genome gi|113951722|ref|NC_008320.1| Shewanella sp. MR-7 plasmid1, complete sequence gi|113968346|ref|NC_008321.1| Shewanella sp. MR-4 chromosome, complete genome gi|114045513|ref|NC_008322.1| Shewanella sp. MR-7 chromosome, complete genome gi|114319166|ref|NC_008340.1| Alkalilimnicola ehrlichii MLHE-1 chromosome, complete genome gi|114326555|ref|NC_008341.1| Nitrosomonas eutropha C91 plasmid1, complete sequence gi|114326611|ref|NC_008342.1| Nitrosomonas eutropha C91 plasmid2, complete sequence gi|114326664|ref|NC_008343.1| Granulibacter bethesdensis CGDNIH1 chromosome, complete genome gi|114330036|ref|NC_008344.1| Nitrosomonas eutropha C91 chromosome, complete genome gi|114561188|ref|NC_008345.1| Shewanella frigidimarina NCIMB 400 chromosome, complete genome gi|114565576|ref|NC_008346.1| Syntrophomonas wolfei subsp. wolfei str. Goettingen chromosome, complete genome gi|114568554|ref|NC_008347.1| Maricaulis maris MCS10 chromosome, complete genome gi|114797051|ref|NC_008358.1| Hyphomonas neptunium ATCC 15444 chromosome, complete genome gi|115313981|ref|NC_008369.1| Francisella tularensis subsp. holarctica OSU18 chromosome, complete genome gi|116248676|ref|NC_008378.1| Rhizobium leguminosarum bv. viciae 3841 plasmid pRL12, complete sequence gi|116249460|ref|NC_008379.1| Rhizobium leguminosarum bv. viciae 3841 plasmid pRL9, complete sequence gi|116249766|ref|NC_008380.1| Rhizobium leguminosarum bv. viciae 3841 chromosome, complete genome gi|116254467|ref|NC_008381.1| Rhizobium leguminosarum bv. 
viciae 3841 plasmid pRL10, complete sequence gi|116254910|ref|NC_008382.1| Rhizobium leguminosarum bv. viciae 3841 plasmid pRL7, complete sequence gi|116255067|ref|NC_008383.1| Rhizobium leguminosarum bv. viciae 3841 plasmid pRL8, complete sequence gi|116255200|ref|NC_008384.1| Rhizobium leguminosarum bv. viciae 3841 plasmid pRL11, complete sequence gi|115345482|ref|NC_008385.1| Burkholderia cepacia AMMD plasmid 1, complete sequence gi|115345530|ref|NC_008386.1| Roseobacter denitrificans plasmid pTB1, complete sequence gi|115345636|ref|NC_008387.1| Roseobacter denitrificans plasmid pTB2, complete sequence gi|115345694|ref|NC_008388.1| Roseobacter

bartaelterman commented 11 years ago

That's weird. Do you have any idea where the "Concatenated_sequences" in your output comes from? Because that is not present in your input fasta header lines. So how did you get that out of your blast database?

gtsiamis commented 11 years ago

Hi Bart,

I have a 9.3GB fasta file which I used to make the custom database.

To get the header for each fasta sequence I used awk. The command that I gave was:

$ awk '/>/' file_1 > file_2

And this is the file that I e-mailed to you.
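For anyone who prefers to stay in Python, the same header extraction can be sketched as below. This is only an illustration and not part of the BlastTaxonomy scripts: the function names are made up here. It keeps lines that start with ">", which is slightly stricter than the awk pattern (awk's />/ matches a ">" anywhere in the line). The file is read line by line, so a multi-gigabyte fasta file is never loaded into memory at once.

```python
def iter_fasta_headers(lines):
    """Yield FASTA header lines (those starting with '>') without the trailing newline."""
    for line in lines:
        if line.startswith(">"):
            yield line.rstrip("\n")


def write_fasta_headers(fasta_path, out_path):
    """Stream the headers of fasta_path into out_path and return how many were written."""
    count = 0
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for header in iter_fasta_headers(fasta):
            out.write(header + "\n")
            count += 1
    return count
```

Calling write_fasta_headers("file_1", "file_2") would mirror the awk command above.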

Concatenated_sequences is my query. I have 75989 such sequences that I am using in my blast search.

Hope this is clear.

All the best

George

bartaelterman commented 11 years ago

Hi George, could you send me one of those xml files again? Let's see if I can reproduce it.

gtsiamis commented 11 years ago

Hi Bart,

Thanks for your e-mail. Please find the xml file attached.

George

bartaelterman commented 11 years ago

Hi George, the email attachment doesn't seem to come through. Could you put it on Dropbox?

gtsiamis commented 11 years ago

Hi Bart,

The dropbox link is

https://www.dropbox.com/s/dhy6g6551csmfcd/test.xml

Best regards

George

bartaelterman commented 11 years ago

Thanks.

I noticed that the gi identifiers in the xml are in the <Hit_def> tags while they should be in the <Hit_id> tags. In your <Hit_id> tags, you have something like gnl|BL_ORD_ID|721. So there is your BL_ORD_ID. It's right in the place where the gi number should be.

I googled this BL_ORD_ID and found this post. The author explains that when you run the makeblastdb command, it searches for NCBI-style ids. If it doesn't find them, it creates BL_ORD_IDs by itself. Sounds like that could be your problem.

There seems to be an additional parameter, -parse_seqids, that you should add to your makeblastdb command when you create a custom blast database. It tells makeblastdb to look for sequence ids, which as far as I know are present in your input file, and to put them in the hit-id part of the blast database instead of in the hit-definition. (I don't know exactly where they are placed in the database, but you can tell from the xml output.)

Can you try creating your blast database with the -parse_seqids parameter?
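As a minimal sketch of that rebuild, driven from Python: the helper names and the file names genomes.fasta and genomes_db are placeholders, and -dbtype nucl is an assumption based on the database holding genome sequences. The only point of the example is the extra -parse_seqids flag.

```python
import subprocess


def makeblastdb_command(fasta_path, db_name, dbtype="nucl"):
    """Build the makeblastdb argument list, including -parse_seqids."""
    return [
        "makeblastdb",
        "-in", fasta_path,
        "-dbtype", dbtype,      # assumption: a nucleotide database
        "-parse_seqids",        # keep the ids from the fasta headers as hit ids
        "-out", db_name,
    ]


def build_database(fasta_path, db_name, dbtype="nucl"):
    """Run makeblastdb (needs BLAST+ on the PATH); raises if the command fails."""
    subprocess.run(makeblastdb_command(fasta_path, db_name, dbtype), check=True)
```

build_database("genomes.fasta", "genomes_db") would then recreate the database. After rerunning blast, the <Hit_id> tags in the xml should show the ids from the fasta headers rather than gnl|BL_ORD_ID|... values.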

If that doesn't work, we can also modify my script and tell it to look for gi numbers in the <Hit_def> instead of in the <Hit_id> part of the xml.
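A rough sketch of what that script change could look like is given here. It is not the code in parsexmlblast.py; the function name and the regular expression are assumptions made for illustration. It looks for a gi|number|db|accession| pattern in the hit id first and, failing that, in the hit definition, and it returns None for the gi number instead of raising an IndexError when no such pattern exists.

```python
import re

# gi|<number>|<db>|<accession>| followed by the free-text name
GI_PATTERN = re.compile(r"\bgi\|(\d+)\|[^|]+\|[^|]+\|\s*(.*)")


def parse_gi_and_name(hit_id, hit_def):
    """Return (gi, name), taking the gi number from Hit_id or, failing that, Hit_def."""
    match = GI_PATTERN.search(hit_id + " " + hit_def)
    if match is None:
        # No NCBI-style id anywhere: report the definition and no gi number.
        return None, hit_def.strip()
    return match.group(1), match.group(2).strip()
```

With the values from the xml above, parse_gi_and_name("gnl|BL_ORD_ID|721", "gi|70605853|ref|NC_007181.1| Sulfolobus acidocaldarius DSM 639 chromosome, complete genome") gives the gi number 70605853 together with the organism name.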

gtsiamis commented 11 years ago

Hi Bart,

Thanks for your e-mail and the info provided.

You were right. Using -parse_seqids sorted out the problem. Both scripts ran without an error and I managed to get the taxonomy.

Looking at the output file, I notice that the taxonomy provided gives the Division (bacteria for instance) accompanied by the genus and species. Is it possible to get the phylum and sub-phylum of each hit?

Thanks in advance

George

bartaelterman commented 11 years ago

Yes you can. I would advise you to slightly modify the addTaxonomyToBlastOutput.py script yourself.

At the bottom of the script, you see two lines (line 34 and 35):

rank1 = taxonomy[0]["LineageEx"][0]["ScientificName"]
rank2 = taxonomy[0]["LineageEx"][1]["ScientificName"]

So variable rank1 is filled with the top-level taxonomy, and rank2 with the second level (I'm not sure whether it's always the Division, that depends on NCBI's taxonomy data). And these variables are printed to the output on line 36.

You can add other variables here, for instance:

rank3 = taxonomy[0]["LineageEx"][2]["ScientificName"]

and that should give you the next taxonomic level. Add as many variables as you need for more specific taxonomic information. Just add those variables to the output line at line 36 and they will be printed to the output. It may also be worth adding matching header fields on line 18.

I cannot guarantee at which level the Phylum and Sub-phylum will be. I hope NCBI will be consistent in this...
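One way around the uncertain position is to pick entries by their rank label instead of by index. The sketch below assumes each LineageEx entry carries a "Rank" key next to "ScientificName", as the taxonomy records parsed by Bio.Entrez normally do. The function name, the "no_taxonomy" placeholder and the example record are made up for illustration and are not real NCBI output.

```python
def names_by_rank(taxonomy_record, wanted=("phylum", "class", "order")):
    """Return the scientific name for each wanted rank, or 'no_taxonomy' if absent."""
    found = {}
    for entry in taxonomy_record["LineageEx"]:
        rank = entry.get("Rank")
        if rank is not None:
            found[rank] = entry["ScientificName"]
    return [found.get(rank, "no_taxonomy") for rank in wanted]


# Hand-written stand-in for taxonomy[0]; not fetched from NCBI.
example_record = {
    "LineageEx": [
        {"ScientificName": "cellular organisms", "Rank": "no rank"},
        {"ScientificName": "Bacteria", "Rank": "superkingdom"},
        {"ScientificName": "Proteobacteria", "Rank": "phylum"},
        {"ScientificName": "Gammaproteobacteria", "Rank": "class"},
    ]
}

print(names_by_rank(example_record, ("phylum", "subphylum")))
```

In the script this would be called as names_by_rank(taxonomy[0], ...). A rank that is missing from the lineage comes back as the placeholder rather than shifting the other columns.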

gtsiamis commented 11 years ago

Thanks for your advice and guidance. It worked just fine. I was able to add the extra lines, going down to rank7.

I also included rank3 to rank7 in the print command and it works just fine. Superb!

Thanks again for your help and guidance. Much appreciated.

George

gtsiamis commented 11 years ago

Hi Bart,

As I mentioned in my previous e-mail, everything worked fine when using the test file.

I then ran the scripts on a complete blast xml file. The first script (parsexmlblast.py) went through without a problem.

But when I tried addTaxonomyToBlastOutput.py, the first lines went OK and then I got the following error:

Traceback (most recent call last):
  File "./addTaxonomyToBlastOutput.py", line 44, in <module>
    main()
  File "./addTaxonomyToBlastOutput.py", line 40, in main
    rank7 = taxonomy[0]["LineageEx"][6]["ScientificName"]
IndexError: list index out of range

My understanding is that for this particular gi number the rank7 taxonomy "cell" is empty.

I omitted this line, so that I was getting taxonomy only up to rank6, but I got the same error again. I also omitted rank5, rank4 and rank3. Only when I kept just rank1 and rank2 was the script able to complete the run.

Is it possible to keep ranks up to rank7 in the script, but print "empty" or "no_taxonomy" when a rank is missing?

I googled and tried several changes, but without success.

Could you please advise on how this can be achieved?

For your convenience, here are the Dropbox links to the addTaxonomyToBlastOutput.py script as I have changed it and to the output file of parsexmlblast.py.

https://www.dropbox.com/s/39yy37r1h4c80bg/addTaxonomyToBlastOutput.py

https://www.dropbox.com/s/w5gcf0cjhdf03ui/Supercontigs_table.txt

Looking forward to your reply.

George

bartaelterman commented 10 years ago

Hi George,

I am sorry for my late reply; supporting these repositories is unfortunately not part of my day job.

I think your understanding is correct. It is better to have the script check that a taxonomic rank is present before assigning it to a variable, since assigning a missing rank is what causes the error. To overcome this problem, I updated the code in the central repository. Here is the new file: https://github.com/bartaelterman/BlastTaxonomy/blob/master/addTaxonomyToBlastOutput.py

You only need to make the same changes as before (adding rank3 to rank7), but wrap each one in the same try/except statements I added to the file for rank1 and rank2. Hope that helps.
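For readers without the repository at hand, the idea can be sketched as follows. This is not the code from the linked file, only the same try/except pattern written as a loop. The function name, the default of seven ranks and the "no_taxonomy" placeholder are assumptions taken from the discussion above.

```python
def lineage_ranks(taxonomy, n_ranks=7, missing="no_taxonomy"):
    """Return the first n_ranks lineage names, padding with `missing` when the lineage is shorter."""
    lineage = taxonomy[0]["LineageEx"]
    ranks = []
    for index in range(n_ranks):
        try:
            ranks.append(lineage[index]["ScientificName"])
        except IndexError:
            # This hit has fewer lineage levels than requested.
            ranks.append(missing)
    return ranks


# Hand-written stand-in for the parsed taxonomy result; not real NCBI output.
short_lineage = [{"LineageEx": [
    {"ScientificName": "cellular organisms"},
    {"ScientificName": "Bacteria"},
    {"ScientificName": "Proteobacteria"},
]}]

print("\t".join(lineage_ranks(short_lineage)))
```

Every output row then has the same number of columns, so the extra header fields stay aligned even for hits with a short lineage.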

gtsiamis commented 10 years ago

Hi Bart,

The change in the script worked fine!!! Thanks again for your input and help.

Kind regards

George

bartaelterman / BlastTaxonomy

Running parsexmlblast #1