Open waalkes opened 2 years ago
You can download the FASTA file of all plasmids used to build the latest database here: https://bioinfo.ut.ee/plasmidseeker/plasmidseeker_db_w20_fna_Nov-2021.tar.gz
This may help to track down the actual identity of hits.
Thanks, that helps a lot. It appears, though, that the numbering of the .list files and the .fna files is not consistent.
PlasmidSeeker output:
# PLASMID CLUSTER 1 149428 166885 89.54% 1.12 0 Shigella flexneri 2002017 plasmid pSFxv_1, complete sequence 105 /mnt/disk4/labs/salipante/programs/PlasmidSeeker/db_w20/plasmid_3951.fna_20.list
Yet when I look at plasmid_3951.fna, it is not the same plasmid:
tron:/mnt/disk4/labs/salipante/programs/PlasmidSeeker/db_w20_fna $ head -n 1 plasmid_3951.fna
NZ_CP028268.1 Pediococcus pentosaceus strain SRCM102739 plasmid unnamed2, complete sequence
Is this by design? Both directories seem to contain the same number of plasmids, so I assume I have matching database versions.
tron:/mnt/disk4/labs/salipante/programs/PlasmidSeeker $ ls -l db_w20/*.list | wc -l
19782
tron:/mnt/disk4/labs/salipante/programs/PlasmidSeeker $ ls -l db_w20_fna/*.fna | wc -l
19782
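Until the numbering question is settled, one possible workaround is to ignore the file number and search the FASTA headers by description instead. The following is only a minimal sketch, assuming the extracted archive contains one plasmid_N.fna file per plasmid with the accession and description on the first line (as in the head output above). The function names `read_header` and `find_plasmid_by_description` are hypothetical helpers, not part of PlasmidSeeker.

```python
from pathlib import Path


def read_header(fasta_path):
    """Return the first line of a FASTA file, without newline or leading '>'."""
    with open(fasta_path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.readline().strip().lstrip(">")


def find_plasmid_by_description(fna_dir, query):
    """Return (file name, header) pairs whose header contains `query`.

    The match is a case-insensitive substring search, so a plasmid name
    such as "pSFxv_1" from the PlasmidSeeker report is enough.
    """
    directory = Path(fna_dir)
    if not directory.is_dir():
        return []
    needle = query.lower()
    matches = []
    for path in sorted(directory.glob("*.fna")):
        header = read_header(path)
        if needle in header.lower():
            matches.append((path.name, header))
    return matches


if __name__ == "__main__":
    # "db_w20_fna" is the directory name used earlier in this thread;
    # adjust it to wherever the archive was extracted.
    for name, header in find_plasmid_by_description("db_w20_fna", "pSFxv_1"):
        print(name, header)
```

This scans every file on each call, which should be acceptable for roughly 20,000 small files; for repeated lookups it may be worth collecting the headers once into a dictionary.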
I love your tool. Thanks for making it.
I am using it with a collection of WGS Shigella isolates and some of the plasmid descriptions have multiple hits on NCBI. Is there a way I can figure out which of the plasmids it is? Are all of them in your collection? The plasmids in your database are binary files.
Here are the two examples:
Shigella flexneri 1a strain 0228 plasmid, complete sequence (there are four with that name: CP012736.1, CP012734.1, CP012733.1 and CP012732.1)
Escherichia coli O104:H4 str. C227-11 plasmid, complete sequence (there are six: CP011332.1 to CP011337.1)
Also, this one doesn't appear to be in NCBI at all: Xuhuaishuia manganoxidans strain DY6-4 plasmid sequence
Thanks,
Adam Waalkes Research Scientist UWMC
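For the ambiguous names mentioned above, one way to see which descriptions are shared by several accessions is to group the FASTA header lines by description. This is a rough sketch, assuming each header has the form "accession description" separated by the first space; the function names are hypothetical and the example headers simply combine the accessions and name quoted in this post.

```python
from collections import defaultdict


def group_accessions_by_description(headers):
    """Map each description to the list of accessions that carry it."""
    groups = defaultdict(list)
    for header in headers:
        text = header.strip().lstrip(">")
        if not text:
            continue  # skip blank lines
        # Assumption: the accession is everything before the first space.
        accession, _, description = text.partition(" ")
        groups[description].append(accession)
    return dict(groups)


def ambiguous_descriptions(headers):
    """Return only the descriptions shared by more than one accession."""
    grouped = group_accessions_by_description(headers)
    return {desc: accs for desc, accs in grouped.items() if len(accs) > 1}


if __name__ == "__main__":
    example_headers = [
        "CP012736.1 Shigella flexneri 1a strain 0228 plasmid, complete sequence",
        "CP012734.1 Shigella flexneri 1a strain 0228 plasmid, complete sequence",
        "NZ_CP028268.1 Pediococcus pentosaceus strain SRCM102739 plasmid unnamed2, complete sequence",
    ]
    for description, accessions in ambiguous_descriptions(example_headers).items():
        print(description, accessions)
```

A description that maps to more than one accession cannot be resolved from the name alone, so the accession would also have to be reported to identify the plasmid.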