MAphd / PAMLST

R script to determine sequence type of PA genomes
GNU General Public License v3.0

Allocation error using updated database ST #1

Closed: Dx-wmc closed this issue 7 months ago

Dx-wmc commented 7 months ago

Hi, I used the script's -u option to update the database, but when I ran MLST with the updated database, a large number of strains were assigned ST 9999, which is obviously incorrect. Running it again with the un-updated database gave the correct result. However, since STs are continuously updated, would it be possible to fix the script so that typing also works with an updated database?

MAphd commented 7 months ago

Hi, I'm having trouble replicating your error. If possible, can you send one or more of the genomes that are assigned ST 9999 (or provide their NCBI accession numbers)?

Dx-wmc commented 7 months ago

Yes. For example, when I used the genome with accession number GCF_000480435.1 as input with the updated database (updated via Rscript PAMLST.R -u), it was assigned ST 9999. However, its real ST is 2627.

MAphd commented 7 months ago

It seems PubMLST has added a number of acsA alleles that are similar to a region of the acsB gene, which is causing problems for this script. With the updated database, the script finds two acsA alleles, acsA 16 and acsA 347, on different contigs of GCF_000480435.1 and therefore deems the genome untypable. So updating the database does work, but I will work on a workaround for this specific issue in the near future. Thanks for letting me know.
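To make the failure mode concrete, here is a minimal sketch of the logic, written in Python for illustration rather than in the script's R. It is not the PAMLST.R code: the function names, the hit tuples, the toy profile table, and the ST label 1000 are all hypothetical. The only values taken from this thread are the two acsA alleles (16 and 347) and the 9999 result, and using acsA 16 in the toy profile is not a claim about which allele is correct for GCF_000480435.1. The sketch assumes a genome is typable only when each of the seven loci yields exactly one allele.

```python
# Illustrative sketch only; NOT the PAMLST.R implementation.
# All names and data below are hypothetical unless noted.

# The seven loci of the P. aeruginosa MLST scheme.
LOCI = ("acsA", "aroE", "guaA", "mutL", "nuoD", "ppsA", "trpE")
UNTYPABLE = 9999  # the value reported for untyped genomes in this thread


def call_alleles(hits):
    """Collect the distinct alleles found per locus.

    hits: iterable of (locus, allele_number, contig) matches.
    Returns a dict mapping each locus to a set of allele numbers.
    """
    found = {locus: set() for locus in LOCI}
    for locus, allele, _contig in hits:
        if locus in found:
            found[locus].add(allele)
    return found


def assign_st(hits, profiles):
    """Return the ST for a set of hits, or UNTYPABLE.

    A locus with no allele, or with more than one distinct allele,
    makes the whole genome untypable (assumed behaviour).
    """
    found = call_alleles(hits)
    if any(len(alleles) != 1 for alleles in found.values()):
        return UNTYPABLE
    profile = tuple(next(iter(found[locus])) for locus in LOCI)
    return profiles.get(profile, UNTYPABLE)


# Toy profile table: placeholder allele numbers and ST label,
# not real PubMLST data.
TOY_PROFILES = {(16, 1, 1, 1, 1, 1, 1): 1000}

# One hit per locus: typable.
clean_hits = [(locus, 16 if locus == "acsA" else 1, "contig_1") for locus in LOCI]

# A second acsA allele on another contig (as with acsA 16 and acsA 347
# in this thread) makes the locus ambiguous.
ambiguous_hits = clean_hits + [("acsA", 347, "contig_2")]

print(assign_st(clean_hits, TOY_PROFILES))
print(assign_st(ambiguous_hits, TOY_PROFILES))
```

Under these assumptions, a single extra acsA match is enough to turn an otherwise clean profile into 9999, which is consistent with the un-updated database giving a result while the updated one does not.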

Dx-wmc commented 7 months ago

Hope to see the updated version soon.