appliedbinf / el_gato


Possible database update needed #20

Closed vascokarla closed 1 month ago

vascokarla commented 1 month ago

After the recent update to the database, which included the neuA_215 reference, we observed improvements in the identification of the neuA_neuAH locus in some of our Legionella pneumophila samples. This allowed for the subsequent Sequence-Based Typing (SBT) of these samples, which was not possible before the update. However, we have identified two cases where the neuA_neuAH locus is still missing, and we suspect that this may be due to the need for a closer reference in the database.

Additionally, we encountered a case where the mip gene was not identified by el_gato. We suspect this issue could also be related to the need for a closer reference in the database.

We are uploading a CSV file that contains details about the samples, including their public NCBI identifiers, results of SBT analysis with both the old and new database, and sequencing quality checks obtained from PHoeNIx.

Thank you so much for your updates and support!

sbt_check_allele_reference.csv

jennahamlin commented 1 month ago

Hi @vascokarla

Thank you for the information you provided. Using el_gato 1.19, I generated a call for two (SRR10080716 and SRR10177472) of the three isolates you specified as 'MD-'. Also, please be aware that SRA run numbers generally start with SRR, ERR, or DRR. The SRS numbers you listed as SRA are not run accessions that can be downloaded using the SRA Toolkit.

Sample (SRS) Run (SRR)
SRS5357598 SRR10080716
SRS5431100 SRR10177472
SRS5832736 SRR10698366
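The sample/run distinction above can be checked programmatically before submitting accessions to a download step. The following is a minimal sketch, not part of el_gato or the SRA Toolkit: the function names `accession_kind` and `to_run` are hypothetical, the run prefixes (SRR/ERR/DRR) come from the comment above, and the matching sample prefixes (SRS/ERS/DRS) are included as an assumption about what a user might paste in. The lookup table contains only the three pairs listed in this thread.

```python
import re

# Run prefixes are the ones named in the thread (SRR/ERR/DRR).
# Sample prefixes (SRS/ERS/DRS) are the corresponding sample-level accessions.
RUN_PATTERN = re.compile(r"^[SED]RR\d+$")
SAMPLE_PATTERN = re.compile(r"^[SED]RS\d+$")

# Sample-to-run pairs copied from the table in this thread; nothing else is known here.
SAMPLE_TO_RUN = {
    "SRS5357598": "SRR10080716",
    "SRS5431100": "SRR10177472",
    "SRS5832736": "SRR10698366",
}


def accession_kind(accession):
    """Classify an accession as 'run', 'sample', or 'unknown' by its prefix."""
    acc = accession.strip().upper()
    if RUN_PATTERN.match(acc):
        return "run"
    if SAMPLE_PATTERN.match(acc):
        return "sample"
    return "unknown"


def to_run(accession):
    """Return a run accession, or None if one cannot be determined locally."""
    acc = accession.strip().upper()
    kind = accession_kind(acc)
    if kind == "run":
        return acc
    if kind == "sample":
        # Only samples listed in the table above can be resolved offline.
        return SAMPLE_TO_RUN.get(acc)
    return None


if __name__ == "__main__":
    for acc in ["SRS5832736", "SRR10080716", "GCA_015963385.1"]:
        print(acc, accession_kind(acc), to_run(acc))
```

A sample accession that is not in the local table returns `None`, which signals that the run has to be looked up in the SRA itself rather than guessed.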

After viewing the reads in IGV for isolate SRR10698366, which has an 'MD-' call, I can see that reads map across the entire neuA region, but there are fewer reads in the middle of the region. We will get back to you, as there are a few more things to investigate regarding whether this should be a new neuA reference.

jennahamlin commented 1 month ago

Hi @vascokarla

I do not think SRR10698366 is a new neuA reference. There are a few reasons for this:

1) Our old in silico SBT tool generated a full ST call of ST1
2) The assembled genome from NCBI (GCA_015963385.1_PDT000646143.1_genomic.fna) also produced a full ST call of ST1
3) Changing the default depth in el_gato (-d 5) generated a full ST call of ST1
4) When the depth is at default (-d 10), the run.log indicates that one position in neuA can't be resolved with a depth of 10
5) Pairwise alignment using the reference neuA allele and the one generated with a depth of 5 showed no differences
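To illustrate reasons 3 to 5 above, here is a toy sketch of how a minimum-depth threshold can leave a single position unresolved at one setting and resolved at a lower one, and how two equal-length allele sequences can be compared position by position. This is not el_gato's implementation: the function names are hypothetical, the depth values and sequences are made up for illustration, and the comparison assumes the two sequences are already aligned and of equal length.

```python
def unresolved_positions(depths, min_depth):
    """Return 1-based positions whose read depth is below min_depth."""
    return [pos for pos, depth in enumerate(depths, start=1) if depth < min_depth]


def differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch between two
    equal-length, already aligned sequences (1-based positions)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Made-up per-position depths with a dip in the middle of the region.
    toy_depths = [32, 28, 14, 7, 12, 30]

    print(unresolved_positions(toy_depths, 10))  # [4]
    print(unresolved_positions(toy_depths, 5))   # []

    # Made-up sequences standing in for a reference allele and a called allele.
    print(differences("ACGTAC", "ACGTAC"))       # []
```

In this toy data, one position falls below a threshold of 10 but passes a threshold of 5, and an empty difference list corresponds to the "no differences" result of the pairwise alignment.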

vascokarla commented 1 month ago

Hi @jennahamlin,

Thank you so much for thoroughly checking the details and providing such a detailed response. I understand now that an update to the neuA reference is not needed, based on your investigation and the results you shared. I really appreciate the time and effort you’ve put into this!

Also, I apologize for the confusion with the SRS numbers instead of the SRR run numbers in my previous message. To clarify, in the SRA, SRR refers to the run, while SRS refers to the sample. I’ll be more mindful of that distinction moving forward.

Thanks again for your support