appliedbinf / el_gato

MIT License

incorrect designation #14

Closed rediattewolde closed 3 months ago

rediattewolde commented 4 months ago

Dear El_gato developer

Thank you for developing el_gato.

• Of 752 samples, the following 6 gave an incorrect ST (mis-designation). Could I please email you the FASTQ files so you can investigate?

| # | Sanger ST | WGS el_gato ST |
| -- | -- | -- |
| 1 | 75 (2,3,18,13,25,5,6) | 20 (2,3,18,15,2,1,6) |
| 2 | 345 (6,10,19,3,19,4,11) | Novel ST (6,10,19,3,9,4,11) |
| 3 | 1804 (26,41,22,21,55,48,31) | Novel ST (26,41,22,21,NAT,48,31) |
| 4 | 1804 (26,41,22,21,55,48,31) | Novel ST (26,41,22,21,NAT,48,31) |
| 5 | 23 (2,3,9,10,2,1,6) | 2439 (2,3,9,10,93,1,6) |
| 6 | 1376 (1,4,3,16,2,1,208) | Novel ST (1,4,3,16,9,1,208) |

Alan-Collins commented 4 months ago

Hi,

We've made some improvements to El_gato's performance with low coverage sequencing and a few other situations on the dev branch of this repo. Would you mind rerunning these problem samples with the dev version and letting me know if you get correct designations? If the dev branch does not improve things then I would be happy to take a look and see if this is something we can fix.

Thanks, Alan

rediattewolde commented 3 months ago

Hi Alan

I tried the dev branch with the samples and the mis-designation is now fixed, but I am still getting MA? for some of my ST23, ST18, ST62 and ST1804 samples using both the master and dev branches.


| Sanger ST | WGS el_gato ST (master branch) | WGS el_gato ST (dev branch) |
| -- | -- | -- |
| 345 (6,10,19,3,19,4,11) | Novel ST (6,10,19,3,9,4,11) | MA? (6,10,19,3,?,4,11) |
| 1804 (26,41,22,21,55,48,31) | Novel ST (26,41,22,21,NAT,48,31) | MA? (26,41,22,21,?,48,31) |
| 1804 (26,41,22,21,55,48,31) | Novel ST (26,41,22,21,NAT,48,31) | MA? (26,41,22,21,?,48,31) |
| 23 (2,3,9,10,2,1,6) | 2439 (2,3,9,10,93,1,6) | MA? (2,3,9,10,?,1,6) |
| 1376 (1,4,3,16,2,1,208) | Novel ST (1,4,3,16,9,1,208) | MA? (1,4,3,16,?,1,208) |

Thanks Rediat
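
For anyone comparing profiles like the ones above by hand, here is a minimal Python sketch that reports which loci differ between two allele profiles. It assumes the seven values are listed in the standard L. pneumophila SBT order (flaA, pilE, asd, mip, mompS, proA, neuA); the helper names `parse_profile` and `differing_loci` are made up for illustration and are not part of el_gato.

```python
# Assumption: profiles follow the standard SBT locus order.
LOCI = ("flaA", "pilE", "asd", "mip", "mompS", "proA", "neuA")


def parse_profile(text):
    """Split a profile such as '(2,3,9,10,2,1,6 )' into one allele string per locus."""
    alleles = tuple(part.strip() for part in text.strip("() ").split(","))
    if len(alleles) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} alleles, got {len(alleles)}: {text!r}")
    return alleles


def differing_loci(expected, observed):
    """Return the names of loci whose allele calls differ between two profiles."""
    pairs = zip(LOCI, parse_profile(expected), parse_profile(observed))
    return [locus for locus, exp, obs in pairs if exp != obs]


# Profiles copied from this thread (Sanger result first, el_gato result second).
print(differing_loci("2,3,9,10,2,1,6", "2,3,9,10,93,1,6"))          # ['mompS']
print(differing_loci("26,41,22,21,55,48,31", "26,41,22,21,?,48,31"))  # ['mompS']
print(differing_loci("2,3,18,13,25,5,6", "2,3,18,15,2,1,6"))        # ['mip', 'mompS', 'proA']
```

Alleles are compared as strings so that non-numeric calls such as `NAT` or `?` are handled the same way as numbers.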

Alan-Collins commented 3 months ago

Looks like it was mompS that wasn't called in all those samples. What does the log say? If your reads are on the shorter side (e.g. 150bp), it is sometimes impossible to tell which mompS allele is in the locus used for SBT. Shorter read pairs (and shorter fragment sizes) more often fail to cover both the biallelic site that distinguishes the two alleles and the SBT primer that distinguishes the two loci. If that is the case here, the end of your log file will contain something like the information below.

[04/05/2024 08:30:41 AM | out/01416787 ]  Identified allele information:

2 mompS allele identified.
[04/05/2024 08:30:41 AM | out/01416787 ]  mompS allele '2' information
[04/05/2024 08:30:41 AM | out/01416787 ]  lowest coverage of bialleleic site: 169
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence: 0
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence in the reverse orientation (indicating this is the secondary allele): 0
[04/05/2024 08:30:41 AM | out/01416787 ]  mompS allele '63' information
[04/05/2024 08:30:41 AM | out/01416787 ]  lowest coverage of bialleleic site: 173
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence: 0
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence in the reverse orientation (indicating this is the secondary allele): 0
[04/05/2024 08:30:41 AM | out/01416787 ]  Unable to determine which allele is present in native mompS locus
[04/05/2024 08:30:41 AM | out/01416787 ]  Failed to determine primary mompS allele. Primary mompS allele is identified by finding read pairs that cover both biallelic positions and sequencing primer. In this sample, no such reads were found. Perhaps sequencing reads are too short.
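
When checking many samples, the log lines above can be summarised with a short script. The following is a rough Python sketch that assumes the log wording is exactly as shown in this comment; the function names `summarize_momps` and `primary_unresolved` are hypothetical, and the "unresolved" check is only a heuristic reading of the log, not el_gato's own decision logic.

```python
import re

# Patterns are based only on the log lines quoted in this thread
# (including the log's own spelling "bialleleic").
ALLELE_RE = re.compile(r"mompS allele '([^']+)' information")
FIELDS = (
    ("coverage", re.compile(r"lowest coverage of bialleleic site: (\d+)")),
    ("primer_reads", re.compile(r"outermost reverse primer sequence: (\d+)")),
    ("reverse_primer_reads", re.compile(
        r"in the reverse orientation \(indicating this is the secondary allele\): (\d+)")),
)


def summarize_momps(log_text):
    """Collect per-allele coverage and primer-read counts from the log text."""
    alleles = {}
    current = None
    for line in log_text.splitlines():
        header = ALLELE_RE.search(line)
        if header:
            current = alleles.setdefault(header.group(1), {})
            continue
        if current is None:
            continue
        for key, pattern in FIELDS:
            match = pattern.search(line)
            if match:
                current[key] = int(match.group(1))
    return alleles


def primary_unresolved(alleles):
    """Heuristic: several alleles found, but no read links any of them to the primer."""
    return len(alleles) > 1 and all(
        info.get("primer_reads", 0) == 0 and info.get("reverse_primer_reads", 0) == 0
        for info in alleles.values()
    )


SAMPLE_LOG = """\
[04/05/2024 08:30:41 AM | out/01416787 ]  mompS allele '2' information
[04/05/2024 08:30:41 AM | out/01416787 ]  lowest coverage of bialleleic site: 169
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence: 0
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence in the reverse orientation (indicating this is the secondary allele): 0
[04/05/2024 08:30:41 AM | out/01416787 ]  mompS allele '63' information
[04/05/2024 08:30:41 AM | out/01416787 ]  lowest coverage of bialleleic site: 173
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence: 0
[04/05/2024 08:30:41 AM | out/01416787 ]  number of reads from this allele containing outermost reverse primer sequence in the reverse orientation (indicating this is the secondary allele): 0
"""

summary = summarize_momps(SAMPLE_LOG)
print(summary)
print(primary_unresolved(summary))  # True for the example log above
```

In the example, both alleles have good coverage at the biallelic site (169 and 173) but zero primer-containing reads, which matches the "Unable to determine which allele is present in native mompS locus" message.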

rediattewolde commented 3 months ago

Hi Alan

Yes, the reads are on the shorter side (e.g. 150bp) and I am getting the error " Failed to determine primary mompS allele. Primary mompS allele is identified by finding read pairs that cover both biallelic positions and a sequencing primer. In this sample, no such reads were found. Perhaps sequencing reads are too short."

Ok, I will try with a longer read size (greater than 150bp) and let you know.

Thanks Rediat