degregory / SAM_Refiner

Program for gathering variant information from a SAM formatted file.
GNU General Public License v3.0

Incorrect amino acid translation #2

Closed: EvolDoc closed this issue 3 years ago

EvolDoc commented 3 years ago

Greetings, I think the amino acid numbering is off. In my CollectedChimerasRemoved.tsv output, the nucleotide positions are correct, but the amino acid positions are not, at least if they are meant to correspond to a particular protein. For example, in this list, nt 23403G should correspond to spike D614G (or in this case, G614G). Instead it is reported as G7801G, i.e. codon 23403/3 = 7801 counted from the start of the genome.

Unique Sequence

10029T(N3343N) 12100T(L4034F) 12789T(N4263N) 15279T(H5093H) 1567T(P523S) 17344G(I5782V) 203T(S68L) 222T(I74I) 241T(R81C) 21306T(R7102R) 222T(I74I) 241T(R81C) 23403G(G7801G) 23604A(S7868S) 23604A(S7868S) 23635T(H7879Y) 23638T(H7880Y) 241T(R81C) 25553T(A8518V) 25613T(S8538F) 27539C(L9180P) 275T(P92L) 27972T(L9324L) 28095T(L9365L) 28111G(T9371A) 28144C(Y9382H) 28472T(P9491L) 28512G(S9504S) 28603T(L9535L) 28657T(R9553W) 28877TC(Q9626L) 28881A(9627) 28882ACr(G9628T) 28881A(9627) 28882ACr(G9628T) 28887T(N9629N) 29197T(P9733S) 29362T(P9788S) 29366T(T9789I) 29402T(9801L) 29421T(T9807T) 2973T(G991G) 3037T(L1013L) 29867A(M9956K) 29870A(T9957K) 29868A(M9956I) 29870A(T9957K) 29870A(T9957K) 3037T(L1013L) 5462G(E1821G) 5488G(R1830G) 5488G(R1830G) 745T(L249L) 8782T(P2928S) 913T(R305) Reference

degregory commented 3 years ago

To get it to process the AAs correctly, you'll have to use an ORF as the reference for both generating the SAM files and for running SAM Refiner.
For spike you can use https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?from=21563&to=25384&report=fasta
For orf1ab you can use https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?location=266:13468,13468:21555&report=fasta (or the equivalent for the reference sequence you're interested in).
If you are running SAM Refiner against an entire genome, it will probably be simpler to turn off the AA reporting (--AAreport 0).
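
The numbers in this thread are consistent with codons being counted from the first base of whatever reference is supplied. Below is a minimal sketch of that arithmetic in Python. It is not SAM Refiner's actual code: `codon_number` and `SPIKE_START` are hypothetical names, and the start coordinate 21563 is taken from the `from=21563` parameter in the spike link above.

```python
# Hypothetical illustration of the position arithmetic; not SAM Refiner code.

# Assumption: 1-based spike start in NC_045512.2, read off the `from=21563`
# parameter of the spike FASTA link in this thread.
SPIKE_START = 21563


def codon_number(nt_pos: int, ref_start: int = 1) -> int:
    """Return the 1-based codon index of a 1-based nucleotide position,
    counting codons from `ref_start` (the first base of the reference)."""
    offset = nt_pos - ref_start + 1  # 1-based position within the reference
    if offset < 1:
        raise ValueError("position lies before the start of the reference")
    return (offset + 2) // 3  # integer form of ceil(offset / 3)


# Whole genome as reference: codons are counted from genome position 1.
print(codon_number(23403))               # 7801, as in G7801G

# Spike ORF as reference: the same base becomes position 1841 of the ORF.
print(23403 - SPIKE_START + 1)           # 1841
print(codon_number(23403, SPIKE_START))  # 614, as in D614G
```

This simple offset only holds for a contiguous ORF such as spike. The orf1ab link joins two ranges, so a single subtraction would not be enough there.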

EvolDoc commented 3 years ago

There we go! That worked. Thank you!

Unique Sequences

1064T(R355M) 1250C(K417T) 1355G(L452R) 1433A(T478K) 1709A(A570D) 1709A(A570D) 1841G(D614G) 1841G(D614G) 1963T(H655Y) 1979T(Y660F) 2025A(Q675Q) 203-208Del(I68I) 2042A(P681H) 2042A(P681H) 2073T(S691S) 2042A(P681H) 2073T(S691S) 2080T(A694S) 2042A(P681H) 2073T(S691S) 2091T(M697I) 2042A(P681H) 2073T(S691S) 2100G(G700G) 2042A(P681H) 2147T(T716I) 2076T(I692I) 2147T(T716I) 2194G(T732A) 2463-2463Del 2933G(N978S) 2944G(S982A) 3080T(T1027I) 3135A(K1045K) 3310C(V1104L) 3352C(D1118H) 3656T(G1219V) 3682T(V1228L) 419-427Del(F140F) 429-431Del(Y144V) 456T(W152C) 644-insertCGGCAGGCT(215AAGY) 665T(A222V) Reference