arpcard / amr_curation

A public repository for collective curation of antimicrobial resistance (AMR) genes and mutations. Submit, discuss, and resolve AMR curation issues.

How to handle reversed sequence entries #20

Closed tseemann closed 5 years ago

tseemann commented 5 years ago

A small subset of sequences in CARD (and other AMR databases) are stored in the reverse orientation: their FASTA representation has not been reverse-complemented. This is inconsistent with the other entries.

Some of the entries with internal stop codons become a valid ORF after revcom, whereas others don't. The ones that don't must be amplicons that just happen to be a multiple of 3 nt long.

WARNING: OXA-151 - has internal stop codons, trying revcom
OXA-151 - revcom resolves problem, hooray!

WARNING: tet(C) - has internal stop codons, trying revcom
tet(C) - revcom resolves problem, hooray!

WARNING: SHV-112 - has internal stop codons, trying revcom
WARNING: SHV-112 - revcom has internal stop codons too

WARNING: SPG-1 - has internal stop codons, trying revcom
SPG-1 - revcom resolves problem, hooray!

WARNING: OXA-368 - has internal stop codons, trying revcom
WARNING: OXA-368 - revcom has internal stop codons too

WARNING: OXA-14 - has internal stop codons, trying revcom
WARNING: OXA-14 - revcom has internal stop codons too

WARNING: ErmB - has internal stop codons, trying revcom
ErmB - revcom resolves problem, hooray!

WARNING: OXA-153 - has internal stop codons, trying revcom
OXA-153 - revcom resolves problem, hooray!

WARNING: OXA-135 - has internal stop codons, trying revcom
OXA-135 - revcom resolves problem, hooray!

WARNING: CTX-M-108 - has internal stop codons, trying revcom
WARNING: CTX-M-108 - revcom has internal stop codons too

WARNING: LEN-3 - has internal stop codons, trying revcom
WARNING: LEN-3 - revcom has internal stop codons too

WARNING: OXA-17 - has internal stop codons, trying revcom
WARNING: OXA-17 - revcom has internal stop codons too

WARNING: Tet(X4) - has internal stop codons, trying revcom
Tet(X4) - revcom resolves problem, hooray!

WARNING: dfrA18 - has internal stop codons, trying revcom
dfrA18 - revcom resolves problem, hooray!

WARNING: KPC-4 - has internal stop codons, trying revcom
WARNING: KPC-4 - revcom has internal stop codons too

WARNING: EreA - has internal stop codons, trying revcom
EreA - revcom resolves problem, hooray!

WARNING: CTX-M-109 - has internal stop codons, trying revcom
WARNING: CTX-M-109 - revcom has internal stop codons too

WARNING: QnrS3 - has internal stop codons, trying revcom
WARNING: QnrS3 - revcom has internal stop codons too

WARNING: CTX-M-107 - has internal stop codons, trying revcom
WARNING: CTX-M-107 - revcom has internal stop codons too

WARNING: KPC-9 - has internal stop codons, trying revcom
WARNING: KPC-9 - revcom has internal stop codons too

WARNING: OXA-16 - has internal stop codons, trying revcom
WARNING: OXA-16 - revcom has internal stop codons too

WARNING: PEDO-3 - has internal stop codons, trying revcom
PEDO-3 - revcom resolves problem, hooray!

WARNING: TEM-199 - has internal stop codons, trying revcom
WARNING: TEM-199 - revcom has internal stop codons too

WARNING: SHV-53 - has internal stop codons, trying revcom
WARNING: SHV-53 - revcom has internal stop codons too

WARNING: LEN-4 - has internal stop codons, trying revcom
WARNING: LEN-4 - revcom has internal stop codons too
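
For readers who want to reproduce this kind of check, here is a minimal sketch of the logic the log above suggests: look for in-frame stop codons, and if any are found, retry on the reverse complement. It is not the script that produced the log. The function names (`revcom`, `has_internal_stop`, `check_orientation`) and the return labels are hypothetical, and it assumes plain A/C/G/T sequences read in frame 1 with the standard stop codons.

```python
# Hypothetical helper names; a sketch, not the code behind the log above.
STOPS = {"TAA", "TAG", "TGA"}  # assumed: standard/bacterial stop codons
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcom(seq):
    """Return the reverse complement of a plain A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_internal_stop(seq, stops=STOPS):
    """True if any in-frame codon except the last one is a stop codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return any(codon in stops for codon in codons[:-1])


def check_orientation(seq):
    """Classify a sequence as 'ok', 'revcom', or 'unresolved'."""
    if not has_internal_stop(seq):
        return "ok"
    if not has_internal_stop(revcom(seq)):
        return "revcom"  # reverse complement reads through cleanly
    return "unresolved"  # stops in both orientations


if __name__ == "__main__":
    for name, seq in [("forward", "ATGTTAAAATAA"),
                      ("reversed", "TTATTTTAACAT"),
                      ("both_bad", "ATGTAATTAAAA")]:
        print(name, check_orientation(seq))
```

A sequence can pass this check in the wrong orientation if its reverse strand happens to have no in-frame stops, so a stricter check would also look for a start codon and a terminal stop.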

arpcard commented 5 years ago

I've fixed all the sequences that needed to be reverse-complemented. Internal stop codons are harder; these are usually due to partial sequences for which an alternative accession is not available. I'll double-check them in the NCBI Pathogen Browser.

arpcard commented 5 years ago

CARD 3.0.6 was released today; these should all be fixed. More rigorous QC code was added to look for sequences stored in the incorrect orientation, but let us know if you find anything that slipped through.

We also updated our QC to use the correct translation table (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG4) for Mycoplasma/Spiroplasma/Ureaplasma.
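
To illustrate why the translation table matters here: in NCBI translation table 4, TGA codes for tryptophan rather than stop, so a check that assumes table 11 reports false internal stops for these organisms. The sketch below is only an illustration, not CARD's QC code. `STOPS_BY_TABLE` and `internal_stop_positions` are hypothetical names, and only the stop-codon sets of tables 4 and 11 are modelled.

```python
# Hypothetical names; only the stop-codon sets of two NCBI tables are modelled.
STOPS_BY_TABLE = {
    11: {"TAA", "TAG", "TGA"},  # bacterial, archaeal and plant plastid code
    4: {"TAA", "TAG"},          # Mycoplasma/Spiroplasma code: TGA is Trp
}


def internal_stop_positions(seq, table=11):
    """Return 0-based codon indices of in-frame stops, excluding the last codon."""
    stops = STOPS_BY_TABLE[table]
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in stops]


if __name__ == "__main__":
    orf = "ATGTGAAAATAA"  # ATG TGA AAA TAA
    print(internal_stop_positions(orf, table=11))  # TGA counted as a stop
    print(internal_stop_positions(orf, table=4))   # TGA read as Trp
```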

tseemann commented 5 years ago

If you keep adding all this QC code I'll have to find somewhere else to file issues! :-P