tseemann closed this issue 5 years ago.
I've fixed all the sequences that needed to be reverse-complemented. Internal stop codons are harder: these are usually due to partial sequences for which no alternative accession is available. I'll double-check them in the NCBI Pathogen Browser.
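For anyone wanting to reproduce the internal-stop check locally, the idea is to scan each sequence codon by codon in frame 1 and report any stop codon that appears before the final codon. This is a minimal sketch, assuming sequences are stored 5' to 3' in coding orientation and use the standard bacterial stop codons; `internal_stops` is a hypothetical helper, not CARD's actual QC code.

```python
# Stop codons shared by NCBI genetic codes 1 and 11 (assumption: bacterial genes).
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def internal_stops(seq: str, stops: frozenset = STOP_CODONS) -> list:
    """Return 0-based offsets of in-frame stop codons that occur before the last codon."""
    seq = seq.upper()
    # Offset of the last complete codon, which is expected to be the terminal stop.
    last = len(seq) - len(seq) % 3 - 3
    return [i for i in range(0, last, 3) if seq[i:i + 3] in stops]


print(internal_stops("ATGAAATGAAAATAA"))  # [6]: the TGA at offset 6 is internal
print(internal_stops("ATGAAATAA"))        # []: only the terminal TAA is a stop
```

A non-empty result only says the sequence is not a clean ORF in frame 1; it cannot tell a truncated partial sequence from one stored in the wrong orientation.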
CARD 3.0.6 was released today, and these should all be fixed. We added more rigorous QC code to look for sequences stored in the wrong orientation, but let us know if you find anything that slipped through.
We also updated our QC to use the correct translation table (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG4) for Mycoplasma/Spiroplasma/Ureaplasma.
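The reason the translation table matters for QC: under the genetic code linked above (NCBI table 4), TGA encodes tryptophan rather than a stop, so a Mycoplasma gene checked with the bacterial code (table 11) can appear to have spurious internal stops. The sketch below makes the stop-codon set depend on the table. It is a hypothetical illustration that assumes only the stop codons differ for this purpose and ignores start-codon differences; `has_internal_stop` and `STOPS_BY_TABLE` are names invented here.

```python
# Stop codons per NCBI genetic code (only the two tables relevant here).
STOPS_BY_TABLE = {
    11: {"TAA", "TAG", "TGA"},  # Bacterial, Archaeal and Plant Plastid code
    4: {"TAA", "TAG"},          # Mycoplasma/Spiroplasma code: TGA encodes Trp
}


def has_internal_stop(seq: str, table: int = 11) -> bool:
    """True if any complete in-frame codon before the last one is a stop under `table`."""
    stops = STOPS_BY_TABLE[table]
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # The final codon is the expected terminal stop, so it is excluded.
    return any(codon in stops for codon in codons[:-1])


seq = "ATGAAATGAAAATAA"  # contains an in-frame TGA
print(has_internal_stop(seq, table=11))  # True
print(has_internal_stop(seq, table=4))   # False
```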
If you keep adding all this QC code, I'll have to find somewhere else to file issues! :-P
A small subset of sequences in CARD (and other AMR databases) has not been reverse-complemented in its FASTA representation. This is inconsistent with the other entries, which are stored in coding orientation.
Some entries become a valid ORF when reverse-complemented, whereas others don't. The ones that don't must be amplicons that just happen to be a multiple of 3 nt long.
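The distinction above can be checked mechanically: test the sequence as stored, then its reverse complement, and fall through to "neither" when both fail. Being a multiple of 3 nt long is not enough on its own to pass. This is a minimal sketch under stated assumptions: a valid ORF here means a start codon from a reduced bacterial set (ATG/GTG/TTG, an assumption), a terminal stop, and no internal stops under the standard stop codons. `orientation` and `is_orf` are hypothetical helpers, not taken from CARD.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
STARTS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons only
STOPS = {"TAA", "TAG", "TGA"}   # standard/bacterial stop codons


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (unrecognised symbols pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_orf(seq: str) -> bool:
    """Start codon, terminal stop, no internal stops, length a multiple of 3."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (codons[0] in STARTS
            and codons[-1] in STOPS
            and not any(c in STOPS for c in codons[1:-1]))


def orientation(seq: str) -> str:
    """Classify a stored sequence as 'forward', 'reverse', or 'neither'."""
    if is_orf(seq):
        return "forward"
    if is_orf(revcomp(seq)):
        return "reverse"
    return "neither"


print(orientation("ATGAAATAA"))  # forward
print(orientation("TTATTTCAT"))  # reverse: stored on the minus strand
print(orientation("GCTGCTGCT"))  # neither: multiple of 3 nt, but not an ORF
```

Entries classified as "reverse" are the ones that only need reverse-complementing; "neither" flags candidates such as the amplicon-like sequences described above, which need manual review.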