Closed mshukla1 closed 7 years ago
After further analysis, it seems most of the CDSs missing AA sequences fall in one of the following categories:
All of the above situations lead to erroneous translations with uncommon start codons and/or premature stop codons. Hence, I decided not to populate those AA sequences in the database.
I have updated the script that generates the download files to exclude such features, so that the FASTA files contain no entries without sequences.
-Maulik
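For readers who want to apply the same kind of filtering to an existing download file, here is a minimal sketch, not the actual PATRIC download script. The function names `read_fasta` and `drop_empty_entries` are hypothetical, and the sketch assumes plain FASTA input in which a header line starts with `>` and is followed by zero or more sequence lines.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_empty_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep only the entries that actually carry a sequence (hypothetical helper)."""
    for header, seq in read_fasta(lines):
        if seq:
            yield header, seq
```

Calling `list(drop_empty_entries(open(path)))` on a FASTA file would return only the records that have a non-empty sequence; header-only records are skipped.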
@olsonanl: we need to modify the annotation service to discard partial pegs/CDSs that cannot be correctly translated.
@mshukla1 @olsonanl Are we still working on this, or can it be closed?
AA sequences are missing for ~130,000 CDSs annotated by PATRIC. They belong to ~11,000 genomes in total.
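As a rough illustration of what such a check could look like, the sketch below flags CDS nucleotide sequences that would translate with an uncommon start codon or a premature stop codon. It is not the annotation service's logic. The function name `is_translatable` is hypothetical, and the sketch assumes the bacterial genetic code (translation table 11) with `ATG`, `GTG`, and `TTG` as accepted starts, a sequence given on the coding strand, and a terminal stop codon included in the CDS.

```python
# Assumption: common bacterial start codons (translation table 11).
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_translatable(cds: str) -> bool:
    """Return True if the CDS looks like a complete, cleanly translatable ORF."""
    seq = cds.upper()
    # Need at least a start and a stop codon, and a whole number of codons.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in START_CODONS:
        return False  # uncommon start codon
    if codons[-1] not in STOP_CODONS:
        return False  # no terminal stop, e.g. a truncated (partial) CDS
    # Any stop codon before the last position is a premature stop.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

A partial CDS at a contig boundary would typically fail one of these checks, which is the kind of feature the request above asks the service to discard.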
Actions: