PATRIC3 / patric3_website

Legacy PATRIC Website (JBoss Portal Version)
MIT License

Data: aa sequences are missing for some CDSs #1289

Closed mshukla1 closed 7 years ago

mshukla1 commented 7 years ago

AA sequences are missing for ~130,000 CDSs annotated by PATRIC. They belong to ~11,000 genomes in total.

Actions:

mshukla1 commented 7 years ago

After further analysis, it seems most of the CDSs missing AA sequences fall in one of the following categories:

All of the above situations lead to erroneous translations with uncommon start codons and/or premature stop codons. Hence, I decided not to populate those AA sequences in the database.

I have updated the script that generates the download files to exclude such features, so that the FASTA files contain no entries without sequences.
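For illustration only, the exclusion step could look roughly like the sketch below. This is not the actual PATRIC download script; the function name `write_protein_fasta`, the `(feature_id, aa_sequence)` input shape, and the 60-character line width are all assumptions made for the example.

```python
import io


def write_protein_fasta(features, handle, width=60):
    """Write (feature_id, aa_sequence) pairs as FASTA, skipping empty sequences.

    Hypothetical helper: the input shape and line width are assumptions,
    not taken from the real download script.
    Returns the number of entries actually written.
    """
    written = 0
    for feature_id, aa_sequence in features:
        sequence = (aa_sequence or "").strip()
        if not sequence:
            # No AA sequence was populated for this feature, so leave it out
            # rather than emit a header line with nothing under it.
            continue
        handle.write(f">{feature_id}\n")
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        written += 1
    return written


# Example with made-up feature IDs: the second and third entries are skipped.
example_features = [("peg.1", "MKT"), ("peg.2", ""), ("peg.3", None)]
buffer = io.StringIO()
count = write_protein_fasta(example_features, buffer)
```

Skipping at write time keeps the features themselves in the database while keeping the download files free of header-only entries.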

-Maulik

mshukla1 commented 7 years ago

@olsonanl: we need to modify the annotation service to discard partial pegs/CDSs that cannot be correctly translated.
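A minimal sketch of the kind of check this implies, not the annotation service's actual logic. It assumes the standard amino acid assignments (shared by the bacterial genetic code) and treats ATG, GTG and TTG as the accepted start codons; the function name `translation_problems` and that start-codon set are assumptions for the example.

```python
from itertools import product

# Standard codon table built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: start codons accepted as "common" for bacterial CDSs.
START_CODONS = {"ATG", "GTG", "TTG"}


def translation_problems(nt_sequence):
    """Return a list of reasons why a CDS would not translate cleanly.

    An empty list means none of the checked problems were found.
    """
    nt = nt_sequence.upper()
    problems = []
    if len(nt) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Only complete codons are considered; a trailing partial codon is dropped.
    usable = len(nt) - len(nt) % 3
    codons = [nt[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] not in START_CODONS:
        problems.append("uncommon start codon")
    # Unknown codons (e.g. containing N) are translated as X.
    residues = [CODON_TABLE.get(codon, "X") for codon in codons]
    if "*" in residues[:-1]:
        problems.append("premature stop codon")
    return problems
```

A caller could then discard any CDS for which `translation_problems(...)` returns a non-empty list, or log the reasons for later review.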

hyoo commented 7 years ago

@mshukla1 @olsonanl Are we still working on this, or can it be closed?