choderalab / TargetExplorer

Database framework with RESTful API for aggregating genomic, structural, and functional data for target protein families.
GNU General Public License v2.0

Identification of protein domain boundaries #2

Open danielparton opened 10 years ago

danielparton commented 10 years ago

Currently we just take the UniProt boundaries, which are based on PROSITE annotations (pattern- and profile-based searches).

We should look into whether there is a more appropriate procedure.

Analysis of a quality multiple sequence alignment (likely using both sequence and structural data), with hierarchical clustering, would probably be a good start.
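As a rough illustration of how an alignment could feed into boundary selection, here is a minimal sketch that trims to the columns occupied by most sequences and maps that core back to residue numbers. It covers only the alignment-occupancy part, not the hierarchical clustering step, and it is not what TargetExplorer currently does. The function names, the `min_occupancy` threshold, and the toy sequences are all made up for illustration.

```python
def column_occupancy(alignment):
    """Fraction of sequences with a residue (not a gap) in each column."""
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(seq[col] != "-" for seq in alignment) / n_seqs
        for col in range(n_cols)
    ]


def conserved_core(alignment, min_occupancy=0.5):
    """Return (first, last) 0-based columns with occupancy >= min_occupancy."""
    occupancy = column_occupancy(alignment)
    kept = [i for i, value in enumerate(occupancy) if value >= min_occupancy]
    if not kept:
        return None
    return kept[0], kept[-1]


def residue_boundaries(aligned_seq, core):
    """Map a column range to 1-based residue numbers of one aligned sequence."""
    first_col, last_col = core
    residue_number = 0
    inside = []
    for col, char in enumerate(aligned_seq):
        if char == "-":
            continue  # gaps do not advance the residue counter
        residue_number += 1
        if first_col <= col <= last_col:
            inside.append(residue_number)
    if not inside:
        return None
    return inside[0], inside[-1]


# Toy alignment (invented sequences, not real kinase data).
toy_alignment = [
    "MK--GSLVKELGRGAFG--",
    "--AAGSLIKQLGSGSFGKK",
    "----GTLVRELGKGAFG--",
]
core = conserved_core(toy_alignment, min_occupancy=0.6)
print(core)
print(residue_boundaries(toy_alignment[0], core))
```

A real version would presumably start from a curated alignment that includes structural information, and the occupancy threshold would need tuning per family.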

This problem may also be informed by the results of systematic expression tests of kinase construct variants, which are currently in progress.

jchodera commented 10 years ago

We may also be able to construct some sort of tool (be it a simple method that looks at alignments to known structures and penalizes overhangs, a machine learning classifier, or a simulation-based scheme) to determine which construct boundaries are optimal for expression.
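The simplest of these options, penalizing overhangs relative to known structures, could be sketched roughly as below. This assumes each known structure has already been mapped onto the target sequence as a resolved residue range. The function names, the weights, and the residue numbers are hypothetical, and nothing here has been checked against expression data.

```python
def overhang_penalty(construct, resolved, overhang_weight=1.0, truncation_weight=2.0):
    """Penalty for one construct against one structure's resolved range.

    Both arguments are (start, end) residue numbers, inclusive.
    Overhang counts construct residues outside the resolved range;
    truncation counts resolved residues missing from the construct.
    """
    c_start, c_end = construct
    r_start, r_end = resolved
    overhang = max(0, r_start - c_start) + max(0, c_end - r_end)
    truncation = max(0, c_start - r_start) + max(0, r_end - c_end)
    return overhang_weight * overhang + truncation_weight * truncation


def rank_constructs(constructs, resolved_ranges, **weights):
    """Sort candidate constructs by mean penalty over all structures (best first)."""
    scored = []
    for construct in constructs:
        total = sum(overhang_penalty(construct, r, **weights) for r in resolved_ranges)
        scored.append((total / len(resolved_ranges), construct))
    return sorted(scored)


# Hypothetical resolved ranges from two structures and three candidate constructs.
resolved_ranges = [(10, 270), (12, 268)]
candidates = [(1, 300), (10, 270), (20, 260)]
for score, construct in rank_constructs(candidates, resolved_ranges):
    print(construct, score)
```

Weighting truncation more heavily than overhang is an arbitrary choice here. The results of the expression tests could eventually be used to fit those weights, or to replace this heuristic with a trained classifier.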