Alignment problems - Githubissues

j-wags commented 8 years ago

In testing the updated docking stages, I've found a few weird alignment cases in the tarball.

For example: Week 19's apo-5b15_3vpe Week 19's apo-5e8z_4x2n (notably large distance, alignment may have skipped) Week 19's apo-5job_3t13 Week 22's apo-5b2z-1bkd Week 22's smallest-5ev8_2doo is aligned to the wrong domain of largest (docking box will not be drawn there, this is tough, maybe should be separate issue) Week 22's apo-5ewa_4f6h (again, wrong domain aligned, tough problem)

j-wags commented 8 years ago

Test out

Use pymol to truncate largest protein structure to binding site, then align to that
Use the pdb fatcat server/alignment tool
Other possible solutions

j-wags commented 8 years ago

Issue #56 should cover this much better in the interim while we await testing for a better program.

j-wags commented 8 years ago

I'm pursuing the pymol route here. Key questions to answer will be: Which atoms are by default used for the alignment? Can this be changed? Could a pymol-based command line tool for the alignment offer more options than Schrodinger structalign?

j-wags commented 8 years ago

Preliminary code, which performs the alignment using geometric (proximity to binding site) and secondary structure (helices and sheets) constraints. The atom selection is currently set to alpha carbons but can be modified. I do not see any option to apply different weights to different atoms. This will not run on nif1 as pymol is not yet available.

import __main__
__main__.pymol_argv = [ 'pymol', '-c'] #put -cq here to suppress pymol output
import pymol
pymol.finish_launching()
import sys

#### Helper function from http://pymolwiki.org/index.php/Ss
def ss(selection):

    class SSE(object):

        def __init__(self, start, typ):
            self.start, self.typ = start, typ
            self.end = -1

        def __repr__(self):
            return "%s-%s %s" % (self.start, self.end, self.typ)

    pymol.stored.pairs = []
    pymol.cmd.iterate("%s and n. ca" % selection, "stored.pairs.append((resi, ss))")
    num, currentType = pymol.stored.pairs[0]

    sses = [SSE(num, currentType)]
    currentSSE = sses[0]
    for resi, ssType in pymol.stored.pairs:
        if ssType == currentType:
            currentSSE.end = resi
        else:
            sses.append(SSE(resi, ssType))
            currentSSE = sses[-1]
            currentType = ssType

    return sses
    #for sse in sses:
    #    print sse

caAlignment = True

molPrefixes = ['largest_splitted_receptor1',
               'smallest_splitted_receptor1',
               'apo_splitted_receptor1',
               'holo_splitted_receptor1',]

for molPrefix in molPrefixes:
    pymol.cmd.load('/extra/banzai2/j5wagner/CELPP/2016_07_14_pymol_alignment/test_data/5ev8/%s.pdb' %(molPrefix))

### Geometric filter ###
# >>> help(pymol.cmd.pseudoatom)
#Help on function pseudoatom in module pymol.creating:
#
#pseudoatom(object='', selection='', name='PS1', resn='PSD', resi='1', chain='P', segi='PSDO', elem='PS', vdw=-1.0, hetatm=1, b=0.0, q=0.0, color='', label='', pos=None, state=0, mode='rms', quiet=1,)

center = open('/extra/banzai2/j5wagner/CELPP/2016_07_14_pymol_alignment/test_data/5ev8/center.txt').read().split()
center = [float(i.strip(',')) for i in center]
print center
pymol.cmd.pseudoatom('largest_lig_center', pos=center)
pymol.cmd.select('reference',"(/largest_splitted_receptor1//A//CA) and (byres (largest_lig_center around 15))" ) # Remove reference to chain A in actual deploy - Shuai's solution to issue 56 will cover this

for molPrefix in molPrefixes[1:]:
    print
    print
    print '=========================='
    print "PROCESSING", molPrefix

    ### Secondary structure filter ###  
    # Get resnums and their SS types - Can filter to "H" and "S" with list comp
    #resiSS = pymol.cmd.iterate('/apo_splitted_receptor1////CA','(resi,resv,ss)')
    sses = ss('/%s' %(molPrefix)) # not actually used, but cool
    print sses # not actually used, but cool
    pymol.stored.tuples = []
    resiSS = pymol.cmd.iterate(molPrefix,'stored.tuples.append((resn,resv,ss))')
    print pymol.stored.tuples

    print "FILTERING FOR SS"
    ss_near_bs = [i for i in pymol.stored.tuples if i[2] in ['H','S']]
    print ss_near_bs
    resv_sel_str = '+'.join([str(i[1]) for i in ss_near_bs])
    print resv_sel_str

    ### 3D alignment ###

    ##  API reference  ##
    ## http://www.pymolwiki.org/index.php/Align
    #cmd.align( string mobile, string target, float cutoff=2.0,
    #           int cycles=5, float gap=-10.0, float extend=-0.5,
    #           int max_gap=50, string object=None, string matrix='BLOSUM62',
    #           int mobile_state=0, int target_state=0, int quiet=1,
    #           int max_skip=0, int transform=1, int reset=0 )

    # Remove the A chain label when this reaches production - Shuai's fix to issue 56 will only leave one chain.
    if caAlignment:
        alnOut = pymol.cmd.align('/%s//A//CA'%(molPrefix),
                                 'reference'
                                 #'/largest_splitted_receptor1//A/`%s/CA'%(resv_sel_str)
                                 )# Remove reference to chain A in actual deploy - Shuai's solution to issue 56 will cover this
    else:
        alnOut = pymol.cmd.align('/%s//A//'%(molPrefix),
                                 'reference',
                                 #'/largest_splitted_receptor1//A//CA',
                                 )# Remove reference to chain A in actual deploy - Shuai's solution to issue 56 will cover this
    pymol.cmd.save('%s_aligned.pdb' %(molPrefix),
                   '/%s' %(molPrefix)
                   )
    print alnOut
pymol.cmd.save('largest_reference.pdb' ,
               'reference'
               )

#print sys.stdout.read()
pymol.cmd.quit()

j-wags commented 8 years ago

Meeting note: We want calpha, radius, and SS emphasis.

j-wags commented 8 years ago

Implemented in current jeffdev, and appears to run properly and get qualitatively similar results to schrodinger structalign. Only currently being used for the chimera_proteinligprep stage. We should discuss if we want it in the evaluation stage, where we do another alignment.

mkgilson commented 8 years ago

nice! can you show some sample results at next celpp meeting?

On 7/28/2016 8:45 PM, j-wags wrote:

Implemented in current jeffdev, and appears to run properly and get qualitatively similar results to schrodinger structalign. Only currently being used for the chimera_proteinligprep stage. We should discuss if we want it in the evaluation stage, where we do another alignment.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/drugdata/D3R/issues/39#issuecomment-236087811, or mute the thread https://github.com/notifications/unsubscribe-auth/AQEJQHfV7ORpeKBfl4n4lYMd6Y9sKAtcks5qaXdwgaJpZM4IsN8r.

j-wags commented 8 years ago

Sure. I'm out tomorrow but we can discuss wednesday

j-wags commented 8 years ago

Pymol alignment implemented in pull request #71

drugdata / D3R

Alignment problems #39