freesasa / freesasa-python

FreeSASA Python Module
MIT License
44 stars 12 forks source link

Calculation of SASA for specific heteroatoms in protein structure and assignment of radii #16

Closed ioChris closed 3 years ago

ioChris commented 3 years ago

Hello, I am trying to use FreeSASA (in python 3.8) to calculate the SASA of certain heteroatoms in protein structures, specifically of metal ions (Zn, Mg, etc), which are almost always assigned as heteroatoms in the pdb file. I am running into 2 issues. Initially I enable the inclusion of heteroatoms by the following command:

fileName = ‘1m74.ent’
structure = freesasa.Structure(fileName, options = {'hetatm' : True})

This seems to work, I am getting a list of warnings for all hetatms in the structure and the respective guessing that FreeSASA does to assign each heteroatom a vdw radius.

Issue 1: Selecting the atoms In order to get the SASA of i.e. all Magnesium atoms in the structure, I am trying to select them like so: selection = freesasa.selectArea(['Ion_area, symbol Mg'], structure, result) This does not seem to work, as I am getting back a zero result (in structure(s) where I know that the atom(s) are exposed). My question is, am I giving the wrong command or is it possible that these atoms are not selectable?

Issue 2: Assigning radii I have found that besides the vdw radii of elements occurring in amino acids, FreeSASA also includes a select few elements that are found as cofactors in protein structures. As I am interested in a wide range of ions, the included list of radii are still not covering them all and I need to specify the radius for each element I am looking for at each time (Zinc, Cobalt, etc.) I have tried 2 solutions to this: 1: modifying the local ‘classifier.c’ script of FreeSASA by adding more elements and their respective radii into the dictionary ‘static const struct symbol_radius symbol_radius[]’. The changes seem to be getting ignored when I re-run FreeSASA (for example zinc is still not recognized and its radius is assigned to 0). 2: using the DerivedClassifier in my python script to add the new elements that I expect to find in the structure, like so:

class DerivedClassifier(freesasa.Classifier):
    # this must be set explicitly in all derived classifiers
    purePython = True
    def classify(self, residueName, atomName):    
        if re.match('\s*Mg', atomName):
            return 'Magnesium'
        return 'Not-magnesium'    
    def radius(self, residueName, atomName):
        if re.match('\s*Mg',atomName): # Magnesium
            return 1.74         
        return 0;                     # everything else (Hydrogen, etc)
classifier = DerivedClassifier()

This is likely a bad implementation from my side as it is producing a very different Total measured SASA (almost double comparing to running with the default classifier). When using the derived classifier however, the selection seems to work as I am getting a non-zero SASA measure for the selection:

structure = freesasa.Structure(fullPath, classifier, options = {'hetatm' : True})
selections = freesasa.selectArea(['Ion_area, symbol Mg'], structure, result)
>Ion_area : 10.97 A2

The fact that I am getting a result is good, however I am not sure I can trust it as the measurement is consistently different comparing to the one from PyMOL (in this case, 12.2 A2). If you have a suggestion for any of these issues, they are very welcome.

Best, Chris

mittinatten commented 3 years ago

Hi, I ran the calculation on 1m74 and with hetatms, using the CLI, not the python module, and HETATM 6404 is assigned a radius of 1.74 A, but the SASA is 0 A2.

Running

> freesasa 1m74.pdb --format=json --hetatm --depth=atom

give

 // ... "residue" MG:
                {
                  "name":" MG",
                  "number":"902",
                  "area":{
                    "total":0.0,
                    "polar":0.0,
                    "apolar":0.0,
                    "main-chain":0.0,
                    "side-chain":0.0
                  },
                  "n-atoms":1,
                  "atoms":[
                    {
                      "name":"MG",
                      "area":0.0,
                      "is-polar":false,
                      "is-main-chain":false,
                      "radius":1.74
                    }
                  ]
                },
  // ...

I'm not sure how PyMOL does this (haven't used it in a whil). I noticed that 1m74 has a lot of water molecules, if you include HETATM, all HETATMs are included, including waters, so they might be burying your ions? Maybe PyMOL is more clever about this. There is currently no way of automatically filtering out waters, so if you want HETAMs without waters you will need to preprocess your file to remove them. (Although I can see that could be a useful future improvement to the library).

The problem with your derived classifier, is that it sets all other elements to radius 0, so that will give you a strange result, and generally too large (it doesn't mean the total area is 0, because the water probe radius is added). If you want to just change the value of one element in a derived classifier, you should add a fallback to the the original classifier for all elements other than the one you want to change. Otherwise you used the classifier the way it's intended. If you want to add custom elements you can do it this way, or add a classifier file (take one of the original ones and extend it with the atoms you need), and use that to initialize the classifier: http://freesasa.github.io/python/classes.html#freesasa.Classifier.__init__ .

Does that help?

mittinatten commented 3 years ago

I'll close this, feel free to reopen if you have more questions.