How to properly ignore hydrogens?

michaelpun commented 2 years ago

I am running freesasa on pdb files that contain hydrogen atoms and I constantly run into the errors of the form AssertionError: Error: Radius array is <= 0 for the residue: SER ,atom: H1.

From reading the docs, I am under the impression that the program can ignore hydrogens and effectively calculate the SASA as if the hydrogens were not in the structure at all. Is this correct? If so how can I get this to work in the python interface?

Here is an example of code that outlines my current approach.

parser = pdb.PDBParser()
struct = parser.get_structure('2MFQ',
                              '2MFQ.pdb')
classifier = freesasa.Classifier()
fs_struct = freesasa.structureFromBioPDB(struct,
                                         classifier,
                                         options={'hetatm': False,
                                                  'hydrogen': False,
                                                  'join-models': False,
                                                  'skip-unknown': True,
                                                  'halt-at-unknown': False}
                                        )
fs_result = freeesasa.calc(fs_struct)[0]

Specifically the errors is returned from the call to structureFromBioPDB. I have also tried using freesasa.calcFromBioPDB() but receive the same errors.

This result should be reproducible with the pdb file associated with the above mentioned protein 2MFQ.

mittinatten commented 2 years ago

Hi, thanks for reporting this. As the documentation mentions, the BioPDB integration is not very well tested and you have uncovered a bug. I have pushed the fix (559f207) to the master branch.

mittinatten commented 2 years ago

The fix is available in the latest pre-release: https://pypi.org/project/freesasa/2.2.0a2/#description 🎉 We have some other changes that are blocking a proper release for now.

mittinatten commented 2 years ago

There was a problem with the build setup. It was easiest to create another release https://pypi.org/project/freesasa/2.2.0a3/#description. Hopefully this should solve your problem.

ehfd commented 2 years ago

Does structureFromBioPDB now show consistent results compared to directly importing the structure using freesasa now? This includes handling of disorderedresidues and disorderedatoms. @mittinatten

mittinatten commented 2 years ago

Hi, sorry for the slow response @ehfd. The fix above solves one bug, I am not aware of any others, but I still would say this part is not thoroughly tested. It should work fine in most cases though. The main cause of inconsistency is that freesasa might exclude some atoms, depending on the input and options, and then one has to be careful when mapping back to BioPDB (if you are analyzing the results atom by atom).

For a given structure you can check if you get the same results using both methods and a given set of options/parameters. If so, other variants of the same structure should also behave well. (By variants I mean only the coordinates change, not the sequence or presence of hydrogens, etc)

ehfd commented 2 years ago

Yeah, thank you @mittinatten for your answer. I will screen multiple structures by using freesasa command-line, structure loading using freesasa-python, and structure loading using Bio.PDB (both with the master branch instead of 2.1.0), and open a new issue if there are any disparities.

freesasa / freesasa-python

How to properly ignore hydrogens? #24