Pranavkhade / PACKMAN

PACKMAN: PACKing and Motion ANalysis

Eccentricity - Infinite lengths #34

Closed Simard302 closed 2 years ago

Simard302 commented 2 years ago

When predicting hinges on the backbone, it sometimes fails due to infinite lengths. Is this caused by unconnected nodes? Can it be fixed by changing the alpha value? Should there be something in the code that handles this issue instead of crashing?

Pranavkhade commented 2 years ago

Hello,

Thank you for posting the issue. Can you please share the exact parameters you used? Also, did you use the GUI, the API, or the CLI?

Yes, the infinite-length error can be caused by an alpha value that is too low, which leaves the graph disconnected. In the paper and in the interfaces, we recommend starting from an alpha value of 2.8. However, even when starting from 2.8, missing atoms in the structure can sometimes leave the graph with unconnected nodes.
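To make the failure mode concrete, here is a minimal sketch on a toy graph rather than a real alpha shape. It assumes only networkx; the six-node chain and the removed node are made up to mimic a structure with a gap, and `connectivity_report` is a hypothetical helper, not part of PACKMAN.

```python
import networkx as nx


def connectivity_report(graph):
    """Return (is_connected, components), with components sorted largest first."""
    components = sorted(nx.connected_components(graph), key=len, reverse=True)
    return len(components) == 1, components


# Toy stand-in for a protein graph: six nodes in a chain, then one node removed
# to mimic a gap that splits the graph (hypothetical data, not a real structure).
toy_graph = nx.path_graph(6)
toy_graph.remove_node(3)

connected, components = connectivity_report(toy_graph)
print(connected, components)

try:
    nx.eccentricity(toy_graph)
    eccentricity_failed = False
except nx.NetworkXError as err:
    # networkx refuses to compute eccentricity on a disconnected graph.
    eccentricity_failed = True
    print("eccentricity failed:", err)
```

A check like this before the eccentricity call would let a script report which nodes are cut off instead of stopping with a traceback.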

Please give me the specifics, and I will try to pinpoint the problem and fix it.

Thanks, Pranav

Simard302 commented 2 years ago

I tried an alpha value of 2.8, but the issue remains. It's less of an issue that the hinge can't be calculated and more of an issue that the program crashes. Could there be a check for infinite path length instead of crashing? Here is my PDB, and here is the code that causes the issue:

from packman import molecule, predict_hinge

mol = molecule.load_structure("1crn_broken.pdb")
chains = mol[0].get_chains()
for chain in chains:
    backbone = [j for i in chain.get_backbone() for j in i if j is not None]
    predict_hinge(backbone, open("1crn_broken_hinges.txt", "w"), Alpha=2.8)
    hingeResidues = [j for i in chain.get_hinges() for j in i.get_elements()]
    print(hingeResidues)

Error:

Traceback (most recent call last):
  File "hingetest.py", line 7, in <module>
    predict_hinge(backbone, open("1crn_broken_hinges.txt", "w"), Alpha=2.8)
  File "<...>\site-packages\packman\apps\predict_hinge.py", line 284, in predict_hinge
    centrality = eccentricity(ProteinGraph)
  File "<...>\site-packages\networkx\algorithms\distance_measures.py", line 264, in eccentricity
    raise nx.NetworkXError(msg)
networkx.exception.NetworkXError: Found infinite path length because the graph is not connected

1crn_broken.zip

Pranavkhade commented 2 years ago

Thank you for describing the problem. I have to decide whether to warn users and document in the tutorial the workaround I am giving you here, or to fix the problem at the source, which may create other problems. I have recently started a new job, so the decision might take a little longer.

Meanwhile, please use the following code for your work. Sorry for the inconvenience.

from packman import molecule, predict_hinge
import numpy

mol = molecule.load_structure("1crn_broken.pdb")
chains = mol[0].get_chains()
for chain in chains:
    backbone = [j for i in chain.get_backbone() for j in i if j is not None]
    # Raise alpha step by step until the prediction succeeds; any range works.
    for alpha in numpy.arange(2.8, 10, 0.5):
        try:
            predict_hinge(backbone, open("1crn_broken_hinges.txt", "w"), Alpha=alpha)
            hingeResidues = [j for i in chain.get_hinges() for j in i.get_elements()]
            print(hingeResidues)
            break
        except Exception:
            # Most likely the graph is still disconnected at this alpha; try the next one.
            continue

Simard302 commented 2 years ago

I would like to keep the same alpha value constant for all of my runs. Would it be possible to add a layer before the eccentricity is calculated that checks for any disconnected nodes in the network and omits them?

Something along the lines of this:

from networkx import connected_components, eccentricity

# Alpha shape step
alpha_shape, ProteinGraph = AlphaShape(atoms, Alpha, get_graph=True)

# Keep only the largest connected component
largest_component = max(connected_components(ProteinGraph), key=len)
SubProteinGraph = ProteinGraph.subgraph(largest_component)

centrality = eccentricity(SubProteinGraph)

Pranavkhade commented 2 years ago

I do not think that is a good idea. A missing residue in the hinge region may be exactly what disconnects the graph. Because we would then treat the pieces as separate graphs, we would be looking for hinges inside regions that would have been considered domains before the division.
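As a rough illustration of that concern, the sketch below uses a toy eight-node chain (made-up data, not an alpha-shape graph) and removes one middle node to mimic a missing residue. It only shows how keeping the largest component changes what the eccentricity is computed over; it does not reproduce the hinge prediction itself.

```python
import networkx as nx

# Intact toy chain: nodes 0..7 connected in a line (hypothetical data).
full = nx.path_graph(8)
ecc_full = nx.eccentricity(full)

# Remove one node to mimic a missing residue that splits the graph.
broken = full.copy()
broken.remove_node(4)

# Keep only the largest connected component, as in the proposal above.
largest = max(nx.connected_components(broken), key=len)
ecc_largest = nx.eccentricity(broken.subgraph(largest))

# Nodes that are silently left out of the analysis.
dropped = set(broken.nodes) - largest

print("full graph:       ", ecc_full)
print("largest component:", ecc_largest)
print("dropped nodes:    ", sorted(dropped))
```

In the intact chain the lowest eccentricity sits at the middle of the whole chain; after the split, it sits in the middle of the surviving piece, and the other piece is ignored entirely.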

I think increasing the alpha value is the best solution. I am not sure why you want to keep the alpha value exactly the same (it would be nice if you could share the reason), but I think gradually increasing it shouldn't be a problem, since protein packing is relative. Even the same protein, such as calmodulin in two different forms (1EXR and 1PRW), requires two different initial alpha values.
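The idea of raising the value gradually until the graph connects can be sketched without PACKMAN. In the sketch below, a plain distance-cutoff graph stands in for the alpha shape (an assumption made for illustration; it is not how PACKMAN builds its graph), and `build_distance_graph`, `first_connected_cutoff`, and the points are all hypothetical.

```python
import itertools
import math

import networkx as nx


def build_distance_graph(points, cutoff):
    """Connect every pair of points whose distance is at most `cutoff`."""
    graph = nx.Graph()
    graph.add_nodes_from(range(len(points)))
    for (i, p), (j, q) in itertools.combinations(enumerate(points), 2):
        if math.dist(p, q) <= cutoff:
            graph.add_edge(i, j)
    return graph


def first_connected_cutoff(points, start=2.8, stop=10.0, step=0.5):
    """Try start, start + step, ... (not exceeding stop) until eccentricity works."""
    n_steps = int((stop - start) / step + 1e-9)
    for k in range(n_steps + 1):
        cutoff = start + k * step
        graph = build_distance_graph(points, cutoff)
        try:
            return cutoff, nx.eccentricity(graph)
        except nx.NetworkXError:
            # Graph is still disconnected at this cutoff; try a larger one.
            continue
    return None, None


# Made-up points: three close together and one farther away.
points = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
cutoff, ecc = first_connected_cutoff(points)
print(cutoff, ecc)
```

Catching only `nx.NetworkXError` keeps unrelated failures visible, and returning the cutoff that finally worked makes it easy to record which value each structure needed.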

Pranavkhade commented 2 years ago

Please let me know if you find the solution of increasing the alpha value useful. If I do not receive a response, I will close this issue on 06/15/2022, assuming it is solved.

Thank you for reporting the issue. Pranav

Simard302 commented 2 years ago

This solution fixes the infinite length issue. If I have a PDB file with multiple models, can using a different alpha value for each model predict entirely different hinges, or are the results very similar?

Pranavkhade commented 2 years ago

Keep the alpha increment very low, and you will be fine.

Hinges can differ based on multiple factors, not just the alpha value. However, we have seen that there is always overlap in the most prominent hinges. Lower alpha values show the most obvious hinges, and as you keep increasing the alpha value, you see less obvious hinges, as we mention in the paper.

I will close this issue with this comment since the issue of the infinite length is solved and confirmed.