a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Protein Engineering #252

Open NAEV95 opened 1 year ago

NAEV95 commented 1 year ago

Hello!

I am doing some protein engineering work, and it would be nice to be able to build graphs on protein mutations. For instance, it would be cool to be able to modify the sphere sizes of the graph nodes based on how many times that residue position has been changed in the dataset, all within the graphein package, and potentially to have some additional features for those nodes (e.g. the RMSD of the position given a change in residue, or features based on the AAindex). One option could be to include the "peptides" library, as well as an option for embedding the amino acids to capture mutational effects.

a-r-j commented 1 year ago

Hi @NAEV95 thanks for the feature request!

1) For node sizing, this was a request in a slightly different context (see: #197). I'm pretty wrapped up with things at the moment so don't have bandwidth to implement this myself. I would, of course, be more than happy to support you if you want to make or add to this PR to suit your needs.

2) This is also super interesting. What are the additional features? I think BioPandas handles RMSD well; is there a reason this wouldn't be suitable?

3) peptides looks like a great library; I'll check it out.
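For reference, the per-position quantity mentioned in point 2 can be sketched in plain Python. This is only an illustration, not BioPandas' or graphein's API: the function names are hypothetical, and it assumes the two structures are already superimposed and that their atoms (e.g. C-alpha atoms) are matched one-to-one in the same order.

```python
import math


def per_residue_deviation(coords_a, coords_b):
    """Distance between matched atoms (e.g. C-alpha) of two superimposed structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over all matched atoms."""
    deviations = per_residue_deviation(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Toy coordinates (assumed data): the second residue moves, the first does not.
wild_type = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
mutant = [(0.0, 0.0, 0.0), (3.8, 3.0, 4.0)]

deviations = per_residue_deviation(wild_type, mutant)
overall = rmsd(wild_type, mutant)
```

The per-residue list could be attached to nodes as a feature, while the overall value summarises the whole structure.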

NAEV95 commented 1 year ago

Heya,

I managed to do something here.

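def update_node_sizes(fig_data, node_sizes, normalize_factor=100):
    """
    Update the node sizes in a Plotly plot of a graph, in place.
    :param fig_data: Plotly figure whose first trace holds the node markers
    :param node_sizes: dictionary containing the new node sizes, with node names as keys and sizes as values
    :param normalize_factor: marker size given to the largest node after normalization
    """
    # Assumes the node markers are the first trace and their sizes are stored per node.
    trace = fig_data.data[0]
    if trace["mode"] == "markers":
        # Use the new size where one is given; otherwise keep the current size.
        node_sizes_values = [node_sizes.get(name, size) for name, size in zip(trace["text"], trace["marker"]["size"])]
        max_size = max(node_sizes_values)
        # Rescale so that the largest node ends up with size normalize_factor.
        normalize_sizes = [size / max_size * normalize_factor for size in node_sizes_values]
        trace["marker"]["size"] = normalize_sizes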

I basically saved the output of your function plotly_protein_structure_graph as a figure and then applied this function to normalize the node sizes according to a dictionary whose values are the number of times each residue has been changed in the structure.
[image: node_sizes_bigger]
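A dictionary like the one described above could be built along these lines. This is a minimal sketch under stated assumptions: the variants are already aligned to the wild type and have the same length, the helper names are hypothetical, and the node names in the example (chain:residue:number) are placeholders that would need to match whatever the figure's hover text actually contains.

```python
from collections import Counter


def count_mutations(wild_type, variants):
    """Count, per 1-based position, how many variants differ from the wild type."""
    counts = Counter()
    for variant in variants:
        if len(variant) != len(wild_type):
            raise ValueError("variants must be the same length as the wild type")
        for position, (wt_residue, residue) in enumerate(zip(wild_type, variant), start=1):
            if residue != wt_residue:
                counts[position] += 1
    return counts


def to_node_sizes(counts, node_names_by_position):
    """Map per-position counts onto node names supplied by the caller."""
    return {
        node_names_by_position[position]: count
        for position, count in counts.items()
        if position in node_names_by_position
    }


# Toy data (assumed): one wild-type sequence and four aligned variants.
wild_type = "MKTA"
variants = ["MKTA", "MRTA", "MRTG", "AKTA"]
counts = count_mutations(wild_type, variants)

# Placeholder node names; replace with the names used in your own graph.
node_names = {1: "A:MET:1", 2: "A:LYS:2", 3: "A:THR:3", 4: "A:ALA:4"}
node_sizes = to_node_sizes(counts, node_names)
```

Positions that were never mutated are left out of the dictionary, so update_node_sizes keeps their existing marker size before normalizing.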

NAEV95 commented 1 year ago

Lastly, I would like to be able to share more about what features I might want to integrate, but I am bound by disclosure agreements here. Anyway, what I was thinking of was more time-series-like features, where we may want to construct embeddings for the sequential structure of proteins from their graphs. Thank you for the help anyway! I will keep trying to master graphs for now, as I just started with them in the last couple of weeks.