XieResearchGroup / Physics-aware-Multiplex-GNN

Code for our Nature Scientific Reports paper "A universal framework for accurate and efficient geometric deep learning of molecular systems"
https://www.nature.com/articles/s41598-023-46382-8

Not able to replicate the environment #3

Closed: AnjaliSetiya closed this issue 6 months ago

AnjaliSetiya commented 6 months ago

I have been trying to replicate the environment. I created a new conda env and ran pip install -r requirements.txt, but I keep getting the error "torch_scatter-2.0.4+cu101-cp37-cp37m-linux_x86_64.whl is not a supported wheel on this platform." I have tried three different Linux machines and get the same error on each. Please let me know what can be done.

zetayue commented 6 months ago

This might be due to your Python or CUDA version. The wheels "XXXXXXX+cu101-cp37-cp37m-linux_x86_64.whl" listed in requirements.txt are specifically for installing the torch-geometric-related libraries (torch-scatter, torch-sparse and torch-cluster) with CUDA 10.1 and Python 3.7.x. If your environment has a different Python or CUDA version, you can install them from the corresponding wheels under https://data.pyg.org/whl/ (instructions: https://pytorch-geometric.readthedocs.io/en/latest/install/installation.html#installation-from-wheels).
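pip's "not a supported wheel on this platform" message means one of the wheel's compatibility tags does not match the interpreter. The following is a simplified, standard-library-only sketch that splits such a filename into its tags and compares them with the running Python. It ignores manylinux and other alternative platform tags that pip also accepts. The CUDA comparison is an extra check that pip does not perform, because a CUDA mismatch usually only surfaces at import or run time. All function names are hypothetical.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Return (version, python_tag, abi_tag, platform_tag) from a wheel filename.

    Wheel names follow name-version[-build]-python-abi-platform.whl, so the
    last three dash-separated fields are the compatibility tags.
    """
    parts = filename[: -len(".whl")].split("-")
    return parts[1], parts[-3], parts[-2], parts[-1]


def local_cuda_tag(cuda_version):
    """Map a CUDA version such as "10.1" to PyG's "cu101"; None means a CPU-only build."""
    if cuda_version is None:
        return "cpu"
    return "cu" + cuda_version.replace(".", "")


def explain_mismatch(filename, python_tag=None, platform_tag=None, cuda_tag=None):
    """List reasons a wheel does not match this setup (simplified check).

    Defaults describe the running interpreter; pass values explicitly to
    check another setup. cuda_tag=None skips the CUDA comparison.
    """
    if python_tag is None:
        python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    if platform_tag is None:
        # e.g. "linux-x86_64" -> "linux_x86_64", the form used in wheel names
        platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")

    version, wheel_py, _abi, wheel_plat = wheel_tags(filename)
    problems = []
    if wheel_py != python_tag:
        problems.append(f"wheel is built for {wheel_py}, interpreter is {python_tag}")
    if wheel_plat != platform_tag:
        problems.append(f"wheel platform {wheel_plat} != {platform_tag}")
    # The local version suffix (after "+") carries the CUDA tag, e.g. "cu101"
    wheel_cuda = version.split("+")[1] if "+" in version else None
    if cuda_tag is not None and wheel_cuda is not None and wheel_cuda != cuda_tag:
        problems.append(f"wheel targets {wheel_cuda}, PyTorch uses {cuda_tag}")
    return problems


wheel = "torch_scatter-2.0.4+cu101-cp37-cp37m-linux_x86_64.whl"
print(explain_mismatch(wheel, python_tag="cp39", platform_tag="linux_x86_64", cuda_tag="cu101"))
# ['wheel is built for cp37, interpreter is cp39']
```

In a real environment you would pass cuda_tag=local_cuda_tag(torch.version.cuda). torch.version.cuda is None for CPU-only PyTorch builds.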

AnjaliSetiya commented 6 months ago

Thanks for the reply. I have CUDA 10.1 and Python 3.7.x in my conda env and followed the steps, but now a new GLIBCXX issue has come up. I'm now using the Docker image you provided and it works like a charm, so thanks for the Docker image. I have a follow-up question: I want to extract the atom-level embeddings of molecules using the PAMNet network. How would you suggest doing that? Please let me know; that would be really helpful. Thanks!

zetayue commented 6 months ago

Glad to know that the docker image helps.

I have a follow-up question: I want to extract the atom-level embeddings of molecules using the PAMNet network. How would you suggest doing that?

For example, if you want the learned atom-level embeddings from the global message passing, they are the tensor out at the following line: https://github.com/XieResearchGroup/Physics-aware-Multiplex-GNN/blob/main/layers/global_message_passing.py#L46

In the original code, all tasks need a single value as the prediction, so I reduced its dimension to 1 at the end.
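One way to read out such an intermediate tensor without editing the library is a PyTorch forward hook on the layer that performs the final reduction. The hook captures that layer's input. The following is a minimal, hypothetical sketch: ToyAtomModel only stands in for PAMNet, mimicking "atom embeddings → projection to dimension 1 → sum per molecule". Which PAMNet submodule actually receives out is an assumption you would need to check in the repository.

```python
import torch
import torch.nn as nn


class ToyAtomModel(nn.Module):
    """Hypothetical stand-in for a GNN: atom embeddings -> one scalar per molecule."""

    def __init__(self, num_types=10, dim=8):
        super().__init__()
        self.embed = nn.Embedding(num_types, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.SiLU())
        self.head = nn.Linear(dim, 1)  # reduces each atom embedding to dimension 1

    def forward(self, z, batch):
        h = self.mlp(self.embed(z))          # [num_atoms, dim] atom-level embeddings
        per_atom = self.head(h).squeeze(-1)  # [num_atoms] scalar contributions
        num_graphs = int(batch.max()) + 1
        # Sum atom contributions into one prediction per molecule
        return torch.zeros(num_graphs).index_add_(0, batch, per_atom)


def capture_layer_input(model, layer, *args):
    """Run the model once and return (prediction, input tensor of `layer`)."""
    captured = {}

    def hook(module, inputs, output):
        captured["emb"] = inputs[0].detach()  # inputs is a tuple of positional args

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            pred = model(*args)
    finally:
        handle.remove()  # avoid leaving the hook attached
    return pred, captured["emb"]


model = ToyAtomModel().eval()
z = torch.tensor([6, 6, 8])                 # C, C, O as type indices
batch = torch.zeros(3, dtype=torch.long)    # all atoms belong to molecule 0
pred, emb = capture_layer_input(model, model.head, z, batch)
print(pred.shape, emb.shape)                # torch.Size([1]) torch.Size([3, 8])
```

The hook leaves the model's code and outputs unchanged. With a trained checkpoint loaded, emb would hold the learned per-atom vectors that precede the reduction to a single value.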

AnjaliSetiya commented 5 months ago

Thanks for the clarification.

AnjaliSetiya commented 5 months ago

Hi, based on your comments I have written a Python script that generates these embeddings.

However, to do so I had to modify models.py (self.embeddings = nn.Parameter(torch.ones((5, self.dim))) ) so that it accommodates the different atom types found in possible ligands.

Below is the script. It would be kind of you to take a look and let me know whether I'm heading in the right direction for generating atom-level embeddings.

# Libraries
import torch
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem
from torch_geometric.data import Data

from models import PAMNet

# Custom Config class adjusted to meet PAMNet's requirements
class Config:
    def __init__(self, dataset, dim, n_layer=3, cutoff_l=5.0, cutoff_g=10.0):
        self.dataset = dataset
        self.dim = dim
        self.n_layer = n_layer
        self.cutoff_l = cutoff_l
        self.cutoff_g = cutoff_g

#molecule to graph
def molecule_to_graph(molecule):
    mol = Chem.MolFromSmiles(molecule)
    if mol is None:
        return None

    # Add hydrogens
    mol = Chem.AddHs(mol)

    # Use RDKit to generate 3D coordinates (fixed seed so the conformer is reproducible)
    if AllChem.EmbedMolecule(mol, randomSeed=42) == -1:
        return None

    # Get positions from the first conformer
    pos = mol.GetConformer().GetPositions()
    pos = torch.tensor(pos, dtype=torch.float32)

    # Get atoms and bonds to create edges
    atoms = mol.GetAtoms()
    node_features = torch.tensor([atom.GetAtomicNum() for atom in atoms], dtype=torch.float32)
    edge_index = []

    for bond in mol.GetBonds():
        start, end = bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()
        edge_index.append([start, end])
        edge_index.append([end, start])

    # reshape(-1, 2) keeps the [2, num_edges] shape even for molecules without bonds
    edge_index = torch.tensor(edge_index, dtype=torch.long).reshape(-1, 2).t().contiguous()
    batch = torch.zeros(len(atoms), dtype=torch.long)

    # Return the graph data including positions
    return Data(x=node_features, edge_index=edge_index, pos=pos, batch=batch)

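def generate_embeddings(model, smiles):
    # Convert SMILES to graph data; skip molecules RDKit cannot parse or embed in 3D
    graph_data = molecule_to_graph(smiles)
    if graph_data is None:
        return None

    # Forward pass through the model.
    # NOTE: model(graph_data) returns PAMNet's final prediction, which is reduced to
    # dimension 1; per the maintainer's earlier reply, the atom-level embeddings are
    # the intermediate "out" tensor in layers/global_message_passing.py.
    with torch.no_grad():
        embeddings = model(graph_data)

    return embeddings

# Configure the model once and reuse it for every molecule
config = Config("QM9", 128, 3, 5.0, 10.0)
model = PAMNet(config)
# NOTE: PAMNet(config) starts from random weights; load trained weights before use, e.g.
# model.load_state_dict(torch.load("path/to/checkpoint.pt", map_location="cpu"))  # hypothetical path
model.eval()  # Set the model to evaluation mode

df = pd.read_csv("/data/anjali/Physics-aware-Multiplex-GNN/smiles_180_ext.csv")
em = []
for smiles in df['SMILES']:
    embeddings = generate_embeddings(model, smiles)
    if embeddings is None:
        print("Skipping SMILES that could not be parsed or embedded:", smiles)
        continue
    print("Generated Embeddings:", smiles, embeddings)
    em.append(embeddings)
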
zetayue commented 5 months ago

The code looks good. Note that we didn't include hydrogens in the molecules. Also, we used atom_to_feature_vector and bond_to_feature_vector from https://github.com/snap-stanford/ogb/blob/master/ogb/utils/features.py to include more atom and bond features in the chemical graphs. We expect them to bring gains over using atomic numbers alone, though we didn't run an ablation study to confirm this.

AnjaliSetiya commented 4 months ago

Thank you for the insight. But I still want to know whether I have to modify models.py (self.embeddings = nn.Parameter(torch.ones((5, self.dim))) ) to accommodate the different atom types in possible ligands. While experimenting with the code, I found that changing this parameter 5 helps when the ligands contain new atom types such as Br or S; otherwise I get an error.
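QM9 molecules contain only H, C, N, O and F, which is presumably where the 5 rows come from. The following is a minimal sketch of the two ways the table could be sized, under the unverified assumption that PAMNet uses data.x as a row index into self.embeddings (check models.py). If raw atomic numbers are used directly as indices, as in the script above, the table needs at least max(atomic number) + 1 rows. A compact alternative is to map atomic numbers to contiguous indices, with one extra row for unseen elements. All helper names are hypothetical.

```python
def rows_for_raw_atomic_numbers(atomic_numbers):
    """Rows needed if atomic numbers are used directly as row indices."""
    return max(atomic_numbers) + 1


def build_atom_vocab(atomic_numbers):
    """Map each distinct atomic number to a contiguous row index.

    Returns (vocab, unknown_index). The embedding table then needs
    len(vocab) + 1 rows, the last one reserved for unseen elements.
    """
    vocab = {z: i for i, z in enumerate(sorted(set(atomic_numbers)))}
    return vocab, len(vocab)


def encode(atomic_numbers, vocab, unknown_index):
    """Translate atomic numbers into row indices; unseen elements go to unknown_index."""
    return [vocab.get(z, unknown_index) for z in atomic_numbers]


# QM9 elements (H, C, N, O, F) plus S and Br from ligands
elements = [1, 6, 7, 8, 9, 16, 35]
vocab, unknown = build_atom_vocab(elements)
print(len(vocab) + 1)                         # 8 rows: 7 known elements + 1 unknown
print(encode([6, 35, 53], vocab, unknown))    # [1, 6, 7]; iodine (53) maps to the unknown row
print(rows_for_raw_atomic_numbers(elements))  # 36
```

With the mapping, the first dimension of torch.ones((5, self.dim)) would become len(vocab) + 1. Rows added for new elements carry no learned information, and a checkpoint saved with 5 rows would no longer match the parameter's shape when loaded.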