WillHua127 / EnzymeFlow

Official repository of EnzymeFlow
https://arxiv.org/abs/2410.00327

Dependency issues #1

Closed: carbondrop-nick closed this issue 1 month ago

carbondrop-nick commented 1 month ago

Super cool, but hard to get going. On top of the stated dependencies, I also had to add a lot of modules to my environment to get this working: ipdb, einops, ml-collections, dm-tree, ipywidgets, jupyter, einx, torch_geometric, tmtools, openmm, POT, rdkit, mdtraj, pdbfixer.

The "sampling" module seems to be missing. The imports of configs, inference, and data.loader point to modules that don't exist (unless you meant the ones under Pretrain?).

mdtraj also requires C++ tools to install and pdbfixer can't be installed via pip (conda-forge instead). Would be great to find a better way to distribute this.
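A quick way to see which of the extra packages listed above are still absent from an environment is to probe for them before launching the notebook. The sketch below is only an illustration: the mapping from package name to import name is an assumption based on how these packages are usually imported (for example, dm-tree as `tree` and POT as `ot`), and `missing_packages` is a hypothetical helper, not part of the repository.

```python
from importlib.util import find_spec

# Package name (as installed) -> top-level import name.
# The import names are assumptions; a few differ from the package name.
EXTRA_PACKAGES = {
    "ipdb": "ipdb",
    "einops": "einops",
    "ml-collections": "ml_collections",
    "dm-tree": "tree",
    "ipywidgets": "ipywidgets",
    "jupyter": "jupyter",
    "einx": "einx",
    "torch_geometric": "torch_geometric",
    "tmtools": "tmtools",
    "openmm": "openmm",
    "POT": "ot",
    "rdkit": "rdkit",
    "mdtraj": "mdtraj",
    "pdbfixer": "pdbfixer",
}


def missing_packages(packages):
    """Return the package names whose import name cannot be located."""
    return [pkg for pkg, module in packages.items() if find_spec(module) is None]


if __name__ == "__main__":
    for pkg in missing_packages(EXTRA_PACKAGES):
        print(f"missing: {pkg}")
```

Because `find_spec` only locates a module without importing it, the check is fast and does not trigger heavy imports such as openmm or rdkit.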

WillHua127 commented 1 month ago

I am working on the open-source release and will make everything better!

pgmikhael commented 1 month ago

Hi,

Super exciting work! Will just re-iterate the above. Would also be helpful in general to not import all functions from a script (from file import *) since it makes it harder to follow where things break when/if they do.
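To locate the wildcard imports mentioned above, one option is to scan the source with Python's standard `ast` module. This is a minimal sketch, not a tool from the repository: `find_star_imports` is a hypothetical helper, and the module names in the sample string are made up for illustration.

```python
import ast


def find_star_imports(source):
    """Return (line number, module name) for every `from X import *` in source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and any(a.name == "*" for a in node.names):
            hits.append((node.lineno, node.module))
    # Sort by line number only; module can be None for `from . import *`.
    return sorted(hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    # Hypothetical file contents, used only to demonstrate the helper.
    sample = "from data.loader import *\nimport torch\nfrom configs import *\n"
    for lineno, module in find_star_imports(sample):
        print(f"line {lineno}: from {module} import *")
```

Each hit can then be replaced by an explicit `from module import name1, name2`, so a failing name points directly at the module that should define it.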

WillHua127 commented 1 month ago

Thanks Peter! I am cleaning everything. Plan to release everything in Nov.

WillHua127 commented 1 month ago

Things should be fixed now; let me know if you can run enzymeflow_demo.ipynb.

carbondrop-nick commented 1 month ago

Thanks! I think we're pretty close, but I am running into a few key naming issues in the pretrain. It looks like ProteinLigandNetwork is expecting some slightly different keys than the ones present:

Missing key(s) in state_dict: "guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin.weight", "guide_ligand_mpnn.mpnn.mol_conv.lin.weight"

Unexpected key(s) in state_dict: "guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight", "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin_src.weight
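When a checkpoint fails to load like this, it can help to compare the two key sets directly before digging further. The sketch below is plain Python and takes any two collections of key names; with PyTorch, these would typically be `model.state_dict().keys()` and the keys of the loaded checkpoint dictionary. `diff_state_dict_keys` is a hypothetical helper, not part of EnzymeFlow.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the load error message.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


if __name__ == "__main__":
    # Key names copied from the error reported in this thread.
    model = ["guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight"]
    checkpoint = [
        "guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight",
        "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight",
    ]
    missing, unexpected = diff_state_dict_keys(model, checkpoint)
    print("missing:", missing)
    print("unexpected:", unexpected)
```

A pattern such as one `lin` key on one side versus `lin_src`/`lin_dst` on the other suggests that the layer definition itself differs between the two environments, rather than the checkpoint being corrupted.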

pgmikhael commented 1 month ago

This may be a PyG version mismatch, since different versions lead to different model implementations. Installing torch_geometric with pip at the specified version seems to work.

carbondrop-nick commented 1 month ago

Well spotted! Indeed I was working with a later version of torch_geometric.

carbondrop-nick commented 1 month ago

Looks like meta_eval_csv = pd.read_csv('data/metadata_eval.csv') is referring to a flat file that includes absolute paths to Will's computer instead of relative paths:

/Users/willhua/Desktop/EnzymeFlow/data/processed_eval/msa/P07964/P07964.pkl

I edited the file to remove all instances of /Users/willhua/Desktop/EnzymeFlow/ and it seemed to work fine.
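The manual edit described above can also be scripted. The sketch below uses only the standard `csv` module (the notebook itself reads the file with pandas) and strips the machine-specific prefix from any field that starts with it. The column names `uniprot_id` and `msa_path` are assumptions for illustration, since the real header of metadata_eval.csv is not shown in this thread, and `strip_prefix` is a hypothetical helper.

```python
import csv
import io


def strip_prefix(rows, prefix):
    """Remove `prefix` from the start of every string field that begins with it."""
    return [
        {
            key: (value[len(prefix):] if isinstance(value, str) and value.startswith(prefix) else value)
            for key, value in row.items()
        }
        for row in rows
    ]


PREFIX = "/Users/willhua/Desktop/EnzymeFlow/"

# Hypothetical two-column excerpt; only the path value comes from the thread.
SAMPLE = (
    "uniprot_id,msa_path\n"
    "P07964,/Users/willhua/Desktop/EnzymeFlow/data/processed_eval/msa/P07964/P07964.pkl\n"
)

rows = strip_prefix(list(csv.DictReader(io.StringIO(SAMPLE))), PREFIX)
print(rows[0]["msa_path"])  # data/processed_eval/msa/P07964/P07964.pkl
```

For the real file, the same function could be applied to rows read from `data/metadata_eval.csv` and written back with `csv.DictWriter`; the resulting relative paths then resolve only when the notebook runs from the repository root.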

WillHua127 commented 1 month ago

I am working on multi-motif scaffolding, i.e., EnzymeFlow generates enzyme motifs, and I am trying to find the seq_idx or seq_position that maps the motifs back to the whole enzyme. Let me know if you have suggestions or would like to collaborate on this.

carbondrop-nick commented 1 month ago

A good answer would be pretty far beyond me. Enzyme "motif" is a messy concept since catalytic machinery has to be able to access multiple states, making hidden dynamic trajectories important. I would stick to the hidden representation and just try to consider the "foldiness" of the protein separately from its "enzyminess" for a given chemical reaction (enzyminess = 0 for most reactions, >0 for known reaction(s)), but that is much easier said than done.

JSATacoTruck commented 1 month ago

Hi Will - I was excited to read your paper; I have recently been interested in exploring a similar concept in my studies. I am trying to get the demo notebook running, but am running into some dependency issues. Specifically: "module 'geomstats.backend' has no attribute 'cos'". The attribute is accessed in the geomstats.algebra_utils file. I am not able to find where this 'cos' function is. Any advice would be appreciated.

carbondrop-nick commented 1 month ago

Haven't seen this issue myself, but geomstats should be using the cosine function ('cos') from numpy. Your error likely means there is either an issue with the configuration of geomstats (e.g. it might be pulling function names from PyTorch instead of numpy) or an issue with numpy itself. I would try reinstalling numpy and geomstats or set up a new environment.

WillHua127 commented 1 month ago

Hi, are you using geomstats==2.7.0 or another version? Perhaps you are importing the function from numpy or another source.

JSATacoTruck commented 1 month ago

Fixed this issue now - thanks both! I was indeed using another version of geomstats. I reinstalled the correct version into my env.
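Since both the geomstats error and the earlier torch_geometric key mismatch came down to installed versions, a small pin check run before the notebook may save some debugging. This is a sketch under stated assumptions: `check_pins` is a hypothetical helper, and the only pin taken from this thread is geomstats==2.7.0; any other pins would need to come from the repository's own requirements.

```python
from importlib.metadata import PackageNotFoundError, version


def check_pins(pins):
    """Return {package: (wanted, installed)} for every pin that is not satisfied.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # geomstats==2.7.0 is the version named in this thread.
    for package, (wanted, installed) in check_pins({"geomstats": "2.7.0"}).items():
        print(f"{package}: want {wanted}, found {installed}")
```

The comparison is an exact string match, so a build with a local suffix (for example a version string ending in `+cu118`) would be reported as a mismatch even if the base version agrees.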