Closed carbondrop-nick closed 1 month ago
I am working on the open-source release and will make everything better!
Hi,
Super exciting work! I will just reiterate the above. It would also be helpful in general not to import all functions from a script (from file import *), since it makes it harder to follow where things break when/if they do.
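To illustrate the import suggestion, here is a minimal sketch using the standard-library math module as a stand-in for a project script (the function name is made up for the example). With explicit imports, the import line tells you where every name comes from:

```python
# Wildcard style, harder to trace when something breaks:
#     from math import *
# Explicit style: every name used below is visible in the import line.
from math import cos, pi


def x_on_unit_circle(angle_rad: float) -> float:
    """Return the x-coordinate on the unit circle; `cos` clearly comes from math."""
    return cos(angle_rad)


print(x_on_unit_circle(pi))
```

If two modules both define a name such as cos, the wildcard form silently keeps whichever was imported last, which is hard to spot from a traceback.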
Thanks Peter! I am cleaning everything. Plan to release everything in Nov.
Things should be fixed now. Let me know if you can run enzymeflow_demo.ipynb.
Thanks! I think we're pretty close, but I am running into a few key naming issues in the pretrain. It looks like ProteinLigandNetwork is expecting some slightly different keys than the ones present:
Missing key(s) in state_dict: "guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin.weight", "guide_ligand_mpnn.mpnn.mol_conv.lin.weight"
Unexpected key(s) in state_dict: "guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight", "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight", "guide_ligand_mpnn.mpnn.atom_convs.1.lin_src.weight
This may come down to the pyg version, since different versions lead to different model implementations. Installing torch_geometric with pip at the specified version seems to work.
Well spotted! Indeed I was working with a later version of torch_geometric.
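For anyone hitting the same mismatch, a small sketch for listing which keys differ before loading. It is plain Python over key names; the helper name is hypothetical, and in practice the two inputs would presumably come from the model's state_dict() keys and the loaded checkpoint's keys. It only diagnoses the mismatch; the fix reported above is installing the specified torch_geometric version.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the load error message.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not expect
    """
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


# Key names taken from the error message in this thread.
expected = ["guide_ligand_mpnn.mpnn.atom_convs.0.lin.weight"]
checkpoint = [
    "guide_ligand_mpnn.mpnn.atom_convs.0.lin_src.weight",
    "guide_ligand_mpnn.mpnn.atom_convs.0.lin_dst.weight",
]

missing, unexpected = diff_state_dict_keys(expected, checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)
```

A lin versus lin_src/lin_dst split like this one suggests the layer definition itself differs between library versions, so renaming keys by hand is unlikely to be a safe workaround.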
Looks like
meta_eval_csv = pd.read_csv('data/metadata_eval.csv')
is referring to a flat file that includes absolute paths to Will's computer instead of relative paths:
/Users/willhua/Desktop/EnzymeFlow/data/processed_eval/msa/P07964/P07964.pkl
I edited the file to remove all instances of /Users/willhua/Desktop/EnzymeFlow/ and it seemed to work fine.
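In case it helps others, the same edit can be scripted. This is a rough sketch using only the standard library. The column names in the sample are hypothetical, since I have not checked the real header of metadata_eval.csv, and it only strips the prefix when a cell starts with it:

```python
import csv
import io

# Prefix reported in this thread; adjust if the file contains a different one.
ABS_PREFIX = "/Users/willhua/Desktop/EnzymeFlow/"


def relativize_csv(text: str, prefix: str = ABS_PREFIX) -> str:
    """Return CSV text with `prefix` stripped from every cell that starts with it."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow(
            [cell[len(prefix):] if cell.startswith(prefix) else cell for cell in row]
        )
    return out.getvalue()


# Hypothetical two-column sample; the path is the one quoted above.
sample = (
    "id,msa_path\n"
    "P07964,/Users/willhua/Desktop/EnzymeFlow/data/processed_eval/msa/P07964/P07964.pkl\n"
)
print(relativize_csv(sample))
```

To apply it to the real file, read data/metadata_eval.csv as text, pass it through relativize_csv, and write the result back (ideally to a copy first).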
I am working on multi-motif scaffolding, i.e., having EnzymeFlow generate enzyme motifs. I am trying to find the seq_idx or seq_position that maps the motifs back to the whole enzyme. Let me know if you have suggestions or would like to collaborate on this.
A good answer would be pretty far beyond me. Enzyme "motif" is a messy concept since catalytic machinery has to be able to access multiple states, making hidden dynamic trajectories important. I would stick to the hidden representation and just try to consider the "foldiness" of the protein separately from its "enzyminess" for a given chemical reaction (enzyminess = 0 for most reactions, >0 for known reaction(s)), but that is much easier said than done.
Hi Will - I was excited to read your paper; I have recently been interested in exploring a similar concept in my studies. I am trying to get the demo notebook running, but am running into some dependency issues. Specifically: "module geomstats.backend has no attribute 'cos'". The error is raised from the geomstats.algebra_utils file. I am not able to find where this 'cos' function is defined. Any advice would be appreciated.
Haven't seen this issue myself, but geomstats should be using the cosine function ('cos') from numpy. Your error likely means there is either an issue with the configuration of geomstats (e.g. it might be pulling function names from PyTorch instead of numpy) or an issue with numpy itself. I would try reinstalling numpy and geomstats or set up a new environment.
Hi, are you using geomstats==2.7.0 or other versions? Perhaps you are importing the function from numpy or other source.
Fixed this issue now, thanks both! I was indeed using another version of geomstats. I reinstalled the correct version into my env.
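A quick way to check which version is actually installed in the active environment, sketched with the standard-library importlib.metadata (the helper names are made up; 2.7.0 is the version asked about above):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str) -> bool:
    """True only if `package` is installed at exactly `expected`."""
    return installed_version(package) == expected


print("geomstats installed:", installed_version("geomstats"))
print("matches 2.7.0:", matches_pin("geomstats", "2.7.0"))
```

Running this inside the notebook kernel, rather than in a separate shell, confirms that the kernel and the environment you installed into are the same one.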
Super cool, but hard to get going. On top of the stated dependencies I also had to add a lot of modules to my environment to get this working: ipdb einops ml-collections dm-tree ipywidgets jupyter einx torch_geometric tmtools openmm POT rdkit mdtraj pdbfixer
Seems to be missing "sampling" module. The imported configs, inference, and data.loader point to modules that don't exist (unless you meant the ones under Pretrain?)
mdtraj also requires C++ tools to install and pdbfixer can't be installed via pip (conda-forge instead). Would be great to find a better way to distribute this.
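Until the packaging improves, a small pre-flight check can at least report which of these extras are missing before the notebook fails partway through. This is only a sketch: the mapping from pip name to import name is my own (note dm-tree imports as tree and POT imports as ot), jupyter is left out because it is not imported directly, and the function name is hypothetical.

```python
from importlib.util import find_spec

# pip/conda package name -> top-level import name (assembled by hand).
IMPORT_NAMES = {
    "ipdb": "ipdb",
    "einops": "einops",
    "ml-collections": "ml_collections",
    "dm-tree": "tree",
    "ipywidgets": "ipywidgets",
    "einx": "einx",
    "torch_geometric": "torch_geometric",
    "tmtools": "tmtools",
    "openmm": "openmm",
    "POT": "ot",
    "rdkit": "rdkit",
    "mdtraj": "mdtraj",
    "pdbfixer": "pdbfixer",
}


def missing_packages(import_names):
    """Return the package names whose import module cannot be found."""
    return [pkg for pkg, mod in import_names.items() if find_spec(mod) is None]


print("missing:", missing_packages(IMPORT_NAMES))
```

find_spec only looks the module up without importing it, so the check stays fast and does not trigger heavy imports such as openmm.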