Open mathfirst opened 1 year ago
I tried to solve this problem with radius_graph
which is much faster than your original get_edges, but the subsequent sampling result showed that radius graph
may not be consistent with the result obtained by your original code.
from torch_cluster import radius_graph
Thanks for this suggestion! We initially tried to minimize dependencies but if radius_graph
is much faster, we should consider using it in the future.
Note that we don't remove self-loops in get_edges
. If you set loop=True
in radius_graph
, the behavior should be more similar to the original function.
Yes, I was aware of loop=True
. The problem is when I used radius_graph
, another error occured as follows.
ERROR: Element '0 F' not found
ERROR: moving to the beginning of the next molecule
File: DiffSBDD/analysis/molecule_builder.py", line 168, in process_molecule
mol = Chem.Mol(rdmol)
Boost.Python.ArgumentError: Python argument types in
Mol.__init__(Mol, NoneType)
did not match C++ signature:
__init__(_object*, RDKit::ROMol mol, bool quickCopy=False, int confId=-1)
__init__(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > pklString, unsigned int propertyFlags)
__init__(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > pklString)
__init__(_object*)
So I had to give up radius_graph
. Finally, I opt for np.where
as a bypass.
adj_np = np.where(adj.cpu())
edges = torch.stack((torch.from_numpy(adj_np[0]).to(adj.device), torch.from_numpy(adj_np[1]).to(adj.device)), dim=0)
The downside is that np.where
is very slow for a large problem.
The new error appears to be unrelated to the old one. Are you sure it was caused by radius_graph
?
Honestly, your code is very hard to run, because it needs a lot of memory. Even though I found a great GPU with large memory and trained your code for about 6 days, the sampling results are much worse than your first draft. So, not mention to the second manuscript. I am sorry to say that your reported results are not that convincing. Could you please make your code easy to run such that other researchers can test your results. By the way, I ran your code on CrossDocked dataset with full condition.
When I use your full_atom.ckpt to sample for the protein pockets in the test set of CrossDocked dataset, the following error occured.
Could you please fix it? Is the checkpoint for CrossDocked dataset available? Hopefully, it is trained on its training set.