Closed btjanaka closed 4 years ago
Thank you for your detailed report. This might be related to #435. @dgasmith is the expert on this, but he's traveling today.
I would suggest updating to the latest qcportal (0.13.0), and seeing if that resolves the issue.
Taking your first example it appears to be correct:
>>> proc = client.query_procedures(id=17247212)[0]
>>> proc.keywords.scans[0].indices
[4, 13, 2, 3]
>>> client.query_molecules(id=proc.initial_molecule)[0].geometry.shape
(14, 3)
Please note that query order isn't guaranteed, which I believe is where things are going wrong in your code. I highly recommend using queries like ds.query("default"), which handle the ordering gracefully and correctly.
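If you do need to query several ids directly, one way to avoid depending on the return order is to match results by id rather than by position. A minimal sketch of that idea, using stand-in objects (the `Record` class, `align_by_id` helper, and ids below are hypothetical placeholders, not qcportal objects):

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for a queried record; only the id matters here.
    id: str
    payload: str


def align_by_id(requested_ids, results):
    """Reorder results to match requested_ids, matching on record id."""
    by_id = {rec.id: rec for rec in results}
    return [by_id[rid] for rid in requested_ids]


requested = ["3", "1", "2"]
# Pretend the server answered in a different order than we asked.
returned = [Record("1", "b"), Record("2", "c"), Record("3", "a")]

aligned = align_by_id(requested, returned)
print([rec.id for rec in aligned])
```

This only illustrates the bookkeeping; ds.query does the equivalent mapping for you.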
Thanks for the quick response from both of you! I am wondering, what is ds.query("default") supposed to do? When I use it, all I get back is the string default. I have read the docstring for the method, and this seems to make sense, but I am not sure what I should be doing with this output.
We should probably change that, but ds.query('default') populates the underlying ds.df DataFrame so that get_entry 1) becomes very fast and 2) correctly maps the entries to a record.
We should change the return to return df[spec.name] here.
@btjanaka Could you make a PR if you think that would make it more straightforward?
@dgasmith will do, thanks for the help!
Closed by #565.
As a followup, everything worked out when I tried again with ds.query. Here is the updated code:
import numpy as np
import qcportal as ptl
from openeye import oechem, oedepict
from tqdm.notebook import tqdm

# Loading Data
client = ptl.FractalClient()
ds = client.get_collection('GridOptimizationDataset',
                           "OpenFF Trivalent Nitrogen Set 3")
ds.query("default")

# Looking at the restraint indices in each record, and seeing whether they are
# available in the molecule
for index in tqdm(ds.df.index):
    ds_entry = ds.get_entry(index)
    record = client.query_procedures(ds_entry.object_map['default'])[0]
    molecule = client.query_molecules(ds_entry.initial_molecule)[0]
    smiles = ds_entry.attributes["canonical_explicit_hydrogen_smiles"]
    restraint_indices = np.array(record.keywords.scans[0].__dict__['indices'])
    num_atoms = len(molecule.symbols)

    # If any restraint index is out of bounds
    if np.any(restraint_indices >= num_atoms):
        print("--------------------------------------")
        print("GridOptRecord id :", record.id)
        print("SMILES :", smiles)
        print("Restraint indices:", restraint_indices)
        print("Number of atoms :", num_atoms)
        mol = oechem.OEMol()
        oechem.OESmilesToMol(mol, smiles)
        idx_to_symbol = dict(
            (atom.GetIdx(), oechem.OEGetAtomicSymbol(atom.GetAtomicNum()))
            for atom in mol.GetAtoms())
        print("Restraint atoms in OEMol:",
              [idx_to_symbol[idx] for idx in restraint_indices])
Awesome, that looks much more in line with what I hope our API can deliver.
That said, I highly recommend something like ds.get_record(entry_name, 'default'). Currently you query the records twice: once in your top-level ds.query call, and again with client.query_procedures inside the loop.
Describe the bug
I am trying to access the atoms involved in the optimizations in the "OpenFF Trivalent Nitrogen Set 3" dataset. I am able to access the indices, and I am able to access the molecule, but I find that the indices are out of bounds; i.e. the molecule seems to be lacking atoms. For instance, one of the molecules might have restraint indices [0, 13, 2, 1] but the molecule would only have 9 atoms, whereas it would need at least 14 here (due to zero-based indexing).
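To make the example above concrete, the check I am describing amounts to something like the following sketch (the `out_of_bounds` helper name is just for illustration and is not part of any library):

```python
def out_of_bounds(restraint_indices, num_atoms):
    """Return the restraint indices that do not exist in a molecule
    with num_atoms atoms (valid indices are 0 .. num_atoms - 1)."""
    return [i for i in restraint_indices if i < 0 or i >= num_atoms]


indices = [0, 13, 2, 1]
bad = out_of_bounds(indices, num_atoms=9)

# With zero-based indexing, the largest index fixes the minimum atom count.
min_atoms_needed = max(indices) + 1

print("Out-of-bounds indices:", bad)
print("Minimum atoms needed :", min_atoms_needed)
```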
I have tried converting the SMILES for the molecule to OpenEye's oechem.OEMol and accessing the atoms in the resulting molecule, but the indices do not seem to match up with those in the qcelemental molecule. I think the atom at the first restraint index should be the trivalent nitrogen, but this is not always the case when using the OEMol.
To Reproduce
The following code prints out the molecules that I have been having issues with.
Expected behavior
The restraint indices should correspond to the correct atoms in the molecules from the dataset, and the output for the code above should be empty. Instead, I am getting the following output:
Additional context
Python 3.7.3, Ubuntu 18.04
Library versions
qcelemental==0.11.1
qcengine==0.11.0
qcfractal==0.11.0
qcportal==0.12.1