MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Can dataset `get_records` drop missing records #706

jthorton closed this 11 months ago

jthorton commented 2 years ago

Is your feature request related to a problem? Please describe.

It would be helpful if the dataset `get_records` method could drop missing records when the `status` keyword is used. Currently it returns the index entry with NaN for the record, even when I request only complete records. Dropping them would save me from having to filter the query result again.

Some code to reproduce

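from qcportal import FractalClient

# Connect with the client's default settings
client = FractalClient()
ds = client.get_collection("dataset", "OpenFF BCC Refit Study COH v1.0")

# Map the last field of each history entry (called spec here) to its query keywords
dataset_specs = {
    spec: {"method": method, "basis": basis, "program": program}
    for _, program, method, basis, spec in ds.data.history
}

# Request only complete records; rows without a record still come back as NaN
query = ds.get_records(**dataset_specs["resp-2-vacuum"], status=["complete"])
query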

CC(=O)c1ccccc1-0       ResultRecord(id='32651733', status='COMPLETE')
...                    ...
C(C#N)SC1=NN=C(N1)N    NaN
CCCCSC1=NN=C(S1)N-0    NaN
CCCCSC1=NN=C(S1)N-1    NaN
CCCCSC1=NN=C(S1)N-2    NaN
CCCC1CCCNC1            NaN

Describe the solution you'd like

Return only records with the requested status, and drop records that have not yet been computed.

Describe alternatives you've considered

I can filter the query myself, but I think it would be faster to do this during the database search, particularly for larger datasets, and it would mean sending less data back to the user.
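For reference, the client-side filtering mentioned above might look roughly like the sketch below. It assumes the query result is a pandas DataFrame indexed by entry name, with NaN where no record exists, as the output above suggests. The stand-in frame, the `record` column name, and the `drop_missing` helper are hypothetical and only illustrate the workaround; they are not part of QCPortal.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame returned by ds.get_records(...):
# one column of records, indexed by entry name, NaN where no record exists.
# The column name "record" and the string placeholder are assumptions.
query = pd.DataFrame(
    {"record": ["ResultRecord(id='32651733', status='COMPLETE')", np.nan, np.nan]},
    index=["CC(=O)c1ccccc1-0", "CCCCSC1=NN=C(S1)N-0", "CCCC1CCCNC1"],
)


def drop_missing(records: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows for which a record was actually returned."""
    return records.dropna(how="any")


complete_only = drop_missing(query)
print(complete_only.index.tolist())
```

This still transfers every row from the server before discarding the empty ones, which is the cost the request is trying to avoid.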

bennybp commented 11 months ago

Getting records in a singlepoint dataset has completely changed in v0.50, so this isn't applicable anymore.