MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Some datasets have placeholder spec names after migration #739

Open j-wags opened 12 months ago

j-wags commented 12 months ago

Describe the bug

Some datasets have had their spec names replaced by placeholder strings like spec_1, spec_2, etc. on the new QCA server.

To Reproduce

In QCFractal 0.15:

from qcportal import FractalClient
client = FractalClient()
ds = client.get_collection("Dataset", "OpenFF Theory Benchmarking Single Point Energies v1.0")
spec_names = [h[4] for h in ds.data.history]
print(spec_names)

outputs

['b3lyp-d3bj/def2-qzvp', 'b3lyp-d3bj/6-311+g**', 'wb97m-d3bj/dzvp', 'm05-2x-d3/dzvp', 'default', 'b97-d3bj/def2-tzvp', 'm08-hx-d3/dzvp', 'dsd-blyp-d3bj/heavy-aug-cc-pvtz', 'pw6b95-d3/dzvp', 'b3lyp-d3bj/def2-tzvpd', 'gfn1xtb', 'wb97m-v/dzvp', 'pw6b95-d3bj/dzvp', 'wb97m-d3bj/dzvp', 'm06-2x-d3/dzvp', 'pw6b95-d3bj/dzvp', 'mp2/heavy-aug-cc-pv(t+d)z', 'b3lyp-d3bj/def2-tzvp', 'wb97x-d3bj/dzvp', 'b3lyp-d3bj/6-31+g**', 'mp2/aug-cc-pvtz', 'b3lyp-nl/dzvp', 'gfn2xtb', 'b3lyp-d3bj/6-311+g**', 'b3lyp-d3bj/def2-tzvpp', 'b3lyp-d3bj/def2-tzvppd', 'm05-2x-d3/dzvp', 'm08-hx-d3/dzvp', 'wb97m-d3bj/dzvp', 'b97-d3bj/def2-tzvp', 'b3lyp-d3bj/def2-tzvppd', 'pw6b95-d3/dzvp', 'b3lyp-d3bj/def2-tzvpp', 'gfnff', 'b97-d3bj/def2-tzvp', 'b3lyp-d3mbj/dzvp', 'default', 'm06-2x-d3/dzvp', 'b3lyp-d3bj/def2-qzvp', 'wb97x-d3bj/dzvp', 'ani2x', 'df-ccsd(t)/cbs', 'b3lyp-d3bj/def2-tzvp', 'b3lyp-d3bj/6-31+g**', 'b3lyp-d3mbj/dzvp', 'b3lyp-d3bj/def2-tzvpd', 'dsd-blyp-d3bj/heavy-aug-cc-pvtz']

Using QCPortal 0.50, I believe the equivalent code is:

from qcportal import PortalClient
client = PortalClient()
ds = client.get_dataset("singlepoint", "OpenFF Theory Benchmarking Single Point Energies v1.0")
print(ds.specification_names)

which outputs

['spec_5', 'spec_26', 'spec_46', 'spec_47', 'spec_6', 'spec_14', 'spec_37', 'spec_41', 'spec_24', 'spec_19', 'spec_28', 'spec_11', 'spec_39', 'spec_38', 'spec_9', 'spec_45', 'spec_40', 'spec_44', 'spec_36', 'spec_8', 'spec_2', 'spec_34', 'spec_22', 'spec_27', 'spec_30', 'spec_33', 'spec_35', 'spec_7', 'spec_4', 'spec_43', 'spec_18', 'spec_13', 'spec_3', 'spec_10', 'spec_31', 'spec_15', 'spec_29', 'spec_21', 'spec_42', 'spec_16', 'spec_20', 'spec_12', 'spec_25', 'spec_23', 'spec_1', 'spec_17', 'spec_32']

Additional notes

It seems like the spec lookup logic continued to work fine for the optimization and torsiondrive datasets in our testing, so this may just be a problem with migrating single point datasets.
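To check which datasets are affected, something like the following could flag placeholder names. It is only a sketch: it assumes the placeholders always have the form spec_<number>, as in the output above, and the helper name placeholder_names is made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: migrated placeholders always look like "spec_<integer>",
# matching the names printed above.
_PLACEHOLDER = re.compile(r"spec_\d+")


def placeholder_names(names: Iterable[str]) -> List[str]:
    """Return the names that look like migration placeholders, in input order."""
    return [name for name in names if _PLACEHOLDER.fullmatch(name)]
```

With a dataset fetched as above, placeholder_names(ds.specification_names) would return an empty list for datasets whose spec names survived the migration.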

bennybp commented 12 months ago

IIRC, in the previous version, the singlepoint datasets had 'aliases', but these only referred to sets of keywords rather than a whole specification. So I had to use placeholders for the whole specification name.

Specifications can be renamed (ds.rename_specification()), so for formulaic specification names you can write a script that does this automatically. I can help with this if you would like.
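A rough sketch of what such a script might look like, under several assumptions: the "method/basis" naming scheme, the numeric suffix used when two specifications would get the same name (the old listing has repeats such as 'wb97m-d3bj/dzvp'), and the helper names formulaic_name, plan_renames and apply_renames are all illustrative, not part of QCPortal. The only QCPortal call used is ds.rename_specification(old_name, new_name).

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: placeholders created by the migration look like "spec_<integer>".
PLACEHOLDER = re.compile(r"spec_\d+")


def formulaic_name(method: str, basis: Optional[str]) -> str:
    """Build 'method/basis'; methods without a basis (e.g. gfn2xtb) keep the bare method."""
    method = method.lower()
    return f"{method}/{basis.lower()}" if basis else method


def plan_renames(specs: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Map each placeholder name to a unique formulaic name.

    `specs` maps the current specification name to (method, basis).
    Names that are not placeholders are left alone but still count as taken.
    """
    used = {name for name in specs if not PLACEHOLDER.fullmatch(name)}
    placeholders = sorted(
        (name for name in specs if PLACEHOLDER.fullmatch(name)),
        key=lambda name: int(name.split("_")[1]),  # deterministic order
    )
    renames: Dict[str, str] = {}
    for old in placeholders:
        base = formulaic_name(*specs[old])
        new, suffix = base, 1
        # Hypothetical collision rule: append -2, -3, ... until the name is free.
        while new in used:
            suffix += 1
            new = f"{base}-{suffix}"
        used.add(new)
        renames[old] = new
    return renames


def apply_renames(ds, renames: Dict[str, str]) -> None:
    """Apply the planned renames through ds.rename_specification(old, new)."""
    for old, new in renames.items():
        ds.rename_specification(old, new)
```

The (method, basis) pairs would have to be read from each specification in the dataset; how exactly to get them is left out here because it depends on the QCPortal version. Printing the result of plan_renames and reviewing it before calling apply_renames seems prudent, since specifications that differ only in keywords would otherwise end up distinguished by the suffix alone.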

j-wags commented 12 months ago

Thanks. Would you be open to a script that replaces the placeholder specifications for relevant OpenFF datasets on the central QCArchive, or should I do it client-side?

bennybp commented 12 months ago

Client-side would be sufficient. Renaming a specification is fast, so a script can rename a bunch of them in one go.

j-wags commented 12 months ago

Ah, I may not have asked clearly - Are you willing to have these spec names changed in QCArchive itself, or do you plan to stick with the placeholder names? If it's the former I'll put together a script to do the renaming on QCA. If it's the latter I'll need to provide utilities within OpenFF to make our workflows continue working with the placeholder names.

bennybp commented 11 months ago

oh feel free to rename them on the server itself. They are your datasets after all :)

j-wags commented 11 months ago

Excellent - Thanks!

j-wags commented 11 months ago

Update: I'll still take this on, but after I get QCSubmit updated!