MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Wrong typecasting in records #766

Open chrisiacovella opened 1 year ago

chrisiacovella commented 1 year ago

Describe the bug

As I mentioned in the meeting the other day, I came across what I think are a few bugs in the records of the following single point datasets (the SPICE datasets on the ML server). It seems to specifically affect "spec_6" data, for the following properties:

current energy <class 'str'>
dispersion correction energy <class 'str'>
2-body dispersion correction energy <class 'str'>
b3lyp-d3(bj) dispersion correction energy <class 'str'>

For this dataset, it appears those 4 properties all store the same energy (identical to 'return_energy', which is properly typed as a float). I'll note that the list-valued fields (e.g., those related to gradients) are correctly constructed from floats.

The following datasets have this issue for spec_6

SPICE Solvated Amino Acids Single Points Dataset v1.0 spec_6
SPICE DES Monomers Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.0 spec_6
SPICE Dipeptides Single Points Dataset v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.1 spec_6
SPICE DES Monomers Single Points Dataset v1.1 spec_6
SPICE Dipeptides Single Points Dataset v1.1 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.0 spec_6
SPICE Solvated Amino Acids Single Points Dataset v1.1 spec_6
SPICE DES370K Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.2 spec_6
SPICE Dipeptides Single Points Dataset v1.2 spec_6
SPICE DES370K Single Points Dataset Supplement v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.2 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.1 spec_6

To Reproduce

A quick script that loops over everything:

from qcportal import PortalClient

client = PortalClient("ml.qcarchive.molssi.org")
dataset_type = "singlepoint"

# Collect the names of all SPICE single point datasets on the server.
datasets = client.list_datasets()

datasets_to_consider = []
for dataset in datasets:
    if dataset['dataset_type'] == dataset_type:
        if 'SPICE' in dataset['dataset_name']:
            datasets_to_consider.append(dataset['dataset_name'])

spec = 'spec_6'
for dataset_name in datasets_to_consider:
    # Fetch the dataset being looped over (not always the first one).
    ds = client.get_dataset(
        dataset_type=dataset_type, dataset_name=dataset_name
    )

    entry_names = ds.entry_names

    # Only the first entry of each dataset is checked.
    max_val = 1

    # Each item is a tuple whose third element is the record itself.
    for record in ds.iterate_records(entry_names[0:max_val], specification_names=[spec]):
        properties = record[2].dict()['properties']
        has_strings = False
        for k in properties.keys():
            if isinstance(properties[k], str):
                has_strings = True
                #print(k, type(properties[k]))
        if has_strings:
            print(f'{dataset_name} {spec}')

bennybp commented 1 year ago

This seems to apply only to DFTD3 calculations, where the values are converted to strings: https://github.com/MolSSI/QCEngine/blob/1b27a14255817f13092ae846593b0fb7c975625b/qcengine/programs/dftd3.py#L273C41-L273C41

@loriab is looking to clean that up in qcengine soon. I can convert the existing values in the database next week.

(The DFTD3 calculations come from specifying b3lyp-d3 calculations. In the legacy version, this caused two separate records/specifications to be created - one for b3lyp and one for the d3 correction. The new version makes these existing records explicit, but no longer does the splitting for new calculations. It's a bit complicated...)
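
Until the stored values are converted, a client-side workaround could look something like the sketch below. It is only an illustration, not part of qcportal or QCEngine: the helper name `coerce_numeric_strings` and the example values are hypothetical, and it assumes the properties are available as a plain dict (for instance from `record.dict()['properties']` as in the script above).

```python
from typing import Any, Dict


def coerce_numeric_strings(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `properties` with numeric strings converted to float.

    Hypothetical helper for illustration; not part of qcportal.
    """
    cleaned = {}
    for key, value in properties.items():
        if isinstance(value, str):
            try:
                cleaned[key] = float(value)
            except ValueError:
                # Not parseable as a number; keep the original string.
                cleaned[key] = value
        else:
            # Floats, lists, etc. are passed through unchanged.
            cleaned[key] = value
    return cleaned


# Made-up values for illustration only (not taken from a real record).
example = {
    "return_energy": -0.0123,
    "current energy": "-0.0123",
    "dispersion correction energy": "-0.0123",
    "current gradient": [0.0, 0.1, -0.1],
}
fixed = coerce_numeric_strings(example)
```

The original dict is left untouched, so the raw record can still be inspected to see which keys were stored as strings.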