Closed: dgasmith closed this issue 4 years ago.
Hi Daniel, I attached the new ASCDB.csv file, and a new commented version of the script (add_database). They should work on your end too, then we can think about extending it for all the other databases in ACCDB. Let me know.
Thanks! Do you feel like this is ready to be applied to the main QCArchive repository?
I think it is ready to go, unless you want me to be more specific with the level of theory for each datapoint. I can add a keyword after the reference energies on the .csv file, re-write the script to add that modification and attach it here. What do you think?
A couple questions:
Matt, yes, the points have different levels of theory depending on the original database they come from (none of them come from my/our calculations). I have the method and basis set for all of them, but I need to check the original articles for the programs and their keywords. What kind of keywords (besides CP, or whether they include a stability analysis) do you need?
Are all of the benchmark data "gold standard"? Are the differences between the benchmark methods employed much smaller than the difference to a test method (e.g. DFT)? Basically, I'm trying to ascertain if having the specific level of theory data would be worth the effort (or even useful to the user), or if we can just label all of the values as "benchmark" and move on.
The main goal of our database was to have all datapoints at a level of theory higher than DFT. To answer your question, yes, they are all gold standard. I asked whether you wanted to incorporate the details because I saw you did it for the other databases in the collection. In my opinion it is not necessary, so you can incorporate the data I sent you as "benchmark" and I would be fine with it.
Okay, that sounds like a plan. I see that `contrib["theory_level"]` is CCSD(T). Are all of the benchmark data CCSD(T)-based?
I double-checked, and 160 datapoints (out of 200) are CCSD(T)-based (most of them at the Wn protocol), 20 are CAS-SCF/CASPT2 and 20 (transition metal compounds) come from corrected experimental values. If you need me to be more detailed, I'll attach a new script and a new .csv file that takes care of this issue.
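If the per-datapoint level of theory were recorded, one way to handle the split described above would be to build one contributed-values dictionary per level instead of a single "Benchmark" entry. The following is only a minimal sketch: the `(name, value, theory_level)` record layout, the function name `group_by_theory_level`, and the sample records are assumptions made for illustration, not the actual ASCDB.csv format or data. The dictionary keys mirror the `contrib` dictionary used in the upload example in this thread.

```python
from collections import defaultdict


def group_by_theory_level(records):
    """Group (name, value, theory_level) records into one dict per theory level.

    The record layout is hypothetical; the output keys follow the `contrib`
    dictionary from the upload example in this thread.
    """
    grouped = defaultdict(lambda: {"index": [], "values": []})
    for name, value, level in records:
        grouped[level]["index"].append(name)
        grouped[level]["values"].append(float(value))

    # One contributed-values dictionary per distinct theory level,
    # in order of first appearance.
    return [
        {
            "name": f"Benchmark ({level})",
            "theory_level": level,
            "index": data["index"],
            "values": data["values"],
            "theory_level_details": {"driver": "energy"},
            "units": "hartree",
        }
        for level, data in grouped.items()
    ]


# Made-up records, for illustration only (not real ASCDB values).
records = [
    ("rxn_a", "-0.010", "CCSD(T)"),
    ("rxn_b", "0.025", "CASPT2"),
    ("rxn_c", "0.003", "CCSD(T)"),
]
contribs = group_by_theory_level(records)
```

Each resulting dictionary could then be passed to `ds.add_contributed_values` in turn, assuming the dataset accepts several named contributions.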
Do you have a paper for ASCDB?
Yes, it's P. Morgante, R. Peverati, "Statistically representative databases for density functional theory via data science", Phys. Chem. Chem. Phys. 2019, 21(35), 19092–19103. DOI:10.1039/C9CP03211H.
Describe the data you'd like: Upload the ACCDB dataset found here.
Describe ways to obtain the data: Download data from the ACCDB repository.
Willing to contribute: Pier Morgante should be able to help with the ingestion. MolSSI will help with some compute.
Additional context
Upload Example
```python
import qcportal as ptl
from qcfractal import FractalSnowflake
import pandas as pd

# Use a temporary local server (snowflake) for testing; otherwise no client.
SNOWFLAKE = True
if SNOWFLAKE:
    snowflake = FractalSnowflake()
    client = snowflake.client()
else:
    client = None
print(client)

ds = ptl.collections.ReactionDataset("ASCDB", client=client)

# Each CSV row: reaction name, molecule names, then their coefficients.
with open("ASCDB.csv", "r") as handle:
    rxns = [x.split(",") for x in handle.read().splitlines()]

gpath = "ASCDB_Geometries"
contrib_name = []
contrib_value = []

# Only the first five reactions are added in this example.
for row in rxns[:5]:
    name = row[0]
    rxn = row[1:]

    # First half of the remaining fields: molecules; second half: coefficients.
    half = len(rxn) // 2
    molecules = rxn[:half]
    coefs = rxn[half:]

    rxn_data = []
    for mol_name, coef in zip(molecules, coefs):
        mol = ptl.Molecule.from_file(gpath + "/" + mol_name + ".xyz")
        coef = float(coef)
        rxn_data.append((mol, coef))

    rxn = {"default": rxn_data}
    ds.add_rxn(name, rxn)

    contrib_name.append(name)
    contrib_value.append(5)  # value is hard-coded to 5 in this example

ds.save()

contrib = {
    "name": "Benchmark",
    "theory_level": "CCSD(T)",
    "values": contrib_value,
    "index": contrib_name,
    "theory_level_details": {"driver": "energy"},
    "units": "hartree",
}
ds.add_contributed_values(contrib)
#ds.save()
```
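The row-splitting step in the upload example can be isolated and checked without a server or geometry files. The sketch below assumes the same row layout the example implies (reaction name, then N molecule names, then N coefficients). The helper name `parse_reaction_row` and the sample row are hypothetical, and the even-length check is an addition for illustration that the original script does not perform.

```python
def parse_reaction_row(line):
    """Split one CSV line into a reaction name and (molecule, coefficient) pairs.

    Assumes the layout: name, mol_1..mol_N, coef_1..coef_N.
    """
    fields = line.strip().split(",")
    name, rest = fields[0], fields[1:]

    # Molecules and coefficients must pair up one-to-one.
    if len(rest) % 2 != 0:
        raise ValueError(f"Row {name!r} has an odd number of fields after the name")

    half = len(rest) // 2
    molecules = rest[:half]
    coefs = [float(c) for c in rest[half:]]
    return name, list(zip(molecules, coefs))


# Made-up row for illustration: A + B -> AB
name, stoich = parse_reaction_row("rxn_1,A,B,AB,-1,-1,1")
```

Validating the field count up front would surface a malformed row as an error rather than letting `zip` silently pair fields incorrectly.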