MolSSI / QCFractal

A distributed compute and database platform for quantum chemistry.
https://molssi.github.io/QCFractal/
BSD 3-Clause "New" or "Revised" License

Creating a fast way to get dataset size #756

Closed: j-wags closed this issue 9 months ago

j-wags commented 10 months ago

I've recently needed to look at datasets by size (I'm looking for a small one to integrate into tests). Right now the best way to do it is:

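dss = client.list_datasets()
for ds_dict in dss:
    # Fetch each dataset by its own type, then retrieve every record just to count them (slow)
    ds = client.get_dataset(ds_dict['dataset_type'], ds_dict['dataset_name'])
    print(len([*ds.iterate_records()]), ds_dict['dataset_name'])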

937 OpenFF Optimization Set 1
48280 OpenFF VEHICLe Set 1
189 OpenFF NCI250K Boron 1
...
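Since the goal is to find a small dataset, the counts can be collected and sorted instead of only printed. This is a minimal sketch, assuming the (count, name) pairs have already been gathered with the loop above; `count_items` and `smallest_datasets` are hypothetical helpers, not part of QCPortal, and the sample numbers are the ones from the output above.

```python
from typing import Iterable, List, Tuple


def count_items(items: Iterable) -> int:
    """Count items from an iterator without building a full list in memory."""
    return sum(1 for _ in items)


def smallest_datasets(
    sizes: Iterable[Tuple[int, str]], n: int = 3, min_size: int = 1
) -> List[Tuple[int, str]]:
    """Return the n smallest (count, name) pairs that have at least min_size records."""
    candidates = [(count, name) for count, name in sizes if count >= min_size]
    return sorted(candidates)[:n]


# Sample counts taken from the output above
sizes = [
    (937, "OpenFF Optimization Set 1"),
    (48280, "OpenFF VEHICLe Set 1"),
    (189, "OpenFF NCI250K Boron 1"),
]
print(smallest_datasets(sizes, n=2))
# [(189, 'OpenFF NCI250K Boron 1'), (937, 'OpenFF Optimization Set 1')]
```

In the loop, `count_items(ds.iterate_records())` could stand in for `len([*ds.iterate_records()])`. It avoids holding every record in a list, but it still has to retrieve each record, so it does not fix the slowness itself.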

This eventually works, but it takes a long time, since every record of every dataset has to be retrieved just to be counted. I'd love a quick way to query a dataset's size (for example, its number of records) directly.

This isn't horribly urgent (the current method works fine and I only need to do it once), but I can imagine that many new users will stumble over this in the future.
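Because the sizes only need to be computed once, one client-side workaround is to cache them locally so the slow count runs only for datasets not seen before. This is a sketch under stated assumptions: `cached_counts`, the JSON cache file, and the `count_fn` callback are hypothetical; in practice `count_fn` would wrap the per-dataset counting from the loop above, while the demo uses a stand-in function.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterable


def cached_counts(
    names: Iterable[str], count_fn: Callable[[str], int], cache_path: str
) -> Dict[str, int]:
    """Load cached counts, compute any that are missing, and write the cache back."""
    counts: Dict[str, int] = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            counts = json.load(f)
    for name in names:
        if name not in counts:
            # Only datasets missing from the cache reach the slow count_fn
            counts[name] = count_fn(name)
    with open(cache_path, "w") as f:
        json.dump(counts, f)
    return counts


# Demo: a stand-in count function that records which names it was asked for
calls = []


def fake_count(name: str) -> int:
    calls.append(name)
    return len(name)


path = os.path.join(tempfile.mkdtemp(), "dataset_sizes.json")
cached_counts(["a", "bb"], fake_count, path)
result = cached_counts(["a", "bb", "ccc"], fake_count, path)
print(calls)  # ['a', 'bb', 'ccc']
```

The second call only computes `"ccc"`, because `"a"` and `"bb"` are read back from the cache file written by the first call.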

bennybp commented 9 months ago

Fixed in #762