I've recently needed to look at datasets by size (I'm looking for a small one to integrate into tests). Right now the best way I've found to do it is:
dss = client.list_datasets()
for ds_dict in dss:
    ds = client.get_dataset('singlepoint', ds_dict['dataset_name'])
    print(len([*ds.iterate_records()]), ds_dict['dataset_name'])
937 OpenFF Optimization Set 1
48280 OpenFF VEHICLe Set 1
189 OpenFF NCI250K Boron 1
...
This eventually works, but it takes a long time because it iterates over every record in every dataset just to count them. I'd love a quick way to directly query the size (for example, the number of records) of a dataset.
This isn't horribly urgent, since the current method works fine and I only need to do it once, but I can imagine that many new users will stumble over this in the future.
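In the meantime, the counting step in the workaround above can at least avoid building a full list of records in memory. This is only a minimal sketch: `count_items` is a hypothetical helper, not part of the client library, and it does not make the query any faster, since every record is still iterated. The generator below is a stand-in for `ds.iterate_records()`.

```python
from typing import Iterable


def count_items(items: Iterable) -> int:
    """Count the items in any iterable without storing them in a list."""
    # Hypothetical helper: consumes the iterable one item at a time.
    return sum(1 for _ in items)


# Stand-in for ds.iterate_records(): a generator yielding three fake records.
fake_records = (f"record-{i}" for i in range(3))
print(count_items(fake_records))
```

In the loop above, this would replace `len([*ds.iterate_records()])` with `count_items(ds.iterate_records())`.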