Inconsistencies in Dataset Counts Across Different Attributes

google-deepmind / materials_discovery

Apache License 2.0

Inconsistencies in Dataset Counts Across Different Attributes #14

Open HarshaSatyavardhan opened 9 months ago

HarshaSatyavardhan commented 9 months ago

I have downloaded the dataset, and the number of data points differs depending on which folder I look at:

by conductivity - 377223
by id - 384939
by reduced_formula - 377184

Why is there such a large difference in the number of CIFs between these folders?
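For anyone wanting to reproduce or quantify the mismatch, here is a minimal sketch that counts `.cif` files per folder and reports how far each folder falls short of the largest one. The function names are hypothetical, and it assumes the archives have been extracted into ordinary local directories; the numbers in `reported` are the ones quoted above.

```python
from pathlib import Path


def count_cifs(folder):
    """Count files ending in .cif (case-insensitive) anywhere under folder."""
    return sum(
        1
        for path in Path(folder).rglob("*")
        if path.is_file() and path.suffix.lower() == ".cif"
    )


def count_gaps(counts):
    """Map each folder name to how many files it has fewer than the largest folder."""
    largest = max(counts.values())
    return {name: largest - n for name, n in counts.items()}


# Counts reported in this issue, keyed by the folder labels used above.
reported = {"conductivity": 377223, "id": 384939, "reduced_formula": 377184}
gaps = count_gaps(reported)
print(gaps)  # {'conductivity': 7716, 'id': 0, 'reduced_formula': 7755}
```

To run it against a local download, something like `count_gaps({name: count_cifs(path) for name, path in folders.items()})` should work, where `folders` maps each label to the directory you extracted it into.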

ml-evs commented 8 months ago

As I understand it, they are screening out some erroneous structures from the initial dataset. I archived the initially published version as an OPTIMADE API at https://optimade-gnome.odbx.science/v1/structures.
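A rough sketch of how the archived version could be queried for a structure count, using only the Python standard library. It relies on the standard OPTIMADE query parameters `filter` and `page_limit` and on the `meta.data_returned` response field; whether this particular server populates `data_returned` is an assumption, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://optimade-gnome.odbx.science/v1/structures"


def build_query(filter_expr=None, page_limit=1):
    """Build a structures URL; page_limit=1 keeps the response small."""
    params = {"page_limit": page_limit}
    if filter_expr:
        params["filter"] = filter_expr
    return f"{BASE_URL}?{urlencode(params)}"


def data_returned(response):
    """Read meta.data_returned from a decoded response, or None if it is absent."""
    return response.get("meta", {}).get("data_returned")


def fetch_count(filter_expr=None):
    """Ask the server how many structures match the filter (needs network access)."""
    with urlopen(build_query(filter_expr), timeout=30) as resp:
        return data_returned(json.load(resp))
```

Calling `fetch_count()` with no filter would give the total number of archived structures, and `fetch_count('elements HAS "Li"')` would restrict the count to structures containing lithium, assuming the server supports that filter.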