anhender / mse_ML_datasets

MIT License

Which datasets do you think might be appropriate for inclusion in matbench? #2

Open sgbaird opened 2 years ago

sgbaird commented 2 years ago

I.e., if you were to pick out the top few that you think are representative of materials informatics or materials discovery, which would they be, and why?

anhender commented 2 years ago

I'll look into it more this week and get back to you by next week!

anhender commented 2 years ago

I think the datasets that relate to polymers would be useful to include in matbench, because matbench doesn't seem to have any for that material system. A few examples would be the Wu datasets, the Mannodi datasets, and the Pilania_Polymers data. In addition, since matbench only has one dataset for thermal applications, adding the Seko_melt_temps data could be useful, as it provides 248 experimental melting temperatures for single- and binary-component solids.

I'm not sure how small a dataset the matbench maintainers would accept, but I would assume anything with fewer than 100 entries may not be useful for matbench. Matbench also seems to already have datasets for band gap and elastic moduli (G, K), so the Zhuo and Zeng datasets are probably unnecessary to add.

If you'd like, I can look into it a bit more to find other datasets that I think would be good to add. This is what I have to start, though!

sgbaird commented 2 years ago

Perfect, thanks for getting back on this!