CompRhys / aviary

The Wren sits on its Roost in the Aviary.
MIT License

Is there matbench benchmark result for Wrenformer? #72

Open hongshuh opened 1 year ago

hongshuh commented 1 year ago

I saw in the commit history that you have run some experiments on the Matbench benchmark. It's a very good idea and model, but I may not have enough computational resources to run it myself. I would like to know if you have the final results?

janosh commented 1 year ago

I believe @hrushikesh-s is currently working on submitting Wrenformer to Matbench. Maybe he can tell you more.

In case you haven't seen them, there are some preliminary results for various Wrenformer hyperparameter settings plotted in #44.

janosh commented 1 year ago

@hongshuh Also, what are you planning on using Wrenformer for? If discovery, these results might interest you: https://matbench-discovery.materialsproject.org/preprint#results.

hongshuh commented 1 year ago

Yea, I am also following the discovery benchmark. It seems to handle the task as a regression problem by predicting the energy above hull, rather than treating it as a classification task of identifying whether a material is stable or not. I am a bit puzzled by this approach, since the aim seems to be the identification of stable materials, which would intuitively seem to be a classification task.

janosh commented 1 year ago

It seems to handle the task as a regression problem by predicting the energy above hull

That's right.

rather than treating it as a classification task of identifying whether a material is stable or not. I am a bit puzzled by this approach since the aim seems to be the identification of stable materials, which would intuitively seem to be a classification task.

I have some preliminary results which suggest doing direct classification does not improve over regression. But I think that's definitely something that could be investigated further. If you want to check how well a Wrenformer stability classifier performs compared to the Wrenformer regressor, that would be a very welcome contribution to MBD!
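To make the comparison concrete, here is a minimal sketch (not the actual Matbench Discovery code) of how regression predictions of energy above hull can be turned into a stability classification and scored with the same metrics Bartel et al. report (accuracy, F1, false positive rate). The threshold of 0 eV/atom and all numbers below are illustrative assumptions.

```python
def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Treat E_above_hull <= threshold as 'stable' and score predictions.

    Returns (accuracy, F1 score, false positive rate). Pure-Python sketch;
    the 0 eV/atom threshold is an assumption for illustration.
    """
    y_true = [e <= threshold for e in e_hull_true]
    y_pred = [e <= threshold for e in e_hull_pred]
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum(not t and not p for t, p in zip(y_true, y_pred))
    fp = sum(not t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, f1, fpr

# Synthetic example: made-up DFT vs. model energies above hull (eV/atom)
acc, f1, fpr = stability_metrics(
    [-0.02, 0.15, 0.0, 0.3, -0.1, 0.05],
    [-0.01, 0.1, 0.02, 0.25, -0.05, -0.02],
)
```

A direct classifier would instead be trained on the binary labels themselves; the point of this sketch is that a regressor already yields a classifier for free by thresholding, so the two approaches can be compared on identical metrics.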

janosh commented 1 year ago

This section from Bartel et al. 2021 is also relevant here:

As an additional demonstration, all representations (except Roost—see “Methods” for details) were also trained as classifiers (instead of regressors), tasked with predicting whether a given compound is stable (ΔHd ≤ 0) or unstable (ΔHd > 0). The accuracies, F1 scores, and false positive rates are tabulated in Supplementary Table 2 and found to be only slightly better (accuracies < 80%, F1 scores < 0.75, false positive rates > 0.15) than those obtained by training on ΔHf (Fig. 4) or ΔHd (Supplementary Fig. 4).

Here's Table S2:

[Screenshot of Supplementary Table 2 from Bartel et al. 2021]

hongshuh commented 1 year ago

I have some preliminary results which suggest doing direct classification does not improve over regression.

Thanks! Maybe the regression values provide more information to the model than just "stable" or "unstable" labels.