Bin-Cao / Bgolearn

A Bayesian global optimization package for material design | Adaptive Learning | Active Learning
http://bgolearn.caobin.asia/
MIT License

Using the estimated noise level as input for the alpha parameter does not yield satisfactory model accuracy #2

Open zifengdexiatian opened 10 months ago

zifengdexiatian commented 10 months ago

In practical testing, the constructed Gaussian process regression (GPR) model showed poor accuracy under both cross-validation and leave-one-out validation, with a coefficient of determination (R2) below 0.2. In the model construction function fit_pre(), the parameter alpha is handled by first estimating the noise level with a Gaussian process regression and then passing that estimated noise level in as alpha when building the final GPR model. In my testing, this approach produced low model accuracy. After changing the modeling logic to automatic hyperparameter tuning, the accuracy rose to R2 > 0.7. I am not very familiar with the theory of Gaussian process regression, but the test results show that automatic tuning clearly outperformed the noise-estimation approach, and the alpha value found by automatic tuning differed greatly from the estimated noise level. I am not sure whether this is due to peculiarities of my dataset, as the result has not been tested on other datasets.
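For anyone trying to reproduce the comparison, below is a minimal sketch of the two strategies I am contrasting, written directly against scikit-learn's GaussianProcessRegressor on synthetic data. The kernel choice, the noise value, and the toy data are illustrative assumptions on my part, not Bgolearn's actual internals or my real dataset.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(40, 2))               # toy feature matrix
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3, size=40)

# Strategy A: a pre-estimated noise level passed in as a fixed alpha
# (added to the diagonal of K(X, X) only).
noise_estimate = 0.09                                   # hypothetical sigma_n^2
gpr_fixed = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF(),
    alpha=noise_estimate,
    normalize_y=True,
)

# Strategy B: the noise level is learned jointly with the kernel
# hyperparameters through a WhiteKernel term ("automatic tuning").
gpr_tuned = GaussianProcessRegressor(
    kernel=ConstantKernel() * RBF() + WhiteKernel(noise_level=1e-2),
    alpha=1e-10,                                        # tiny jitter for stability
    normalize_y=True,
    n_restarts_optimizer=5,
)

for name, model in [("fixed alpha", gpr_fixed), ("tuned WhiteKernel", gpr_tuned)]:
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    print(f"{name}: leave-one-out R2 = {r2_score(y, y_pred):.3f}")
```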

Bin-Cao commented 10 months ago

Thank you for providing your feedback.

Firstly, kindly check which version of Bgolearn is installed on your computer; we recommend version 2.2.2 or above. Since version 2.2.2, the inputs are centered (standardized) before fitting, which mitigates the influence of the original feature scales on the GPR and ensures the regression is not unduly affected by the scale of the inputs.
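If you are on an older version, a rough workaround is to standardize the features yourself before fitting, so the kernel length-scales are not dominated by whichever input has the largest numeric range. A minimal sketch follows; StandardScaler and the toy data are illustrative stand-ins here, not necessarily what Bgolearn does internally after 2.2.2.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 100.0, size=(30, 3))   # toy features on very different scales
X[:, 2] *= 1e-3                             # one feature is several orders smaller
y = X @ np.array([0.02, -0.01, 50.0]) + rng.normal(scale=0.1, size=30)

# Center and scale each feature to unit variance before the GPR sees it.
model = make_pipeline(StandardScaler(), GaussianProcessRegressor(normalize_y=True))
model.fit(X, y)
print(model.score(X, y))                    # in-sample R2, for a quick sanity check
```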

Secondly, I suggest incorporating the noise through the parameter alpha. Since 'automatic parameter tuning' modifies the kernel function itself, I recommend referring to the book 'Gaussian Processes for Machine Learning' by Carl Edward Rasmussen and Christopher K. I. Williams (MIT Press), specifically page 16, Eq. (2.21). The noise term is introduced in the training covariance matrix K(X, X), and sometimes also in the test covariance K(X*, X*). However, it should not be added to the cross-covariances K(X*, X) and K(X, X*). If you choose to revise the kernel function instead, the noise is introduced into all four blocks simultaneously.
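For reference, the joint distribution that equation describes is the following (written here from memory of the standard GP regression result; please check Eq. (2.21) in the book for the exact statement):

$$
\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix}
\sim \mathcal{N}\!\left(\mathbf{0},\;
\begin{bmatrix}
K(X, X) + \sigma_n^2 I & K(X, X_*) \\
K(X_*, X) & K(X_*, X_*)
\end{bmatrix}\right)
$$

The noise variance $\sigma_n^2$ appears only on the training block $K(X, X)$ (and on $K(X_*, X_*)$ if you want the predictive distribution of noisy targets rather than of the latent function $\mathbf{f}_*$); it never enters the cross-covariance blocks.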

If you have any further questions, please feel free to ask.

Bin-Cao commented 10 months ago

Book Link : https://github.com/Bin-Cao/Bgolearn/blob/main/RefBookOfGaussianProcess/BOOK-Gaussian%20processes%20for%20machine%20learning_Rasmussen_Williams-2006.pdf