fabsig / GPBoost

Combining tree-boosting with Gaussian process and mixed effects models

Some questions about friedman3 instance code #49

Closed: gaofenxia closed this issue 2 years ago

gaofenxia commented 2 years ago

From reading about the friedman3 dataset, I understand that it has four independent feature columns and that the output y is related to the input X as follows (this relationship is built in when the dataset is generated):

    y(X) = arctan((X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) / X[:, 0]) + noise * N(0, 1)

The friedman3 example code contains:

    X, F = datasets.make_friedman3(n_samples=n)

Can F here be understood as the y column of the friedman3 dataset, with y(X) acting as the mean function F(X)? If so, why does F(X) (or y(X)) have this form?

In addition, the friedman3 example contains the following code:

    y = F + Zb + xi  # observed data

Why is y called observed data here? Isn't it calculated from F, Zb and xi? Please explain in detail, thank you!

fabsig commented 2 years ago

Yes, the simulated function corresponds to F(X) which, for the Gaussian case, is the (prior) mean function of the observed variable y. In the simulated experiments you mention, the observed data equals the sum y = F + Zb + xi. I do not fully understand your question, but if it bothers you that I call something "observed data" when in fact it is simulated data, feel free to use another name for it. Similarly, F(X) is just notation.
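
To make the decomposition y = F + Zb + xi concrete, here is a minimal sketch of such a simulation, not the actual GPBoost example code. It uses only the Python standard library so that it runs without scikit-learn. The input ranges follow the scikit-learn documentation for make_friedman3. Several things are assumptions made purely for illustration: the helper names friedman3_mean and simulate, the grouped random effects design with equally sized groups, and the variance values. With noise left at its default of 0, make_friedman3 returns the noise-free function values, which is why the returned array can serve directly as F.

```python
import math
import random


def friedman3_mean(x):
    """Noise-free Friedman #3 function for one row x = (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = x
    return math.atan((x1 * x2 - 1.0 / (x1 * x3)) / x0)


def simulate(n=200, m=20, sigma2_b=1.0, sigma2=0.25, seed=0):
    """Simulate y = F + Zb + xi with m equally sized groups (assumed design)."""
    rng = random.Random(seed)
    # Features drawn uniformly on the ranges documented for make_friedman3
    X = [
        (
            rng.uniform(0.0, 100.0),
            rng.uniform(40.0 * math.pi, 560.0 * math.pi),
            rng.uniform(0.0, 1.0),
            rng.uniform(1.0, 11.0),
        )
        for _ in range(n)
    ]
    # F: fixed effects (mean) function evaluated at each sample
    F = [friedman3_mean(x) for x in X]
    # Group index of each sample; Z is the implied incidence matrix
    group = [i * m // n for i in range(n)]
    # b: one Gaussian random effect per group, so Zb[i] = b[group[i]]
    b = [rng.gauss(0.0, math.sqrt(sigma2_b)) for _ in range(m)]
    Zb = [b[g] for g in group]
    # xi: independent Gaussian error term
    xi = [rng.gauss(0.0, math.sqrt(sigma2)) for _ in range(n)]
    # y: the only quantity a modeler would see in a real application
    y = [f + zb + e for f, zb, e in zip(F, Zb, xi)]
    return X, F, group, Zb, xi, y


X, F, group, Zb, xi, y = simulate()
```

In a simulation all three components are known because they were generated. In a real application only X, the group labels, and y would be available, which is the sense in which y plays the role of the observed data, while F, b, and xi have to be estimated.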