distillpub / post--visual-exploration-gaussian-processes

A Visual Exploration of Gaussian Processes
http://distill.pub/2019/visual-exploration-gaussian-processes

Review #1

distillpub-reviewers opened 6 years ago

distillpub-reviewers commented 6 years ago

The following peer review was solicited as part of the Distill review process.

The reviewer chose to keep anonymity. Distill offers reviewers a choice between anonymous review and offering reviews under their name. Non-anonymous review allows reviewers to get credit for the service they offer to the community.

Distill is grateful to the reviewer for taking the time to review this article.


This article was a delight to read, and it hits many of the right notes in what I think makes a good Distill article.

What is unclear to me, however, is whether this article breaks new ground in exposition. A lot of space, for example, is devoted to exposition on basic properties of the multivariate Gaussian distribution. Many of the figures correspond to visualizations that I already have in my head (though this is probably a good thing!), and the depth of the material on GPs themselves is not beyond what is covered in introductory courses. What I’m saying is that a reader already familiar with GPs is unlikely to take away anything from this. A new reader, however, who has not developed this visual “muscle”, would likely take away a lot. I will thus leave it to the editors to decide if Distill is an appropriate venue for such an article.

Personally, however, I think it is a solid piece of writing and thus I recommend it for publication.

Specific comments

[minor] based on prior knowledge -> incorporating prior knowledge

[minor] The statement [Gaussian processes assigns a probability to each of those functions] needs a citation.

[typo] symmetric -> symmetric

[nit] I am not a fan of the notation P(X,Y) used to denote the joint distribution of X and Y. This appears to be interchangeable with p_{X,Y}, used elsewhere in the article to refer to the joint density. These two notations refer to essentially the same thing.

[nit] Though it is clear from context what the article means, the article needs to be careful about distinguishing between probabilities and probability densities. e.g. in statements like “we are interested in the probability of X=x”

[minor] In the section “kernels”, L2-norm should be L2-distance

[minor] Periodic kernels are not invariant to arbitrary translations, but only to translations equal to the period of the function.
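To make this distinction concrete, here is a minimal numeric check, assuming the standard exponentiated-sine ("periodic") kernel; the exact form and parameters used in the article may differ:

```python
import numpy as np

def periodic_kernel(x1, x2, p=1.0, length=1.0):
    # Standard exp-sine-squared (periodic) kernel with period p.
    return np.exp(-2 * np.sin(np.pi * np.abs(x1 - x2) / p) ** 2 / length ** 2)

x, x2, p = 0.3, 1.7, 1.0
# Shifting one input by exactly the period leaves the kernel value unchanged ...
print(np.isclose(periodic_kernel(x + p, x2, p), periodic_kernel(x, x2, p)))    # True
# ... but an arbitrary shift does not.
print(np.isclose(periodic_kernel(x + 0.4, x2, p), periodic_kernel(x, x2, p)))  # False
```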

[major] Kernels can be combined by addition, multiplication and concatenation (see Chapter 5.4 of MacKay, Information Theory, Inference, and Learning Algorithms). The omission of addition as a way of combining kernels is surprising.

[minor] Furthermore, it would be nice to see an explanation of what it means when two kernels are combined. The resulting kernel inherits the qualitative properties of its constituents, but how, and why?
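To sketch what such combinations look like in code (illustrative helper names and parameters, not the article's implementation; both the sum and the product of two valid kernels are again valid kernels):

```python
import numpy as np

def rbf(x1, x2, length=1.0):
    # Squared-exponential kernel: encodes smooth, local similarity.
    return np.exp(-0.5 * (x1 - x2) ** 2 / length ** 2)

def periodic(x1, x2, p=1.0, length=1.0):
    # Exp-sine-squared kernel: encodes exact repetition with period p.
    return np.exp(-2 * np.sin(np.pi * np.abs(x1 - x2) / p) ** 2 / length ** 2)

def added(x1, x2):
    # Addition acts roughly like an OR: samples behave like the sum of one
    # sample from each component (e.g. a smooth trend plus seasonality).
    return rbf(x1, x2, length=5.0) + periodic(x1, x2)

def multiplied(x1, x2):
    # Multiplication acts roughly like an AND: an RBF times a periodic kernel
    # gives locally periodic functions whose correlation decays with distance.
    return rbf(x1, x2, length=5.0) * periodic(x1, x2)
```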

[major] There seems to be a problem in the figure right above “Conclusion”, when the linear kernel is used alone. The linear kernel has rank 2, and thus the distribution after conditioning is not well defined if there are more than 2 points observed. The widget, however, does not seem to complain when more than two points are added. What is going on? I suspect that the regression has been done with a noise model, i.e. with a small multiple of the identity added to fix the singularity. This is completely valid, but it should be discussed explicitly.
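The suspected failure mode and the jitter fix are easy to check numerically; a sketch, assuming one common parameterization of the linear kernel rather than the article's exact one:

```python
import numpy as np

def linear_kernel(X1, X2):
    # A common linear kernel, k(x, x') = 1 + x * x'; its Gram matrix is a
    # sum of two rank-1 matrices and thus has rank at most 2.
    return 1.0 + np.outer(X1, X2)

X = np.array([-1.0, 0.0, 1.0, 2.0])      # more than two observed points
K = linear_kernel(X, X)
print(np.linalg.matrix_rank(K))          # 2: singular, conditioning ill-defined
K_jitter = K + 1e-6 * np.eye(len(X))     # small multiple of the identity
print(np.linalg.matrix_rank(K_jitter))   # 4: invertible again
```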

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.
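Concretely, the extension replaces the training covariance K with K + σ_n²I, where σ_n² is the observation-noise variance. A minimal sketch of standard noisy GP regression (names and parameters are illustrative):

```python
import numpy as np

def rbf(X1, X2, length=1.0):
    # Squared-exponential kernel evaluated pairwise via broadcasting.
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * d ** 2 / length ** 2)

def gp_posterior(X_train, y_train, X_test, noise_var=0.1):
    # GP regression with an additive Gaussian noise model on the observations.
    K = rbf(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf(X_train, X_test)
    K_ss = rbf(X_test, X_test)
    K_inv = np.linalg.inv(K)  # a Cholesky solve is preferred in practice
    mean = K_s.T @ K_inv @ y_train
    cov = K_ss - K_s.T @ K_inv @ K_s
    return mean, cov

# With noise_var > 0 the posterior mean smooths over the observations instead
# of interpolating them; noise_var = 0 recovers the noise-free behavior.
mean, cov = gp_posterior(np.array([-2.0, 0.0, 1.5]),
                         np.array([0.5, -0.3, 0.8]),
                         np.linspace(-3, 3, 50))
```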

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line “If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors.” The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

A final thought

I think the article could benefit from a hero figure that is a true “GP sandbox”. It would be nice to be able to combine kernels, change the parameters of the kernels, and add/remove data points in real time, all in one place.


Distill employs a reviewer worksheet as a help for reviewers.

The first three parts of this worksheet ask reviewers to rate a submission along certain dimensions on a scale from 1 to 5. While the scale meaning is consistently "higher is better", please read the explanations for our expectations for each score—we do not expect even exceptionally good papers to receive a perfect score in every category, and expect most papers to be around a 3 in most categories.

Any concerns or conflicts of interest that you are aware of?: No known conflicts of interest
What type of contributions does this article make?: Explanation of existing results

Advancing the Dialogue
How significant are these contributions? 3/5

Outstanding Communication
Article Structure: 4/5
Writing Style: 5/5
Diagram & Interface Style: 5/5
Impact of diagrams / interfaces / tools for thought: 3/5
Readability: 4/5

Scientific Correctness & Integrity
Are claims in the article well supported? 4/5
Does the article critically evaluate its limitations? How easily would a lay person understand them? 3/5
How easy would it be to replicate (or falsify) the results? 3/5
Does the article cite relevant work? 3/5
Does the article exhibit strong intellectual honesty and scientific hygiene? 3/5
grtlr commented 6 years ago

Thank you very much for your detailed review! We are sorry for our late response - we agree with your feedback and will make changes to the article to reflect this. Currently, we are waiting for the other reviews before making major changes. Many of your specific comments should be straightforward to fix.

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line “If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors.” The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

As you have mentioned, the noise model for the input is a very important aspect of Gaussian processes because this is what makes them practical. We agree that in the current version this is not covered adequately and we plan on adding an additional paragraph, together with a corresponding figure.

I think the article could benefit from a hero figure that is a true “GP sandbox”. It would be nice to be able to combine kernels, change the parameters of the kernels, and add/remove data points in real time, all in one place.

This sounds like a really interesting idea! Ideally, we would want to add an additional example that is more tangible (e.g. temperature data) and also has a larger number of training points. In that regard, we need to see what is possible with our current JavaScript implementation.

grtlr commented 5 years ago

PR #2 addresses some of R1's comments:

[minor] based on prior knowledge -> incorporating prior knowledge

[minor] The statement [Gaussian processes assigns a probability to each of those functions] needs a citation.

[typo] symmetric -> symmetric

[minor] In the section “kernels”, L2-norm should be L2-distance

I will make additional changes in the next couple of days.

grtlr commented 5 years ago

Commit 48aa6441904f6e82150924f970ff999296048960 fixes the following:

[nit] I am not a fan of the notation P(X,Y) used to denote the joint distribution of X and Y. This appears to be interchangeable with p_{X,Y}, used elsewhere in the article to refer to the joint density. These two notations refer to essentially the same thing.

Commit 5752d96144c8535c7d564367389abb59de0f07de fixes the following:

[nit] Though it is clear from context what the article means, the article needs to be careful about distinguishing between probabilities and probability densities. e.g. in statements like “we are interested in the probability of X=x”

grtlr commented 5 years ago

[major] To speak more broadly on this point, it is worth mentioning that GP regression is typically done in conjunction with a noise model. This allows the posterior some flexibility such that it does not need to pass through every observation (indeed, this constraint makes GPs nearly unusable). This addition requires very little extension to the GP framework and should be discussed explicitly.

[minor] I do think this is what the authors seem to be getting at in the conclusion, at the line “If we need to deal with real-world data, we will often find measurements that are afflicted with uncertainty and errors.” The cited article, however, seems to be more sophisticated than what is needed to address the basic problem.

PR #18 adds more information about extending Gaussian processes to input noise.

RKehlbeck commented 5 years ago

[minor] Periodic kernels are not invariant to arbitrary translations, but only to translations equal to the period of the function.

[major] Kernels can be combined by addition, multiplication and concatenation (see Chapter 5.4 of MacKay, Information Theory, Inference, and Learning Algorithms). The omission of addition as a way of combining kernels is surprising.

[minor] Furthermore, it would be nice to see an explanation of what it means when two kernels are combined. The resulting kernel inherits the qualitative properties of its constituents, but how, and why?

PR #11 addresses the above issues, adds additional information on kernels, and includes a new static figure.