[x] Section 1.1: When you speak about the “democratization of GPs” at the end of this section, I think it is important to give a shout-out to the astronomy open-source software that led to the widespread usage of the technique in the field. You do this in general in the second-to-last paragraph, but I’m tempted to believe that for astronomy these packages were even more key than in other fields. Shout-outs to george and celerite, for instance, would make a lot of sense here.
[x] Section 2.1: I think the introduction of the Probabilistic Graphical Model (PGM) in Figure 4 doesn’t add much at this stage of the review, and in my (anecdotal) experience is even a bit distracting. I have to be very honest with you here: the PGM plot in my print copy of Rasmussen & Williams (2006; RM06) has a couple of “???” attached to it. I think the text (both in RM06 and in your review) is crystal clear as it is, without introducing the PGM plot.
[x] Section 2.4.1: In equation (13), the introduction of the matrix K\* might come as a surprise to first-time readers, because you introduce the concept of a covariance matrix whose kernel is evaluated on two input vectors with different dimensions (t and t\*). In previous definitions (eqs. 3, 10), a recipe is given to fill the covariance matrix element by element from a single input vector with a single set of dimensions. This gave me flashbacks to the RM06 textbook; I was also a bit taken aback when I saw the same thing there, only to realize later that K or K(X, X) was an n x n matrix, whereas K(X\*, X) was an n\* x n matrix. It would be good to briefly mention this. Even better: give the recipe for filling the matrices in the predictive equations in the same form as in eqs. (3) and (10) (i.e., element by element).
[x] Section 2.4.1: If one is computing C in equation (13) --- wouldn’t one always want the predictive variance? Shouldn’t K\*\* then include the white-noise term by default?
[x] Section 2.4.2: “(…) lead to a number of powerful extensions that are beyond scope of this review” --- I was left wondering what exactly was meant here. Could you list a few?
[x] #35
[x] #36
[x] Section 4.1.2: One thing I’ve seen scare people who work in exoplanet atmospheric science is that GPs tend to produce larger errorbars on the transit spectrum. I think this makes sense in general, because you are marginalizing over functions, and that adds uncertainty to your estimates (as you folks nicely explain at the beginning). For a simple example of this, see the Appendix of Evans et al. 2018 (https://ui.adsabs.harvard.edu/abs/2018AJ....156..283E/abstract). Perhaps this would be a good place to remind readers of a very clear point made by Carter & Winn (2009): ignoring the uncertainty or the overall underlying process (stochastic or not) generating your “systematics” (as we call them in the exoplanet atmospheric literature) will not necessarily produce a biased estimator for your physical parameter (e.g., the transit depth), but it will produce an estimator with a higher variance, which is very bad. This is what is actually happening in Figure 4: if you fit several simulations, the average over all the simulations for both the orange and blue distributions will most likely hit the true value. However, the orange distributions will be inconsistent with the true value most of the time. In real life, the second is what you care about --- the former takes up all your telescope time 😊. All this is to say: larger errorbars are, in general, a good sign if you don’t perfectly understand your systematics. This was the case for HST; we’ll see how that goes for JWST.
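To make the two Section 2.4.1 comments above concrete, here is a minimal NumPy sketch of the shapes involved and of the white-noise question. It is not taken from the manuscript: the squared-exponential kernel, the function name `sq_exp_kernel`, the toy data and all parameter values are assumptions chosen for illustration, and it uses the standard zero-mean GP predictive equations, whose notation may differ from eq. (13).

```python
import numpy as np

def sq_exp_kernel(t1, t2, amp=1.0, scale=1.0):
    # Element-by-element recipe: K[i, j] = amp^2 * exp(-(t1[i] - t2[j])^2 / (2 scale^2)).
    # t1 and t2 may have different lengths, so the result need not be square.
    dt = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * (dt / scale) ** 2)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 20)        # n = 20 observed times (toy data)
t_star = np.linspace(0.0, 10.0, 50)   # n* = 50 prediction times
sigma = 0.1                           # assumed white-noise standard deviation
y = np.sin(t) + sigma * rng.normal(size=t.size)

K = sq_exp_kernel(t, t) + sigma**2 * np.eye(t.size)  # n x n, includes white noise
K_star = sq_exp_kernel(t_star, t)                    # n* x n cross-covariance
K_ss = sq_exp_kernel(t_star, t_star)                 # n* x n*

# Predictive mean and covariance of the latent (noise-free) function.
mean = K_star @ np.linalg.solve(K, y)
cov_f = K_ss - K_star @ np.linalg.solve(K, K_star.T)

# Predictive covariance of new noisy observations: add the white-noise term.
cov_y = cov_f + sigma**2 * np.eye(t_star.size)
```

Whether `cov_f` or `cov_y` is the right quantity depends on whether one is predicting the underlying function or future noisy measurements; the sketch only shows that the two differ by the white-noise variance on the diagonal.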
first comment: added sentence mentioning george and celerite in section 1.1 (since @dfm is probably too modest to do so himself), ticked
second comment: agree with suggestion of removing PGM but just need to run it past @dfm before removing it from manuscript and ticking
third comment: added element by element definition as suggested, ticked
fourth comment: clarified sentence explaining when one might want to include the white noise variance in the prediction and when not, ticked
fifth comment: need to run this past Dan before deciding what to do. I see no issue with mentioning nested samplers to get marginals in more multimodal cases, but certainly wouldn't want to wade into a more extended discussion of which sampler is better for what
6th comment: same: happy to mention something as a useful guide but don't want to get into a discussion of evidence thresholds
7th comment: added sentence to relevant paragraph of 4.1.2, hoping it won't ruffle the feathers of people in the HST transmission spectroscopy community who choose not to use GPs too much.... Ticked
From @nespinoza: