Closed reneehlozek closed 6 years ago
I have addressed the comments on Figure 3 by removing it and enhancing the interpretability of Figure 4. I think some of our observations were indeed overreaches as Melissa noted, and those that were not can be better discerned from the clarified Figure 4. The changes are in branch issue/49/feedback-propagation.
Melissa Graham, 2018-07-31
Abstract
Introduction:
[x] Paragraph 3 introduces PLAsTiCC, and if there is a citation or website for this it should be added. Also if there is a known timeline for PLAsTiCC that would be a useful addition to this paragraph.
[x] Section 2.1.2, Paragraph 1, Line 245: "but the class with maximum probability is almost always still the true class" --> But if it's perfect, shouldn't it always be? Perhaps a statement to describe why.
[x] S2.2,P1: At this point I wondered if the m' and m~ have to be adjacent, which led me to wonder how the m classes are ordered at this point. Perhaps a couple of statements regarding that would help out any future reader wondering the same thing.
S4.1,P4-5: In several cases, the analysis extends itself to draw conclusions that are not immediately obvious from Figure 3. For me this was an overextension, and I didn't quite follow these statements in particular:
- [ ] -- "The perfect classifier with a systematic applied to only one class (or a few) is equivalent to the tunnel vision classifier applied to that systematic as a baseline." --> This was confusing because "tunnel vision" was previously described as a systematic, not something applied to a systematic, and I didn't know what "baseline" was referring to in this context. Any rewording, or an additional demonstration of these conclusions in a plot, might be helpful to future readers. I'm sorry this is vague; I can't give a specific request for a change in wording here because I didn't quite understand the discussion well enough.
[x] Figure 4: A 6-panel figure that spans the whole page and contains one systematic per panel, like Fig 1, would make the trends with symbol color and size much clearer (I think). Also if the legend for the size had example symbols of that size that would be a help.
[x] S5,P1: The phrase "...without regard for the impacts of misclassification on science results" seemed out of place, the science results are the whole point of the classification challenge?
[x] S5.1,P1: Regarding the statement "We note that the decision of whether to initiate follow-up observations is binary and deterministic, so a probabilistic classification would ultimately be reduced to a deterministic one for this application." --> I suspect that this may not be true in future practice, because the nuance of, e.g., how, where, and when to do the follow-up complicates the question beyond a simple binary yes/no. Questions like "I have one hour of time: should I do three high-priority things with long exposures or ten low-priority things with short exposures?" are enabled by probabilistic classifications.
[x] S5.1,P4: Regarding the statement "For a rare event like a kilonova, a false negative does not appreciably reduce the amount of remaining information available to collect, but a false positive represents a large quantity of information forgone", are 'negative' and 'positive' maybe switched? In this case the 'false negative' would be the case where I thought it was not a kilonova, did not follow up, but it was a kilonova and so I lost it forever; the 'false positive' would be the case where I thought it was a kilonova, did follow up, but it was not a kilonova and thus got useless data. It seems that the former would "appreciably reduce the amount of information" about kilonovae, since they're rare, and that the latter does not represent any information being lost. In this case I want to minimize the 'false negative' rate much more strongly than the 'false positive' rate, because kilonovae are rare: if I have to follow up 3 objects per year to make sure I get the 1 per year that is a true kilonova, so be it (it takes a well-written proposal and an understanding TAC, but still can be 100% necessary).
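The asymmetry argued for here can be made concrete with a toy expected-cost calculation. All numbers below are illustrative assumptions (not from the paper): with roughly one true kilonova among many candidates per year, even a modest false-negative rate forfeits irreplaceable events, while false positives only cost telescope time.

```python
# Toy illustration (hypothetical rates, not the paper's numbers):
# for a rare class, missed events (false negatives) are irreplaceable,
# while spurious follow-ups (false positives) waste only telescope time.
n_candidates = 100   # transients flagged per year (assumed)
n_true = 1           # true kilonovae among them (rare!)
fn_rate = 0.5        # fraction of true events the classifier misses
fp_rate = 0.02       # fraction of non-events it falsely flags

missed_kilonovae = fn_rate * n_true                   # events lost forever
wasted_followups = fp_rate * (n_candidates - n_true)  # recoverable cost
print(missed_kilonovae, wasted_followups)
```

Under these assumed rates one loses half the year's kilonovae while wasting only about two follow-up slots, which is the reviewer's point: for rare events, drive down the false-negative rate even at the price of more false positives.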
A list of typos and other minor things.
[x] S1,P4,L75: "2017) , identifying"
[x] S1,P4, and elsewhere: someone once told me it's grammatically necessary to always have a comma after e.g., like that.
[x] S1,P5,L83: multiple instances of italicization for different reasons is confusing: 'probabilistic', ' constrain', and 'conditioned' seem to be italicized for emphasis, while 'posterior probability density', 'classification posterior', and 'deterministic' are italicized because their statement is defining their meaning. Maybe stick with the latter?
[x] S1,P10,L149: "2017) , with"
[x] Fig 1: Caption could probably just say "from top left to bottom right" or "as labeled" instead of "leftmost top... leftmost bottom... " etc.
[x] S2.3,P1,L328: "2002) physical" --> needs comma
[x] S2.3,P1,L330: "outier" --> "outlier"
[x] Fig 2: The labels get cut off by adjacent panels; there is extra whitespace between the bottom of the panels and the label 'predicted class'.
[x] S3.1,footnotes 4 and 5: Define variables TP, FN, FP.
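For reference, the quantities the footnotes should define (true positives TP, false negatives FN, false positives FP) can be sketched as per-class counts over true and predicted labels. This is a minimal illustration, not code from the paper:

```python
# Count TP, FN, FP for one class, given true and predicted label lists.
def confusion_counts(true_labels, pred_labels, positive_class):
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == positive_class and p == positive_class for t, p in pairs)
    fn = sum(t == positive_class and p != positive_class for t, p in pairs)
    fp = sum(t != positive_class and p == positive_class for t, p in pairs)
    return tp, fn, fp

# Example with hypothetical labels: class "Ia" among five objects.
tp, fn, fp = confusion_counts(["Ia", "Ia", "II", "II", "Ia"],
                              ["Ia", "II", "Ia", "II", "Ia"], "Ia")
# tp=2, fn=1, fp=1
```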
[x] S3.2.2,P1,L422: "The Brier score Brier (1950)" --> "The Brier score (Brier 1950)"
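Since this item and Figure 3 both concern the Brier score and log-loss, here is a minimal sketch of one common multiclass form of each (squared error against the one-hot true class for Brier 1950, negative log of the true-class probability for log-loss); the paper's exact normalization may differ:

```python
import math

def brier_score(probs, true_index):
    # Squared-error distance between the predicted probability vector
    # and the one-hot encoding of the true class (Brier 1950).
    return sum((p - (i == true_index)) ** 2 for i, p in enumerate(probs))

def log_loss(probs, true_index):
    # Negative log of the probability assigned to the true class.
    return -math.log(probs[true_index])

# A confident correct prediction scores well on both metrics...
print(brier_score([0.9, 0.05, 0.05], 0), log_loss([0.9, 0.05, 0.05], 0))
# ...while a confident wrong one is penalized (log-loss far more severely).
print(brier_score([0.05, 0.9, 0.05], 0), log_loss([0.05, 0.9, 0.05], 0))
```

The key contrast visible in plots like Figure 3: the Brier score is bounded (here by 2), while log-loss diverges as the true-class probability approaches zero.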
[x] S3.2.2,P1,L426: Citations to Crown, Mays, and Florios should be within parentheses like the following citations to Richards and Armstrong.
[x] S3.3,P1,L448: "are the a nonrep"
[x] Figure 3: Axes and tick labels should be larger. When printed in B&W the left-axis label of "Brier" is barely readable. Adding a legend to identify light/dark points as Brier/LogLoss would make this plot perfectly readable in B&W though.
[x] S4.1,P5,L544: "on the a tunnel"
[x] S4.2,P8,L591: "requiring athreshold"
[x] S5.1,P4,L688: "negative ratel"
[x] S5.1,L700: In the sentence starting with "While the groundwork", probably meant to have a comma after "Wu et al. (2018)".