gully opened this issue 5 years ago
Hey,
Thanks for writing this up! I have lots of thoughts. There's still so much low-hanging fruit in the FFIs, I really want me or a student to go back and dig in deeply.
I had an undergrad last summer automate all the hand-holding I had to do previously (which was mostly in aperture selection, trying to pick something big enough to encompass the entire PSF but small enough to not include any other stars) and we have light curves from that for the full 200,000 star sample. They're a little noisier than the artisanal light curves one might create doing aperture selection by hand, but they do decently well and exist for everything. I want to do something with those! Which might be using some population-level information or additional housekeeping data that we didn't have previously to do better detrending, as you mentioned in your final point.
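Roughly, the automation boils down to something like the sketch below (a minimal sketch, not the student's actual code; the cutout image, target center, and neighbor positions are placeholder inputs): grow a circular aperture until the enclosed flux converges, and stop growing before any neighboring star falls inside.

```python
import numpy as np

def pick_aperture_radius(image, center, neighbors,
                         radii=np.arange(2.0, 10.5, 0.5), converge_tol=0.01):
    """Grow a circular aperture until the enclosed flux converges or a
    neighboring star would fall inside the aperture.

    image     : 2-D background-subtracted FFI cutout around the target
    center    : (x, y) pixel position of the target
    neighbors : list of (x, y) pixel positions of nearby stars to exclude
    """
    yy, xx = np.indices(image.shape)
    r_pix = np.hypot(xx - center[0], yy - center[1])
    last_flux, best_r = None, radii[0]
    for r in radii:
        # stop growing before any neighbor lands inside the aperture
        if any(np.hypot(nx - center[0], ny - center[1]) < r for nx, ny in neighbors):
            break
        flux = image[r_pix < r].sum()
        if last_flux is not None and abs(flux - last_flux) < converge_tol * abs(flux):
            return r  # enclosed flux has converged: big enough for the PSF
        last_flux, best_r = flux, r
    return best_r  # largest radius that still excludes all neighbors
```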
To your specific questions:
I think this is really going to be a challenge. The reason this is a tractable problem in the first place is that every year the star comes back and lands on the same pixel. We then have a free parameter, the offset between the four orientations, which in practice matters at the 1-2 percent level. Having only the one visit in the C16 orientation means there will be an offset in that visit that will be hard to correct for. C5 and C18 might be doable. FFIs taken at the start of campaigns rather than the end will have thermal issues that will be challenging, and it is better if the pointing is consistent between the two, rather than one taken just before and one just after a thruster fire, for example. But maybe! From my perspective it's more work for less payoff than mining the Kepler field further.
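For concreteness, the kind of offset fit I mean looks something like this (a minimal sketch, not what f3 actually does internally; the time, flux, and season arrays are placeholders): solve jointly for a smooth trend and one free offset per roll orientation.

```python
import numpy as np

def fit_season_offsets(time, flux, season, poly_order=2):
    """Fit a low-order trend plus one additive offset per roll orientation.

    time, flux : 1-D arrays of FFI epochs and relative fluxes for one star
    season     : integer array (0-3) giving the roll orientation of each epoch
    Returns the trend coefficients and per-season offsets (season 0 fixed to 0).
    """
    t = (time - time.mean()) / (time.max() - time.min())  # rescale for conditioning
    n_seasons = season.max() + 1
    # design matrix: polynomial trend terms, then indicator columns for seasons 1..3
    A = np.column_stack(
        [t ** k for k in range(poly_order + 1)]
        + [(season == s).astype(float) for s in range(1, n_seasons)]
    )
    coeffs, *_ = np.linalg.lstsq(A, flux, rcond=None)
    trend = coeffs[: poly_order + 1]
    offsets = np.concatenate([[0.0], coeffs[poly_order + 1:]])
    return trend, offsets
```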
I want to do non-solar-like stars very much, and also look at the effects of e.g. metallicity on photometric variability (see Karoff et al. 2017 for example). Krista Smith and a student are looking at AGN in the FFIs with a different tool. We compared ~10 targets, five of which I selected and five of which they selected. I was happy with the performance of f3 for bright stars from that comparison; they might be doing better on very faint objects, which is perhaps not surprising given that's where their science is and ours isn't. But I definitely want to go back and hit up the rest of the stars. I'm imagining something like a random forest classifier that gives the relative importance of various stellar parameters on observed variability.
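In sketch form, that last idea could look like this (the catalog file and column names are made up; a classifier on variable/non-variable labels would work the same way):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical catalog: one row per star, with stellar parameters plus a measured
# FFI variability amplitude (the file name and column names are placeholders).
df = pd.read_csv("ffi_variability_catalog.csv")
features = ["teff", "logg", "feh", "kepmag", "radius"]

rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=42)
rf.fit(df[features], df["variability_amplitude"])

# relative importance of each stellar parameter for the observed variability
for name, imp in sorted(zip(features, rf.feature_importances_), key=lambda p: -p[1]):
    print(f"{name}: {imp:.3f}")
```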
You've hit on another of my dreams. I would like to be able to do this. The main issue is that our errors are largely dominated by systematics from our imperfect knowledge of the detectors, rather than by photon noise (which we know at the 10 ppm level, since Kepler works!). So writing down a likelihood function is challenging. I think you'd also have to go back to the pixel-level data and try to build your own light curve, since the smaller apertures for PDC will have smaller contamination, so your astrophysical signals (e.g. from an EB) will look different in the FFIs than in the long cadence data. I've not quite been convinced by any of the science cases that a probabilistic light curve at the 30-minute level that treats all that and incorporates the FFIs leads to an interesting science result that we don't have now---but I'm open to being convinced that one exists and it's worth pursuing!
I think it depends on the ultimate userbase. It might be 2 people plus their students, in which case a lot of the ease-of-use considerations become less important. But I think gradient-based methods/GPU computing could be a useful thing to get the best photometry out of the FFIs---if we can model all the stars on the detector simultaneously, that might give us a big gain! Similarly, housekeeping data, especially on pointing, might offer another big gain.
There's another problem I'm interested in. Luan Ghezzi in Brazil has a Master's student working on this a bit, but he's also applying for PhD programs and might leave. There are 200,000 stars observed by Kepler but 4.5 million in the FOV. For many of the other 4.3 million we will be sensitive to them as long-term variables, eclipsing binaries, or in some cases hot Jupiters. Something in a 3-day orbit should randomly catch a couple of points in transit, which could be followed up with TESS data in special cases (or eventually with PLATO). I think there are ways to efficiently find the stars with outliers and then categorize them. It'll get complicated with SPSDs, but I think they're separable from astrophysics at the pixel level.
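The outlier search itself could be as simple as something like this per star (a sketch; the flux array is whatever the automated FFI photometry produces), with SPSDs and other instrumental dropouts vetted at the pixel level afterwards:

```python
import numpy as np

def flag_dimming_outliers(flux, n_sigma=5.0):
    """Flag epochs that are significant dimming outliers for one star.

    flux : 1-D array of the star's FFI fluxes (one point per FFI epoch).
    Uses a robust scatter estimate (MAD) so that a single eclipse or transit
    does not inflate the noise estimate.
    """
    median = np.nanmedian(flux)
    mad = 1.4826 * np.nanmedian(np.abs(flux - median))
    return (median - flux) > n_sigma * mad

# e.g. loop this over the 4.3 million non-target stars and keep anything with one
# or more flagged epochs for categorization and pixel-level vetting
```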
Also, I'm sure you're trying to limit your travel right now but if you wanted at some point to come down for extended time and hack on some of this we have the resources to make that happen!
Great food for thought here, thanks!!
One admittedly niche science case stems from the desire to register ancillary (e.g. ground-based spectroscopic) observations to space-based photometry, as in the case of RV jitter, or starspot spectral decomposition. Having a probabilistic lightcurve would inform both the stellar rotation phase and the total spot coverage, assuming the ancillary data were spread over timescales comparable to stellar activity cycles. Again, niche. I know of a handful of stars where this registration is interesting.
I computed coarse zero-points for the K2 FFIs for C5/16/18, for which we have some of these ancillary observations in hand. I used sep rather than mahotas, did not tune aperture choice, and ran for entire channels rather than localized postcard regions. Here is a first pass of the zero points of C16 and C18 relative to C5 for the channel possessing M67:
Indeed, your 1-2% level channel differences are conspicuous in C16, as well as the ~1% / year throughput loss (your Figure 2) by the time we get from C5 to C18.
As a double check, I ran the same custom machinery on Kepler prime, and applied it to KIC 4863614, which appears in your Figure 10, third panel from the top. Its 10% amplitude should be conspicuous, even with my crude zero-pointing strategy. I get about the same trend as what f3 measures (yay). I arbitrarily assigned 1% errors, showing how much better f3 does in precision.
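For reference, the custom machinery is roughly the sketch below (simplified; the FFI file names, channel extension, detection threshold, and aperture radius are placeholders, and the naive pixel-position matching only makes sense for a fixed pointing; across campaigns you would match through the WCS or a catalog):

```python
import numpy as np
import sep
from astropy.io import fits

def channel_fluxes(ffi_file, channel_ext):
    """Run sep on one channel of an FFI and return source positions and fluxes."""
    data = fits.getdata(ffi_file, ext=channel_ext).astype(np.float64)
    bkg = sep.Background(data)
    data_sub = data - bkg
    objects = sep.extract(data_sub, 10.0, err=bkg.globalrms)
    # fixed circular aperture, no per-star tuning (matching the coarse approach)
    flux, _, _ = sep.sum_circle(data_sub, objects["x"], objects["y"], 4.0,
                                err=bkg.globalrms)
    return objects["x"], objects["y"], flux

def coarse_zero_point(ref_file, cmp_file, channel_ext, match_tol=1.0):
    """Median flux ratio of matched sources between two FFIs of the same channel."""
    xr, yr, fr = channel_fluxes(ref_file, channel_ext)
    xc, yc, fc = channel_fluxes(cmp_file, channel_ext)
    ratios = []
    for x, y, f in zip(xr, yr, fr):
        d = np.hypot(xc - x, yc - y)
        j = np.argmin(d)
        if d[j] < match_tol and f > 0 and fc[j] > 0:
            ratios.append(fc[j] / f)
    return np.median(ratios)
```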
Re: Joint modeling the long- and short- timescales in Kepler prime---I think I can write down that likelihood function! The key is putting in smooth functions---like low order Chebyshev polynomials---as the mean model, and then including GP kernels with comparable power on long timescales: nearly degenerate, but the mean model must also pass through the rigid FFI anchors. The GP or mean-model also contains the physics of interest (e.g. high frequency sine waves). Indeed, sketching this on a board in Sydney would be easier than writing out here, but I think it can be done. Folks here at Kepler GO have emphasized the prospect for making (what I'm calling) "accuracy lightcurves" from the long-cadence data. Effectively, repeat the coarse zero-pointing for each TPF slice and then get a really imprecise-albeit-accurate lightcurve, which can be used to penalize the likelihood if the Chebyshevs get too big. I don't think this step is required, but it's possible so why not use it. You're right about asking what's the science case for this tech. I guess it comes back to registering ground-based observations to the photometry. At a given epoch you get a rotational-phase corrected estimate for the Kepler-band flux. This value and uncertainty are only interesting in narrow contexts such as large-amplitude variables with large long-term trends, which indeed was not Kepler's bread and butter.
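In sketch form, the structure I have in mind is something like this (a squared-exponential kernel as a stand-in for whatever kernel actually carries the long-timescale power and the physics; the "accuracy lightcurve" penalty would be one more Gaussian term on the Chebyshev part):

```python
import numpy as np
from numpy.polynomial import chebyshev
from scipy.linalg import cho_factor, cho_solve

def log_like(params, t_lc, y_lc, yerr_lc, t_ffi, y_ffi, yerr_ffi):
    """Joint log-likelihood: Chebyshev mean model plus a GP on the long-cadence
    data, with the FFI epochs anchoring the mean model alone.

    params = [log_amp, log_tau, c_0, ..., c_k]  (GP hyperparameters, Chebyshev coeffs)
    Times are assumed to be rescaled to [-1, 1] before calling.
    """
    log_amp, log_tau = params[:2]
    coeffs = params[2:]

    # mean model evaluated at both cadences
    mean_lc = chebyshev.chebval(t_lc, coeffs)
    mean_ffi = chebyshev.chebval(t_ffi, coeffs)

    # GP covariance for the long-cadence residuals
    dt = t_lc[:, None] - t_lc[None, :]
    K = np.exp(2 * log_amp) * np.exp(-0.5 * (dt / np.exp(log_tau)) ** 2)
    K[np.diag_indices_from(K)] += yerr_lc ** 2

    r = y_lc - mean_lc
    factor = cho_factor(K)
    ll_lc = -0.5 * (r @ cho_solve(factor, r)
                    + 2 * np.sum(np.log(np.diag(factor[0])))
                    + len(r) * np.log(2 * np.pi))

    # FFI anchors: the mean model (not the GP) must pass through the f3 points
    ll_ffi = -0.5 * np.sum(((y_ffi - mean_ffi) / yerr_ffi) ** 2
                           + np.log(2 * np.pi * yerr_ffi ** 2))
    return ll_lc + ll_ffi
```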
Re: the 4.3 million other sources. I guess it comes down to how much better f3 does than ground-based surveys? PanSTARRS, ASAS-SN, and other programs have (or will have) many more data points, but (I think) higher point-to-point uncertainty than what f3 can achieve? The discussion of classifying variables with sparse photometry reminds me of the PLAsTiCC competition run on Kaggle. The winning result has a slick paper.
This is all really cool!
The point about being able to register spectroscopic information to the longer-term variability observable over Kepler timescales is well-taken. Even if we could only do an exceptional job for a subsample of stars, perhaps the most well-isolated and bright ones, that could be useful for understanding the evolution of what spots can and cannot do over stellar ages. I still have concerns about dilution and about putting the 30-minute data and the FFI data on the same scale. Maybe a few eclipsing binaries could be used as calibrators to make sure the depths are right.
I'm always happy to see repeatability! That light curve you made looks really nice, and it's neat to see that you find similar behavior in the K2 data as we saw in Kepler, which gives me confidence the explanations we have for everything we see are about right (never bet against Doug Caldwell!). These stars that vary by 10% are incredible, aren't they? Some must be young and active. Certainly many contact binaries also show similar behavior, which I've seen in the FFIs and which the Catalina Sky Survey folks have written several papers about in their data. I would love to do a better job quantifying how much of the variability we see is intrinsic to the star and how much is changes in the line-of-sight extinction due to our own and those stars' proper motions against an anisotropic galactic dust distribution, a la e.g. Smith et al. 2012 (arXiv:1210.8136), who note that a 15 AU change in position over six years is enough to produce quite a change in dust extinction, and that's reasonable for at least some of the Kepler targets.
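As a sanity check on those numbers (the 300 pc here is just an assumed, representative distance for a Kepler target):

```python
# What proper motion sweeps out 15 AU of transverse displacement in six years?
au_per_yr = 15.0 / 6.0               # 2.5 AU / yr
distance_pc = 300.0                  # assumed distance of a nearish Kepler target
# 1 AU at 1 pc subtends 1 arcsec, so:
mu_mas_per_yr = 1e3 * au_per_yr / distance_pc
print(f"{mu_mas_per_yr:.1f} mas/yr")  # ~8 mas/yr, not unusual for Kepler stars
```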
Did you happen to do the same for 8462852 by chance? :)
I think your explanation of a likelihood function makes sense, but I need to sit down and work through the math. I'm happy to team up to think about this more!
I think we do a lot better in the FFIs for the 4.3 million targets than ground-based surveys just because they are so faint that ground-based point-to-point scatter is so high. These targets are almost all fainter than Kp=16 (Kp=14 if they're giants) and the ground-based surveys just don't have 1% photometry at that level, while we have 2-5 mmag for many of them. I'm inspired by a really nice paper by Laszlo Molnar last year that used the FFIs to show that just from those data you can measure RR Lyrae periods correctly to the 5th decimal place, and did some validation of the Gaia DR2 catalog (arXiv:1805.11395). If those big amplitudes do well in the FFIs relative to the ground, then surely the few-percent EBs or the hot Jupiters are more detectable there too! I have less of a compelling science case that hinges on finding these, but I really like this search as a student project that will teach all the skills someone needs to do more photometric work.
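A toy version of that period-recovery argument, with made-up sampling and a made-up period, looks like this:

```python
import numpy as np
from astropy.timeseries import LombScargle

# ~50 sparsely sampled FFI-like epochs over ~4 years; can we recover a short period?
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 4 * 365.25, 52))          # epochs in days
true_period = 0.56834                                # days (hypothetical RR Lyrae)
y = 0.3 * np.sin(2 * np.pi * t / true_period) + 0.005 * rng.standard_normal(t.size)

frequency, power = LombScargle(t, y, dy=0.005).autopower(
    minimum_frequency=1.0, maximum_frequency=4.0, samples_per_peak=50)
best_period = 1.0 / frequency[np.argmax(power)]
print(f"recovered period: {best_period:.5f} d (true {true_period} d)")
```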
Agreed on the themes here, and the point is taken about dilution--subtle aperture effects will matter, and EBs and other benchmarks would be great calibrators to spot-check.
That's Tabby's star, isn't it? I didn't try-- I expect the constraints from my coarse calibration to be... well, coarser than yours, so it's no contest at the moment.
The differential extinction from proper motion is both worrisome (how do we faithfully disambiguate long-term stellar signals?) and neato (AU-scale tomography of cluster environments?). I suppose Gaia proper motions for nearby objects should be good enough to indicate to what extent long-term trends arise from intrinsic versus extrinsic flux changes.
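For example, pulling the DR2 proper motion for a single target (the coordinates below are approximately those of KIC 8462852, just as an example):

```python
import astropy.units as u
from astropy.coordinates import SkyCoord
from astroquery.gaia import Gaia

Gaia.MAIN_GAIA_TABLE = "gaiadr2.gaia_source"   # query DR2 explicitly

coord = SkyCoord(ra=301.5644 * u.deg, dec=44.4569 * u.deg)   # ~KIC 8462852
job = Gaia.cone_search_async(coord, radius=3 * u.arcsec)
nearest = job.get_results()[0]
print(nearest["pmra"], nearest["pmdec"], nearest["parallax"])  # mas/yr, mas/yr, mas
```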
Yeah, apertures will be the biggest thing to get right! But we have EBs. And we can do pixel-level analyses in theory to try to understand how important dilution is, both in the 30-minute data and in the FFIs.
Yeah, Tabby's star! It was the first thing we did with the FFIs so I always like to see others reproduce it, make sure we're not standing on a table of misunderstood systematics (or at least that we're all standing on the same table). I asked Krista and her student to run that one in our comparisons too (they got similar results, which was comforting!)
Yeah, you've hit on my thoughts too---if we have variability that is correlated with the amplitude of a star's proper motion and/or galactic position then we should be working out ways to account for it. It would be a super neat result! I think there are somewhat more long-term trends in the FFIs than I would naively expect, which could be extinction. Although I talked to Greg Feiden (and I think you?) at one point, and he thought close binarity might be sufficient to explain it. Now that we have DR2 and lots of photometry from all the FFIs I should go back and see if the stars with long-term trends are also brighter in the way you'd expect if they were binaries!
Demo of Tabby's star with custom and crude FFI zero point calibration:
You can see the conspicuous channel-level offsets imperfectly corrected in at least one channel group. The overall dip trend at BKJD = 1200 remains, albeit spuriously amplified by the channel-group offset in the custom reduction.
Yeah, sweet! Thanks for running that. I think I see the long-term signal too; at least looking at the groupings of points on the same channel, there seems to be a small year-to-year change.
We haven't talked about the Golden FFIs yet, but there's something to do with those as well for calibration purposes, I think. The telescope is so stable over those images that they are telling us something about the astrophysical scatter on day-long timescales. Of course the regular 30-minute data tells us that too. But I think there's something more to learn from those.
I recently had another look at f3 and wondered about conceivable extensions and applications.
Is it possible to calibrate the C5, C16, C18 FFIs in the same way as the Kepler prime mission?
As it stands, f3 is not equipped for this task, since it is A) hardcoded to Kepler prime, and B) many of the "ubercalibration"-style likelihood assumptions require a large enough sample of FFIs to break significant degeneracies.
Applications to non-solar-like stars, and non-stars in general
Kepler Prime observed Active Galactic Nuclei (AGN, Smith et al. 2018), and it'd be meaningful to constrain the long-term behavior of this AGN sample.
Joint modeling the long- and short- timescales in Kepler prime?
Imagine a likelihood function consisting of a flexible Gaussian Process with many tuning parameters that fits all Q1-17 long cadence data and simultaneously fits the F3-derived photometry. You could then warp the mean model of the long cadence similar in spirit to forward modeling spectral continuum normalization (e.g. Czekala et al 2015, Figure 3).
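As a strawman for the warping step alone (two-stage rather than a joint fit; the inputs are placeholders, in relative flux, with times rescaled to [-1, 1]):

```python
import numpy as np
from numpy.polynomial import chebyshev

def warp_to_ffi_scale(t_lc, flux_lc, t_ffi, flux_ffi, order=3):
    """Fit a low-order Chebyshev 'warp' that multiplies the long-cadence flux so
    that it passes through the f3 FFI photometry (continuum-normalization style)."""
    # ratio of FFI flux to the long-cadence flux interpolated to the FFI epochs
    lc_at_ffi = np.interp(t_ffi, t_lc, flux_lc)
    ratio = flux_ffi / lc_at_ffi
    coeffs = chebyshev.chebfit(t_ffi, ratio, order)
    return flux_lc * chebyshev.chebval(t_lc, coeffs)
```

In a real analysis the warp would presumably be fit jointly with the GP rather than in two stages.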
Is it worth investing in modernization?
Could porting some parts of the code to lightkurve alleviate the brittle dependency on kplr? Could recent innovations in gradient-based methods help with the large number of parameters for the joint modeling projects described above? Could some tutorials for more objects and applications help expand the usefulness of f3? Could the recently released and reformatted thermal telemetry data help as regressors for long-term systematics? These modernization steps could help, but are they worth the investment?
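For instance, the search/download side of kplr could conceivably become something like the following (assuming lightkurve's search functions; the FFIs themselves would presumably still come straight from MAST):

```python
import lightkurve as lk

# long-cadence target pixel files for one star, in place of the kplr client
search = lk.search_targetpixelfile("KIC 8462852", mission="Kepler", cadence="long")
tpfs = search.download_all()

# simple aperture photometry with the pipeline mask, as a starting point
lc = tpfs[0].to_lightcurve(aperture_mask=tpfs[0].pipeline_mask)
```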