fedhere / DtUhackOutliers


deciding dataset for transferability test #5

Open fedhere opened 5 years ago

fedhere commented 5 years ago

We should choose another dataset to test our algorithms on. The way Rafael and I envisioned it, we would describe and tune our models on Kepler data, then test them on another dataset and comment on their transferability.

The dataset should be:
1 - large;
2 - single-filter (or at least reasonable to use in a single filter; this pretty much rules out PLAsTiCC data, which is designed for multi-band observations and is exceedingly sparse in any single band).

Gaia? TESS?

F

juramaga commented 5 years ago

So, the approach so far has been to use only 2500 light curves from Q16, out of the over 165,000 light curves in that quarter. Do we believe that 2500 light curves are enough to validate the methods? If so, we can continue as we have been doing (using only the 2500 to compare the methods) and then use the rest of the dataset as the test set.

Alternatively, we can use all 165,000 light curves to compare the methods (we have the dataset ready for you) and then, as Federica suggests, use a different dataset for the test. We have already done some work with TESS data, so that would be my suggestion.

If we do the second (my preference), our comparison paper would be validated on Kepler and applied to TESS, and Dennis would then work on a second paper on some of the astrophysical implications (Hertzsprung-Russell diagram, etc.).

Does this sound good?

Rafael

d-giles commented 5 years ago

This largely tracks with what Lucianne and I have been doing with our unsupervised approach. This year I'm wrapping up my thesis work, which applies our methods to the full Kepler dataset and analyzes the results, but we've been interested in the next steps and have been considering TESS data. I'd love to get involved if you go the TESS route.

A comment on the sample: if I remember correctly, 2500 light curves were used because it was a hack-day session where data processing was being done on laptops, and I had that sample of light curves readily available. I'm inclined to think the sample is too small to provide any genuine validation, beyond a proof of concept, for methods that will be applied to datasets like Gaia or TESS.

juramaga commented 5 years ago

Thanks Daniel.

I agree with you regarding the size of the dataset. We have been working with a much larger dataset on the side, and I believe that after the initial tests with the 2500 light curves we should run the different methods on the full dataset.

Glad to hear you're interested in collaborating on the TESS side of things. After all, the idea was to work together on this. In fact, we are planning to use the results in your paper as a benchmark to compare different methods (for our paper 1). Dennis (a grad student working at the CfA this year) has made excellent progress that he will include in his thesis, and we will soon have a set of first TESS results to share with the group. Maybe we can start there to decide on the one or two papers that should come out of this TESS-specific effort.

Rafael