Fitter may be non-deterministic and giving very different fit values run-to-run for some targets

JeffLCoughlin commented 8 years ago

I ran the same 5 objects via Susan's backend, which calls the fitter given an input period, ephemeris, etc. Below are the values coming out for the same 5 objects for 3 different runs. The first five columns are EPIC ID, Period, Epoch, Duration, and SNR.

Note that for the first object, 210775710, the epoch changes by up to 0.04 days, the duration ranges from 2.97 to 4.92 hours, and the SNR changes from 148 to 202. This is a systematic feature, but it is deep, so I wouldn't normally expect such a big variance fit-to-fit. Seems worrisome.

Some others, however, like the second, third, and fourth objects, barely vary run-to-run: they are identical to the printed precision or differ only in the sixth decimal place of the epoch. These do look like real transits, though. Any idea what's going on here, Chris?

==> RUN1 <==
210775710 59.844 2291.513247 4.916 200.72 nan nan nan nan nan 0 1 0 0 TLpp (0.0) above threshold 0.0
210956385 56.624 2289.041527 4.541 67.17 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
211442297 20.273 2344.431214 2.966 59.92 nan nan nan nan nan 0 1 0 0 SIG_SEC_IN_MODEL_SHIFT
211418729 11.390 2341.498451 3.173 110.84 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
201702477.0 40.736200 0 0 0 0 0 0 0 0 -1 -1 -1 -1 NO_Analysis

==> RUN2 <==
210775710 59.844 2291.553793 2.970 148.77 nan nan nan nan nan 0 1 0 0 TLpp (0.0) above threshold 0.0
210956385 56.624 2289.041525 4.541 67.17 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
211442297 20.273 2344.431214 2.966 59.92 nan nan nan nan nan 0 1 0 0 SIG_SEC_IN_MODEL_SHIFT
211418729 11.390 2341.498451 3.173 110.84 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
201702477.0 40.736200 0 0 0 0 0 0 0 0 -1 -1 -1 -1 NO_Analysis

==> RUN3 <==
210775710 59.844 2291.515631 4.802 202.35 nan nan nan nan nan 0 1 0 0 TLpp (0.0) above threshold 0.0
210956385 56.624 2289.041527 4.541 67.17 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
211442297 20.273 2344.431215 2.966 59.92 nan nan nan nan nan 0 1 0 0 SIG_SEC_IN_MODEL_SHIFT
211418729 11.390 2341.498451 3.173 110.84 nan nan nan nan nan 0 1 0 0 ODD_EVEN_DIFF
201702477.0 40.736200 0 0 0 0 0 0 0 0 -1 -1 -1 -1 NO_Analysis

kcolon commented 8 years ago

Looks like the first two objects are quite long period. Was there more than one transit seen in the data for those two, or only one? I wonder if that’s part of the problem.

In reply to JeffLCoughlin's message of Jul 7, 2016, 5:09 PM (https://github.com/barentsen/dave/issues/15).

christopherburke commented 8 years ago

The trapezoid fitter is not guaranteed to be deterministic or to find the global minimum. It runs fitTrialN (currently defaults to 13) trials of fitting the transit with random starting conditions around the initial results, and the best solution out of these 13 trials is taken as the final result. I think misshapen and high-SNR transits will typically take more trials to find a global minimum. The only ways around this are to (1) bump up fitTrialN (18? takes more runtime), (2) live with the non-determinism, or (3) somehow set the random seeds (fixed for all targets or based upon some other unique target identifier) in the trapezoid fitting.

So, I am not surprised to see non-determinism. Is this breaking something? My order of preference is (2) do nothing, then (1) bump up trials. Option (3) is quite a bit of work to test to make sure it works as expected, and it will not improve your chances of finding the global minimum while giving the false impression, through determinism, that it has.
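To make the trade-off between options (1) and (3) concrete, here is a minimal sketch of a multi-start fit. It is not the trapezoid fitter's actual code: the toy objective, the gradient-descent local fit, and the names `objective`, `local_fit`, and `multi_start_fit` are all made up for illustration. Only the idea of 13 random-start trials with the best one kept comes from the description above.

```python
import random


def objective(x):
    # Toy stand-in for a chi-square surface (assumption, not the real model):
    # it has a local minimum near x = +0.96 and the global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def local_fit(x0, step=0.01, n_iter=2000, h=1e-6):
    # Plain gradient descent with a central-difference derivative.
    # It only finds the minimum of whichever basin x0 starts in.
    x = x0
    for _ in range(n_iter):
        grad = (objective(x + h) - objective(x - h)) / (2.0 * h)
        x -= step * grad
    return x


def multi_start_fit(x_init, n_trials=13, scatter=1.0, seed=None):
    # Run n_trials local fits from random starts around x_init and keep the best.
    # seed=None gives a different set of starts on every call (option 2);
    # a fixed seed, e.g. derived from the target ID, makes it repeatable (option 3).
    rng = random.Random(seed)
    best_x, best_val = None, float("inf")
    for _ in range(n_trials):
        x0 = x_init + rng.uniform(-scatter, scatter)
        x = local_fit(x0)
        val = objective(x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    epic_id = 210775710  # used here only as a hypothetical per-target seed
    print("unseeded:", multi_start_fit(1.0))
    print("seeded:  ", multi_start_fit(1.0, seed=epic_id))
    print("seeded, more trials:", multi_start_fit(1.0, n_trials=50, seed=epic_id))
```

In this toy setup most random starts fall in the basin of the local minimum, so unseeded calls can return either minimum from one run to the next. Seeding makes the answer repeatable but does nothing to make it the global minimum; only more trials (or better starting points) improve those odds.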

JeffLCoughlin commented 8 years ago

Ah, that makes sense then. It isn't breaking anything - I just noticed it when comparing runs from my local machine and on Hal and investigated it. The large variance in the SNR and duration was worrisome, but given that the target was a large systematic, and the transit-like ones don't appear to show the same effect, I think we're fine. We should just keep an eye out in case we ever see important dispositions change from run-to-run as a result.
