Unexpected optimization result after changes in PR #110

FlamTeam / flamedisx

Fast likelihood analysis in more dimensions for xenon TPCs

https://flamedisx.readthedocs.io/

BSD 3-Clause "New" or "Revised" License

15 stars 13 forks source link

Unexpected optimization result after changes in PR #110 #129

Closed plt109 closed 3 years ago

plt109 commented 3 years ago

In PR#110, changes were introduced to solve a bug in Source._set_temporarily, in which the parameters were not reset properly after being called by simulate and annotate.

The changes were tested before merging and there was some impact on the running time (see issue #111), but what was not caught was that the changes also introduced some changes to the optimisation outcome.

Using master (bf24079ce21623482098364e78117ce6a073748c), which has copy in Source._set_temporarily, the bestfit results to some science data produced MC events with a strange rounded edge for large values of cs1. Upon closer inspection, there seems to be a 'soft cut' in cs2 ~10k PE, which can lead to this roundedned in cs1. See figure below: copy_fit

However, using commit fb01fbefe949f4690df87c79676ed2945500dfe4, which is forked from master (bf24079ce21623482098364e78117ce6a073748c) but without copy or deepcopy in Source._set_temporarily. The bestfit values to the same set of science data produced MC events without the strange rounded edge for large values of cs1. See figure below: no_copy_fit

I failed to reproduce this rounded behaviour by fitting to simulated events instead of real science data, so perhaps the whole fitting process proceeded as it is supposed to (no bugs with setting and resetting of parameters) and the fit was poor simply because the SR1ERSource requires further refinement. But perhaps the changes in PR#110 does have some further unknown impact.

Help is greatly appreciated.

sophiafarrell commented 3 years ago

Hey Pueh, this is a really interesting catch. I wanted to ask: is this run over many iterations and you are simulating the median/mean of the fit values? And in both cases are the initial guesses for the fit parameters the same? It could be a statistical error otherwise... I will keep looking into this today though but others who may know (since I'm not as aware of your previous update you mention) could answer this faster.

sophiafarrell commented 3 years ago

Another thing that might be helpful to see would be the g1/other S1-related parameters between the two versions. Is there a specific parameter that's going wonky, or are all of them? I mean basically those plots between the bbf fit green bands and flamedisx fits for the parameters that you showed in your recent ER fit notes. Can help narrow down the problem maybe !

JelleAalbers commented 3 years ago

Hi Pueh, thanks for reporting. It is certainly tricky to diagnose a poor fit to data, since there could be many reasons: optimizer issues, outlier events, model imperfections, or indeed bugs in flamedisx.

For example, flamedisx's default ER energy spectrum is still set to a 0-10 keV flat model: https://github.com/FlamTeam/flamedisx/blob/master/flamedisx/lxe_blocks/energy_spectrum.py#L35. Below cS1 = 70PE XENON1T should definitely see >10 keV events, see e.g. figure 4 in https://arxiv.org/pdf/2006.09721.pdf. Maybe you already fix this in your code (I vaguely recall we encountered this before), but otherwise this would certainly cause a deficit of events at the high-energy end of your space.

The bug #110 fixed was very insidious: it caused incorrect parameters to stick around in flamedisx sources. Before the fix, your code may have be doing something quite unexpected. If you get good fits to simulated data, my first thought would be a data/model mismatch rather than a flamedisx bug. But I wouldn't rule anything out of course.

sophiafarrell commented 3 years ago

Before the fix, your code may have be doing something quite unexpected. If you get good fits to simulated data, my first thought would be a data/model mismatch rather than a flamedisx bug.

Indeed, I concur with this, @JelleAalbers. It seems there is an artifact of a small typo in the fiducial volume cut: in the plots above, there are 3089 SR1 Rn data points, and in fact there are something like ~6000 when the fiducial volume cut is right. (I've checked this myself and know it is the FV cut that has the typo, so that rules out other data/model mismatches on the data cutting level.)

@plt109 can you re-fit this with the correct fiducial volume cut? Given Jelle's comment, I would say we should correct this FV typo first, then see if the artifact you found remains. (Feel free to message me for the specific code change if it got lost earlier on when I suggested it.) I'm also happy to help further if you need clarification.

plt109 commented 3 years ago

@sophiaandaloro Thanks for the suggestions but I've already tried to fit it with the correct fiducial volume cut but the problem persists. I am just using a subset of the events so that things are faster. I have also tried to tweak around the parameters individually(swapping out the values one by one) and found some trends between some pairs of nuisance parameters. However, that is proving to be not the most efficient path to proceed; I need to know why the optimizer is not converging to the 'correct' values (either model mismatch or bug).

@JelleAalbers The ER source I used for the fit is inherited from ERSource, and I expanded the input spectrum from 10 keV to 13 keV because we had a discussion about this and the conclusion was increasing it much further does not yield any benefits since most of the events would be covered by 13 keV. But I will increase the upper bound of the input spectrum and see if that helps.

class mySource(fd.SR1ERSource):
    energies = tf.cast(tf.linspace(0., 13., 1000),
                       dtype=fd.float_type())

    t_start=pd.to_datetime(min(rn_data_cut['event_time']), unit='ns')
    t_stop=pd.to_datetime(max(rn_data_cut['event_time']), unit='ns')

Could you elaborate on how bug #110 causes incorrect parameters to stick around in flamedisx sources? One of my fears is that actually, the optimization process did not unfold as expected and the optimizer did not venture too far away from the default values anyway, giving an artificially good fit.

JelleAalbers commented 3 years ago

Hi Pueh, good that the energy spectrum is already extended. Maybe going to 15 or 18 keV will help a bit if the model is converging on very low g1 and g2 now, but otherwise it will probably indeed not matter.

I agree with Sophia that looking at which parameters have weird values (e.g. comparing with the published fits) is a good strategy; you can also look at which events might causing that (i.e. which have low differential rates under more reasonable parameters). Since the fit seems to go wrong at the high-energy end of your space, maybe including data up to a bit higher in cS1 (or even just looking at the predictions there) will help clarify the issue. Analysis paper 1 (https://arxiv.org/pdf/1906.04717.pdf) showed data until 80 PE, and I vaguely recall bbf doing most fits up to 100 PE; Cristian will know more about his.

Not sure I can improve much on the descriptions in #110 and https://github.com/FlamTeam/flamedisx/issues/111#issuecomment-791824196. If you could post/link your notebook, that might help us understand how removing this essential bugfix mysteriously improves your fit.

the optimization process did not unfold as expected and the optimizer did not venture too far away from the default values anyway, giving an artificially good fit.

Are you talking about the fits to simulated data? Optimizers are sometimes indeed lazy and do not move from the guess, so the truth is not a good guess for stress-testing the model.

If it helps, this is roughly the decision tree for solving flamedisx fit issues I'm using in my head: bad_fit_decision_tree

plt109 commented 3 years ago

Hello Jelle, thanks for the flow chart and suggestions!

Let me show you smt nice first: bf24079c_first1k_3kevents_20keV I extended the input spectrum to 20 keV(cause why not?) but keeping to 1000 points(but I remember vaguely about some caveat about making the spectrum not flat if I were to increase the range but keeping the number of points the same), and I obtained on the above fit with the same set of real science data (wrong fiducial volume cut that only selects for the top half of the TPC, and only the first 1k events to make things fast just for debugging) using the codes from commit bf24079ce21623482098364e78117ce6a073748c (legit master with copy). But it is not entirely clear to me why I would need an extended input spectrum now that the sticky default bug is sorted. I will give this a few more pokes just in case it's yet another post-dinner fluke.

With the original energy spectrum from 0-13keV, g1 was indeed on the low side (bf g1 = 0.11617562997126687, analysis paper II g1 = [0.142±0.005]). But why would a low g1 or g2 be an indication that I'm not giving it enough range for in the input energy?

Anyway, interestingly g1 was not the most ill-behaved parameter; single_electron_width, q1, q3_nq, gamma_er, omega_er were much worse. What I did was that I had 1 set of best fit nuisance parameter values which gave a good fit (no weird roundedness) using fb01fbefe949f4690df87c79676ed2945500dfe4 and 1 set of best fit nuisance parameter values obtained using bf24079ce21623482098364e78117ce6a073748c and I replaced the value of a nuisance parameter from the bad fit with the value from the good fit until I could reproduce the good fit. What I observed was that I need a certain combination to get rid of the roundedness but it became slightly intractable which is why I abandoned that path.

I looked at the differential rates as well and I started by cutting away 3 events with poor diff rate but the fit remained poor. I then cut away more events (about 50 I think) but the fit remained poor, which is why I abandoned that path as well because conceptually, it can get a little circular. The diff rate is evaluated at the default parameter values(which are currently hardcoded to the analysis paper II values) and if suppose the default parameter values are slightly off, then the diff rate will be off as well and it would not be legitimate to cut away the events based on diff rate evaluated at those default parameter values.

JelleAalbers commented 3 years ago

Nice, that looks like a better fit! If the g1 was actually 20% lower, all S1s would be 20% smaller, so we would see events up to ~16 keV in the < 70 PE cS1 range.

The g1 and g2 are tightly constrained by higher-energy calibrations. For g1, the constraint is 0.142 ± 0.002 (see the 'prior' column of table II in analysis paper 2 https://arxiv.org/pdf/1902.11297.pdf). Your earlier result, 0.116, would be a 13 sigma anomaly; "a bit on the low side" indeed!

If you're not including these g1/g2 priors in your likelihood yet, such results can happen. (The Rn data on its own is not very sensitive to g1/g2 because it is a flat spectrum; you want a line for g1/g2.) Apparently the model found some way of distorting all the other parameters to give you a not-completely-horrible fit with this super low g1. Maybe this is some evil local optimum?

I'd suggest including the g1 and g2 'priors' as constraints through flamedisx's log_constraint option. These 'priors' are really direct measurements in XENON1T, just on a very different energy range. The other bbf priors are (if I'm not mistaken) from NEST and external measurements, so hopefully those won't have a big impact on your fit result. (Though you'll probably want to take the W and p_DPE priors onboard as constraints too, if you're not just fixing those parameters; the Rn data is probably insensitive to those as well).

You are right, just removing events because they have low differential rates makes little sense (expect as a temporary experiment to see if those events have a disproportionate impact on the fit). If there was a pattern among them (e.g. all in a corner of the (cs1, cs2) plot, all at high radius, etc.) that might give a clue to what is going on. But if you see no change in the fit when you remove them, they don't seem problematic.

JelleAalbers commented 3 years ago

Looks like this was solved; can we close this Pueh?

plt109 commented 3 years ago

Yups. Thanks!