Simulated Data Tests - Githubissues

dfujim commented 4 years ago

It seems in the past 5 years that most SLR runs on the NMR side are on the order of 2x10⁸ counts. Here are the averages and standard deviations in units of 10⁸ counts and for the NMR side only:

Year	Mean	Standard Deviation	N
2015	1.3	1.0	1003
2016	2.9	2.4	1206
2017	2.3	2.0	821
2018	4.6	23.5	496
2019	1.7	1.3	690

(for generating script see here)

However there's a decent amount of skew to the distributions.

Noting that I cut off the histograms at 10⁹ counts for readability (2018 was a strange year for high stats).

I then propose the following tests to show the effectiveness of the transform, with appropriate α:

Using a single exponential, we vary the number of counts in the range of 10⁸-10⁹
With 6x10⁸ counts, which is about the upper bound of statistics that we see in a high-stats run, we test the separation of the SLR rates in a biexponential. Say fix T₁^(a) to 1s, and vary T₁^(b) from 0.1s to 100s on a log scale.
Show what a stretched exponential looks like, for β of 0.3, 0.5, and 0.7.
Show the effect of a constant offset.

Is there anything else that we need to show? I had originally thought of a triexponential, but this seems superfluous when you have the biexponential. We could repeat the biexponential test with 2x10⁸ counts, which would represent a more standard runtime (~ 30 mins). We are detecting about 10⁵ counts/s.

EDIT: Originally I suggested we run the biexp at 6e9 counts, I meant 6e8.

rmlmcfadden commented 4 years ago

Those sound good to me!

One thought is that we could widen the range of n for the single exponential test, say to 10⁶-10⁹. The lower end might be excessive, but I think this will help show (and possibly exagerate) the convergence of the ILT result to the true value (e.g., in a plot of p(T₁) vs. T₁). Plus, n < 10⁷ is trivial to simulate!

I think whether the lower stat biexponential test is necessary will probably be clearer once we have the results from the other exponential/biexponential tests.

dfujim commented 4 years ago

Ok I have started running data generation on muesli. Gotta get that extra 4 processors. I installed tmux to do this more easily, if you wanted to check in or run extra jobs. If you haven't used it before, tmux allows you to disconnect from the machine and have your jobs keep running. Here's a nice cheatsheet.

With 8 processors this should go fairly quickly.

EDIT: I decided that more data is better than less data. It should take about 5 hours. I also added data collection on the amplitude, just out of curiosity.

dfujim commented 4 years ago

So single exponentials are handled pretty well it seems. The data set varying the stats finished, and even with α=0 we get very good results. This, despite the fact that the L-curve predicts we should use α=4x10³ or something.

Note, that for the below plots I changed T₁ to be 2.5 so make sure there wasn't anything special about the value 1. It seems like there isn't... despite the surprisingly good results. The set on muesli is generated with T₁=1, but it's boring because the ILT solver finds a delta function near this value every time. The main result is a small shift. Even with very low stats, if you run the sim more than once, you will often get decent results (red line denotes actual T₁).

Varying α basically just changes with width, with only a small shift in the mean of the distribution:

dfujim commented 4 years ago

alpha0_mean_T1

Figure generated with this script

dfujim commented 4 years ago

So I wrote a new method for peak finding: see the discriminator. With this we can determine the height, width, and peak location as well as how many peaks exist; so long as they are well separated. We now see that the α = 10³ version finds fewer peaks than α = 0, at the cost of a much larger width. None of the plotted quantities seem to depend greatly on n. Perhaps the biexponential analysis will show something more interesting.

alpha_discr alpha_discr_npeaks

Version used to generated plot here

dfujim commented 4 years ago

So it turns out I messed up the biexp calculations. Actually most of the file have the same issue: T₁ got included in the equations as 1/T₁ but remains labelled as T₁. In the other cases this is not a problem since T₁=1. It matters in the biexp case though since T₁ is varying. I fixed this in f368bc357fc0f4a6846d2f1bd45142d6e6fb8c3f and e51a8c38562946fb0528a7bdb8e65ad93b3e8656 and re-generated the data for the biexponential data, but not the others.

dfujim commented 4 years ago

Fitting the amplitude variation with single exp fits shows a very clean relationship:

pol2amp

Settings (where the amplitude varies):

A_beta: -0.3333333333333333
T1 (s): 1
amp: 0.1
beam pulse (s): 4.0
histogram n_bins: 1600
histogram t_max: 16.0
histogram t_min: 0.0
lifetime (s): 1.2096
n: 600000000.0
output: amp0p1.root
polarization function f(x): 0.1 * exp(-x)

EDIT: so if we see initial asymmetry of 0.15, this is full amplitude for polarization of 0.7.

rmlmcfadden commented 4 years ago

Some thoughts:

Nice work!
The initial asymmetry vs. initial polarization plot is very useful to know! It is satisfying to see that it looks consistent with what is known from other simulations/experiments.
I'm a little wary of the choice of α vs. n - I would have thought that α_opt decreases monotonically as n increases. This looks to be true in my tests, but its way less sensitive that I would have thought. In any case, I think we need to implement a more robust way of determining α_opt.

dfujim commented 4 years ago

I agree, the determination of α_opt is not very robust. My conclusion at the moment is that the value of α doesn't seem to matter too much in the case of the single exponential. It just makes everything broader, but erroneous peaks still exist even with larger α. My guess is that it won't matter much for biexp either. Perhaps it will for the strexp since it is supposed to be continuous.

dfujim / bILT

Simulated Data Tests #4