Using W(θ) - Githubissues

Problem statement: the implementation of the polarization function in the test MC simulation is not quite physical.

The proper way to do this would be to find the cumulative distribution function, and invert it. This distribution is not hard to get, assuming I didn't seriously mess something up (noting that the result is a little odd because θ is cyclic, so the function doesn't quite have the usual limits, but this seems to produce the right distribution on inversion):

$F(\phi) = \frac{1}{2\pi}\int_0^\phi 1+\frac{v}{c}AP\cos(\theta)~\mathrm{d}\theta$

Where the factor of 2π is a normalization. This works out to

$F(\phi) = \frac{1}{2\pi}\left(\phi + \frac{v}{c}AP\sin(\phi) \right )$

Therein lies our problem. F(ϕ) is not solvable for θ analytically. One would be able to do this numerically, however the problem is really 2D:

$F(\phi,t) = \frac{1}{2\pi}\left(\phi + \frac{v}{c}AP(t)\sin(\phi) \right )$

where to do the MC simulation we randomly generate F and P(t), and it is important that these are not correlated. So you either make a ton of functions for each of the P(t) and spend a lot of CPU time inverting each one, or do some kind of invertible 2D interpolation which I haven't figured out yet. ~~Furthermore, my implementation of the former doesn't seem to produce any relaxation in the final asymmetry, which is not a great sign.~~ Perhaps this is because P doesn't seem to affect the distribution as much as I would've thought:

w_theta

Here A=1/3, as is the case for 8Li, and v/c = 1.

P.S. For future reference, the online equation editor is here.

EDIT: Nevermind, the first method works. It's just really really slow.

Changes made primarily in a2679feeca72359be8629725c04597bfeef7cb40

This does seem to work, assuming my calculation for v is correct (really not sure about this one - please confirm!). I added some drawing as well to help diagnose issues. Things seem to be ok though:

Inputs: A = -1/3, n=1e6, fn=lamba x: np.exp(-x)

The timing seems to be as follows:

n        time (s)
1e4      0.3
1e5      3.4
1e6      30

So about an order of magnitude of time per order of magnitude of counts. Makes sense. Extrapolating, if we wanted to run for 1e8 counts, this would be about 1 CPU hour, and for 1e9 this is 8 CPU hours. With parallelization this could quickly be brought down to more manageable times, since the problem is embarassingly parallel. For example, with 60 cores we could run 1e9 in about 10 mins.

Note that most experiments run for 1e8 to 3e8 counts.

Your calculation of v/c looks good to me - its the same as I came up with! Incidentally, the plot below gives a pretty good idea of the energy range where v/c is no longer close to 1:

Figure_2

Even though the effect for ⁸Li will be negligible most of the time, we should really randomly generate v/c for each probe based on the β-emission energy distribution:

Figure_1

Luckily, a tabulation is already available:

# energy in keV
E = np.array(
    [
    0,     43,    86,    129,   172,   215,   258,   301,   344,   387,   430,
    473,   516,   559,   602,   645,   688,   731,   774,   817,   860,   903,
    946,   989,   1032,  1075,  1118,  1161,  1204,  1247,  1290,  1333,  1376,
    1419,  1462,  1505,  1548,  1591,  1634,  1677,  1720,  1763,  1806,  1849,
    1892,  1935,  1978,  2021,  2064,  2107,  2150,  2193,  2236,  2279,  2322,
    2365,  2408,  2451,  2494,  2537,  2580,  2623,  2666,  2709,  2752,  2795,
    2838,  2881,  2924,  2967,  3010,  3053,  3096,  3139,  3182,  3225,  3268,
    3311,  3354,  3397,  3440,  3483,  3526,  3569,  3612,  3655,  3698,  3741,
    3784,  3827,  3870,  3913,  3956,  3999,  4042,  4085,  4128,  4171,  4214,
    4257,  4300,  4343,  4386,  4429,  4472,  4515,  4558,  4601,  4644,  4687,
    4730,  4773,  4816,  4859,  4902,  4945,  4988,  5031,  5074,  5117,  5160,
    5203,  5246,  5289,  5332,  5375,  5418,  5461,  5504,  5547,  5590,  5633,
    5676,  5719,  5762,  5805,  5848,  5891,  5934,  5977,  6020,  6063,  6106,
    6149,  6192,  6235,  6278,  6321,  6364,  6407,  6450,  6493,  6536,  6579,
    6622,  6665,  6708,  6751,  6794,  6837,  6880,  6923,  6966,  7009,  7052,
    7095,  7138,  7181,  7224,  7267,  7310,  7353,  7396,  7439,  7482,  7525,
    7568,  7611,  7654,  7697,  7740,  7783,  7826,  7869,  7912,  7955,  7998,
    8041,  8084,  8127,  8170,  8213,  8256,  8299,  8342,  8385,  8428,  8471,
    8514,  8557,  8600,  8643,  8686,  8729,  8772,  8815,  8858,  8901,  8944,
    8987,  9030,  9073,  9116,  9159,  9202,  9245,  9288,  9331,  9374,  9417,
    9460,  9503,  9546,  9589,  9632,  9675,  9718,  9761,  9804,  9847,  9890,
    9933,  9976,  10019, 10062, 10105, 10148, 10191, 10234, 10277, 10320, 10363,
    10406, 10449, 10492, 10535, 10578, 10621, 10664, 10707, 10750, 10793, 10836,
    10879, 10922, 10965, 11008, 11051, 11094, 11137, 11180, 11223, 11266, 11309,
    11352, 11395, 11438, 11481, 11524, 11567, 11610, 11653, 11696, 11739, 11782,
    11825, 11868, 11911, 11954, 11997, 12040, 12083, 12126, 12169, 12212, 12255,
    12298, 12341, 12384, 12427, 12470, 12513, 12556, 12599, 12642, 12685, 12728,
    12771, 12814, 12857, 12900, 12943, 12975
    ]
)

dNdE = np.array(
    [
    0.0000009040950, 0.0000017359400, 0.0000025155100, 0.0000032428200,
    0.0000039642800, 0.0000046960500, 0.0000054454000, 0.0000062160000,
    0.0000070097600, 0.0000078276500, 0.0000086701000, 0.0000095371600,
    0.0000104287000, 0.0000113444000, 0.0000122839000, 0.0000132466000,
    0.0000142322000, 0.0000152399000, 0.0000162692000, 0.0000173195000,
    0.0000183902000, 0.0000194805000, 0.0000205900000, 0.0000217178000,
    0.0000228635000, 0.0000240263000, 0.0000252055000, 0.0000264005000,
    0.0000276107000, 0.0000288355000, 0.0000300741000, 0.0000313259000,
    0.0000325904000, 0.0000338668000, 0.0000351546000, 0.0000364531000,
    0.0000377617000, 0.0000390798000, 0.0000404067000, 0.0000417420000,
    0.0000430849000, 0.0000444350000, 0.0000457915000, 0.0000471540000,
    0.0000485218000, 0.0000498944000, 0.0000512713000, 0.0000526518000,
    0.0000540354000, 0.0000554217000, 0.0000568099000, 0.0000581996000,
    0.0000595903000, 0.0000609815000, 0.0000623725000, 0.0000637630000,
    0.0000651525000, 0.0000665403000, 0.0000679260000, 0.0000693092000,
    0.0000706894000, 0.0000720660000, 0.0000734386000, 0.0000748068000,
    0.0000761701000, 0.0000775280000, 0.0000788801000, 0.0000802259000,
    0.0000815651000, 0.0000828972000, 0.0000842217000, 0.0000855383000,
    0.0000868465000, 0.0000881459000, 0.0000894362000, 0.0000907170000,
    0.0000919877000, 0.0000932482000, 0.0000944980000, 0.0000957366000,
    0.0000969639000, 0.0000981794000, 0.0000993827000, 0.0001005740000,
    0.0001017520000, 0.0001029160000, 0.0001040680000, 0.0001052050000,
    0.0001063290000, 0.0001074380000, 0.0001085320000, 0.0001096110000,
    0.0001106750000, 0.0001117230000, 0.0001127550000, 0.0001137710000,
    0.0001147710000, 0.0001157540000, 0.0001167190000, 0.0001176680000,
    0.0001185990000, 0.0001195130000, 0.0001204080000, 0.0001212850000,
    0.0001221440000, 0.0001229840000, 0.0001238050000, 0.0001246080000,
    0.0001253910000, 0.0001261540000, 0.0001268980000, 0.0001276220000,
    0.0001283260000, 0.0001290100000, 0.0001296730000, 0.0001303160000,
    0.0001309380000, 0.0001315400000, 0.0001321200000, 0.0001326800000,
    0.0001332180000, 0.0001337340000, 0.0001342300000, 0.0001347030000,
    0.0001351550000, 0.0001355850000, 0.0001359930000, 0.0001363790000,
    0.0001367430000, 0.0001370850000, 0.0001374040000, 0.0001377020000,
    0.0001379760000, 0.0001382290000, 0.0001384580000, 0.0001386650000,
    0.0001388500000, 0.0001390120000, 0.0001391510000, 0.0001392670000,
    0.0001393610000, 0.0001394320000, 0.0001394800000, 0.0001395050000,
    0.0001395080000, 0.0001394880000, 0.0001394450000, 0.0001393790000,
    0.0001392910000, 0.0001391800000, 0.0001390460000, 0.0001388900000,
    0.0001387110000, 0.0001385100000, 0.0001382860000, 0.0001380390000,
    0.0001377710000, 0.0001374800000, 0.0001371670000, 0.0001368320000,
    0.0001364750000, 0.0001360960000, 0.0001356950000, 0.0001352720000,
    0.0001348280000, 0.0001343620000, 0.0001338750000, 0.0001333670000,
    0.0001328380000, 0.0001322870000, 0.0001317160000, 0.0001311240000,
    0.0001305120000, 0.0001298800000, 0.0001292270000, 0.0001285540000,
    0.0001278610000, 0.0001271490000, 0.0001264170000, 0.0001256660000,
    0.0001248960000, 0.0001241070000, 0.0001232990000, 0.0001224730000,
    0.0001216280000, 0.0001207660000, 0.0001198850000, 0.0001189880000,
    0.0001180730000, 0.0001171410000, 0.0001161920000, 0.0001152260000,
    0.0001142440000, 0.0001132470000, 0.0001122330000, 0.0001112040000,
    0.0001101600000, 0.0001091010000, 0.0001080270000, 0.0001069390000,
    0.0001058370000, 0.0001047220000, 0.0001035930000, 0.0001024510000,
    0.0001012960000, 0.0001001290000, 0.0000989496000, 0.0000977587000,
    0.0000965565000, 0.0000953432000, 0.0000941192000, 0.0000928850000,
    0.0000916409000, 0.0000903872000, 0.0000891243000, 0.0000878527000,
    0.0000865727000, 0.0000852846000, 0.0000839890000, 0.0000826862000,
    0.0000813765000, 0.0000800605000, 0.0000787386000, 0.0000774111000,
    0.0000760785000, 0.0000747413000, 0.0000733998000, 0.0000720546000,
    0.0000707061000, 0.0000693546000, 0.0000680008000, 0.0000666451000,
    0.0000652878000, 0.0000639296000, 0.0000625709000, 0.0000612121000,
    0.0000598538000, 0.0000584965000, 0.0000571406000, 0.0000557867000,
    0.0000544353000, 0.0000530869000, 0.0000517420000, 0.0000504012000,
    0.0000490649000, 0.0000477338000, 0.0000464083000, 0.0000450890000,
    0.0000437765000, 0.0000424713000, 0.0000411740000, 0.0000398851000,
    0.0000386052000, 0.0000373348000, 0.0000360747000, 0.0000348253000,
    0.0000335872000, 0.0000323611000, 0.0000311475000, 0.0000299470000,
    0.0000287603000, 0.0000275880000, 0.0000264306000, 0.0000252888000,
    0.0000241632000, 0.0000230545000, 0.0000219632000, 0.0000208901000,
    0.0000198358000, 0.0000188009000, 0.0000177860000, 0.0000167919000,
    0.0000158192000, 0.0000148686000, 0.0000139407000, 0.0000130361000,
    0.0000121557000, 0.0000113000000, 0.0000104698000, 0.0000096656700,
    0.0000088883900, 0.0000081386400, 0.0000074171000, 0.0000067244900,
    0.0000060615000, 0.0000054288500, 0.0000048272400, 0.0000042573900,
    0.0000037200000, 0.0000032158000, 0.0000027455000, 0.0000023098100,
    0.0000019094500, 0.0000015451400, 0.0000012175800, 0.0000009274760,
    0.0000006755320, 0.0000004624340, 0.0000002888510, 0.0000001554270,
    0.0000000627607, 0.0000000113510, 0.0000000000000
    ]
)

This is taken from the ⁸Li entry at Livechart, which cites the program BetaShape [X. Mougeot, Phys. Rev. C 91, 055504 (2015)] as its source. I have not, however, checked to see if it is properly normalized.

I agree with your remark that either one makes a gigantic "table" of W(θ) or you compute it numerically over-and-over again. Note that the "table" approach will get even more complicated if you consider also v/c or even different As (e.g., for different decay branches, like in ⁹Li). For the time being, I favour the simple/general approach of re-calculating W(θ) each time, even if it is slow.

In order to wrap my head around some of the performance/implementation issues, I have made my own, albeit much cruder, version of data_generator.py using C++. Our algorithms for the attacking the problem are virtually identical, so hopefully having two independent implementations will be useful while debugging things. It would be particularly useful to compare the initial asymmetry one gets for a given initial polarization. The values I'm getting right now using P₀ = 0.7 are lower than I naively expected, but consistent with the maximum observed asymmetry on β-NQR (~15 %) - see below (n = 1e8, A = -1/3, λ = 1 s^-1):

sim

From this exercise, I have the following thoughts on performance:

We should use one of the PCG PRNGs if we aren't already; they are faster than, for example, the widely used Mersenne Twister generators and have better statistical properties.
Using fewer sampling points when calculating the W(θ) distribution significantly speeds up the overall runtime. If we were considering a situation involving segmented detectors, each covering some small solid angle, having a highly refined W(θ) would be really important; however, since we only have two and the distinction between them is fairly coarse, I think it is less critical here. I found using 200 samples for θ and W(θ) to be a reasonable compromise between speed/accuracy. 100 samples is even passable, but noticeably worse on a zoomed-in plot of W(θ) vs. θ.
The parallelization of the Monte Carlo "loop" is surprisingly efficient; on a 4-core CPU I get nearly a factor of ~4 speed-up! I suppose this is because simulating each individual "event" is actually fairly slow. Presumably this level of efficiency was also achievable in Python.
Compiling the code with aggressive optimization flags had a substantial impact on performance (i.e., -O3 gave appreciably faster code than -O2). On a similar note, in early stages of prototyping using ROOT's cling interpreter, the code was excruciatingly slow to run - enough so that I completely switched to the edit/compile/run work-flow!
Running my C++ implementation on a desktop with a 10+ year old 4-core AMD CPU, an n = 1e8 simulation takes about ~195 s, which is tolerable. This number should drop even further if run on a machine from the last decade!

I checked the probability distribution, it is normalized. While fundamentally, I agree with you, we should use the distribution of E, in practice I don't think it's so useful. This is because of how the two distributions interact. We draw from the bottom plot and plug the result into the top plot. The chance of getting something smaller than 1 is super small. Here's the distribution, created by generating 1e8 random numbers and calculating the appropriate E, and plugging into the equation for the velocity:

prob

99.4% of the time the velocity is larger than 0.9c. 98% of the time it's larger than 0.95c. I don't think it's worth the hassle and computation time to calculate such small effects. It's going to be buried in the statistical noise.

The numpy implementation for their random numbers is a Mersenne Twister pseudo-random number generator. So by your linked chart, good, but not as good as the PCG generator. I don't know of a way to change this.

195 s seems more than tolerable compared to the python implementation. By my earlier prediction, even with the 4x speed up of 4 processors, 1e8 would still take 15 mins with the python code (reducing the sampling of θ doesn't help that much). Perhaps we should be using C++ for this part. At the end of the day, all we need is a data file with the times and counts of the F and B detectors.

I've now tested the performance hit of randomly generating v/c in the C++ version. In a n = 10⁸ simulation, it adds an additional ~5-10 s to the ~195 s runtime quoted above - which, in my view, is acceptable. You're quite right though that the pragmatic impact of its inclusion is small; it amounts to a small reduction in the initial asymmetry:

unnamed

It looks like the PRNG can be controlled in NumPy v1.17 and that PCG64 is the new default generator (see https://docs.scipy.org/doc/numpy/reference/random/index.html). The use examples are a bit different what we're doing currently, but closely mirror how PRNGs are used in modern C++ (i.e., C++11 and beyond).

I'm okay with using C++ as the middleman for this part - I'll upload my code once I add some documentation.

EDIT: C++ code introduced in 01ea548aa4998bed9813a1ce9e914012176eefe0

Thanks for the code! Works nicely. Amazing how much faster the C++ code is. I get similar run times as you do for n = 1e8, running on four Intel Core i7-6500U (2.50GHz).

I would agree that 5-10 s additional runtime is worth the inclusion of the beta decay spectrum.

e82e737aca1ad22864ee0493d29c78929a316d8b: I updated your C++ code to take generic relaxation function input, set-able via the control file.

418167d895b76211d3ee035bbc23af6dae404a2e: I also added a python script, callable from the command line, which translates the results from the ROOT histograms into a csv. Thus the data is more easily usable in python via something like pandas.

It seems to me that what you've done produces the correct output, so I will consider the issue closed!

dfujim / bILT

Using W(θ) #3