Tracks get stuck on boundaries in field propagation

hahnjo commented 3 years ago

In TestEm3 with magnetic field and 10 GeV electrons, we see too short charged track lengths and lower energy deposit in most layers. Most importantly, the problem is worst for the middle layers with the most particles (so highest statistics) and highest energy deposit, that should agree the most with Geant4.

See https://github.com/apt-sim/AdePT/issues/145#issuecomment-1034784802 for a description of the underlying problem.

hahnjo commented 3 years ago

After discussions on the recent PRs, I'm not sure anymore that my analysis of the helix stepper using the straight line distances is actually correct (my understanding now is that this only happens for the last chord step that hits a boundary). So I've rephrased the summary to describe the actual problem in verification that needs to be fixed. I've tried @agheata's fix for the last step (from #148), but did not see any measurable change.

agheata commented 3 years ago

Debugging further the constant field propagator, I noticed that the field vector is not transformed to local coordinates. I'm not sure this explains all the problems we're seeing, but makes all calculations wrong in rotated volumes. The fix looks non-trivial though because it has to be dealt with by fieldPropagatorConstBany in a general way.

agheata commented 3 years ago

Debugging further the constant field propagator, I noticed that the field vector is not transformed to local coordinates. I'm not sure this explains all the problems we're seeing, but makes all calculations wrong in rotated volumes. The fix looks non-trivial though because it has to be dealt with by fieldPropagatorConstBany in a general way.

I may be wrong here, because the field calculations are done with global coordinate/direction as pointed by @hahnjo but I'll double-check

hahnjo commented 2 years ago

To get the ball rolling again, I've started to look into this and what is actually causing the discrepancies in an even easier setup. This turns out to be low-energy electrons (10MeV) shot at a relatively thin, but dense layer (for example 1mm of PbWO4) in magnetic field (Bz=1T). For this, TestEm3 from G4HepEm yields:

      0        1.341704e+00        1.338786e+00

while AdePT outputs:

    0        1.338217e+00        1.335303e+00

As the setup is so simple, we can run many millions of particles to get very good statistics with very stable mean values and observe differences up to 0.3%.

The underlying reason is not physics (as also evidenced by the very good agreement without magnetic field), but only the fact that MSC changes the direction after every single step and this badly interacting with field propagation. It's possible to observe the very same problem when disabling all physics interactions and applying an isotropic direction after each step as defined by the MSC step limit. In that case, the difference of the "energy deposit" goes up to 3% (even though it has no physical meaning)! (For comparison, the values agree to 0.07% when disabling the magnetic field for this fake "isotropic direction" setup.)

hahnjo commented 2 years ago

I've managed to produce the discrepancy without any involvement from the physics processes in G4HepEm: In every step,

propose a constant "physics" step limit of 0.5mm;
let geometry and possibly field propagation determine how far the track actually goes;
lose up to 10% of the kinetic energy; if the track encountered a boundary, this ratio is scaled according to track.GetStepLength() / ProposedStepLength (for the Geant4 case);
set the direction to an isotropic vector in the unit sphere (code stolen from annihilation at rest).

Without magnetic field, the values of the track length and the deposited energy agree to well below half per-mill. With Bz = 1T, the track length is wrong by 1.5% and the deposited energy by 0.5% (the lower error on the latter makes sense since the energy loss follows an exponential distribution, so errors in later steps have a smaller magnitude).

agheata commented 2 years ago

2. let geometry and possibly field propagation determine how far the track actually goes;

I'm not getting the possibly field propagation part. In the physics case, magnetic field just limits the length of the internal virtual steps made to reach either the proposed physics step or a boundary, it should not impose a step limitation by itself, right? The energy is deposited at the end of the step based on the total travelled distance (corrected by MSC) and this should be independent on the way the final point is reached..

I think that randomizing isotropically the direction increases the probability of getting boundary crossings near-parallel to the surface, which enhances the differences due to the way boundaries get crossed in the 2 cases (push versus more advanced iterative search). So maybe we have to apply the same 1 to 4 points after patching at least our boundary crossing to be more close to the one in G4?

hahnjo commented 2 years ago

I'm not getting the possibly field propagation part.

What I wanted to say is that there's obviously no field propagation in the case without magnetic field, which I always use to verify that I actually implemented the same "physics" on both sides.

In the physics case, magnetic field just limits the length of the internal virtual steps made to reach either the proposed physics step or a boundary, it should not impose a step limitation by itself, right?

Well, I consider "hitting a boundary" a step limitation as well because it shortens the distance traveled, which influences the energy deposit.

after patching at least our boundary crossing to be more close to the one in G4

I did this in October, and as discussed numerous times around every week in the past months, this had zero effect. Unless you have some new ideas here, it doesn't help to bring back the same claims over and over again.

agheata commented 2 years ago

Is it too difficult to put in this patch that may have had no impact in the previous testing conditions, but which is clearly a (major) divergence point to eliminate as we expose the bug/feature to different testing? I don't claim any new ideas here, just trying to figure out why you get these results and give my input. And in addition I understood that you already have this code, which only improves the precision of the crossing point. But if you consider this opinion as noise you can simply ignore it.

hahnjo commented 2 years ago

No, it's not difficult and it was one of the first things I already checked yesterday. I just tried it again with the setup described in https://github.com/apt-sim/AdePT/issues/145#issuecomment-1028792921 and it still shows the inverse effect at a smaller scale, ie the track length and deposited energy decrease even more. This is well understood, as also discussed in the past. Once we solve this issue and are not satisfied with the precision (ie, we get too high energy deposits instead of it being too low right now), we can re-evaluate if we need some of the iterative corrections that Geant4 makes.

I'm currently working on simplifying the setup even more. It might be possible to completely eliminate energy loss from the equation, but then we might need to go to even lower energies (1MeV, which we currently reach with enough steps) and higher statistics. I'm still testing before I post a definitive update.

hahnjo commented 2 years ago

A couple of oddities that I found so far:

PR https://github.com/apt-sim/AdePT/pull/95 introduced and used momentumXYMag as follows: https://github.com/apt-sim/AdePT/blob/fba5bd2373a67ffcf2478aec1d93a0d9e07171ba/magneticfield/inc/fieldPropagatorConstBz.h#L78-L91 If there is no z-direction, ie direction[2] = 0, then momentumXYMag = momentumMag. Otherwise, and especially as direction[2] approaches 1 or -1, momentumXYMag will get smaller and smaller. This leads to an increasing value for curv and a decreasing value of safeLength. So in essence, the more a track is flying in z-direction and the less it is influenced by the constant field in z-direction, the shorter safeLength will get and the more chord steps magnetic field propagation will do. That is ... counter-intuitive, but maybe I'm missing something in here? I don't think that's a problem for correctness, only performance.
During my experiments, I found that AdePT's field propagation was sometimes much more precise and found "touching" intersections that Geant4's propagation didn't. It is possible that this happened when the track was flying strongly in z-direction and this is actually related to the first point, I'll have to check the logs. Anyway, again no problem for correctness I think.
The really important and worrying issue is that particles can get stuck on boundaries during field propagation, if they curl such that they re-enter the volume that they just exited within one chord step. What will happen in such cases is that endPosition is "behind" the particle's position, so propagation along chordVec will return a distance of 0 and the track won't move. Because of the way we make progress in navigation, the track will temporarily enter the volume, only for relocation to find out that this wasn't correct because the direction is actually pointing to the other volume. That way, the track ends up in the same state as in the beginning of the step and will never make progress. It will eventually be killed by the looping detection system (once only electrons are left over that don't produce any secondary particles for 200 steps) and its energy as well as any charged step length that it would have traveled are lost, leading to the difference compared to Geant4 described originally. We need to handle this case in AdePT, probably sub-dividing the step; I'll need to check what Geant4 does in this case.

agheata commented 2 years ago

Hi Jonas, about your first point. Your observation is correct, the current approximation only takes into account the bending in the XY plane, ignoring the displacement in the Z direction. I have already fixed for this in my local branch, not yet in the master because it was not overall ready. The more correct way to approximate is:

  double dirXY = sqrt((1. - direction[2]) * (1. + direction[2]));
  double invdirXY = 1. / (dirXY + 1.e-30);
  // Curvature in the XY plane
  double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * invdirXY / momentumMag;
  double safeLengthD = invdirXY * sqrt(2 * gEpsilonDeflect / curv);

For a track having almost zero XY component, safeLengthD gets correctly a very large value (since it goes like sqrt(invdirXY)). I realize however that the calculation done as above may have numerical problems when dirZ goes to +/-1, safeLengthD formula should probably expand inside the curv expression and do the reduction of invXY in the square root.

As you say, this is affecting performance, so it should be done right.

agheata commented 2 years ago

I think your third point is a not-so-unlikely scenario leading to tracks being killed, and I agree that sub-dividing the final step could help.

hahnjo commented 2 years ago

The more correct way to approximate is:

With a bit of basic variable replacement, the code is equivalent to:

double dirXY = sqrt((1. - direction[2]) * (1. + direction[2]));
// Curvature in the XY plane
double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * (dirXY + 1.e-30) / momentumMag;
double safeLengthD = sqrt(2 * gEpsilonDeflect / curv);

Put differently, you're proposing to replace division by dirXY with a multiplication? That appears to not explode in the edge case of dirXY = 0, but is rather hand-wavy overall...

agheata commented 2 years ago

It doesn't look you did the replacement right, it should read:

double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * invdirXY / momentumMag; // note the *inv*

and dirXY does not reduce completely when replacing it in:

double safeLengthD = invdirXY * sqrt(2 * gEpsilonDeflect / curv); // invdirXY corrects for the real arc length
// and after replacement
double safeLengthD = sqrt(2 * gEpsilonDeflect * momentumMag * invdirXY / std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue));

So safeLengthD goes eventually like 1/sqrt(dirXY), i.e. tracks along Z (dirXY=0) have large deflection safety. This correction is not at all hand-wavy, it just geometrically corrects for the real length along the helix trajectory rather than using only the XY projection. Of course there is an approximation for safeLengthD because the formula stands only for small deflection angles, but the above correction does not introduce any extra approximation.

hahnjo commented 2 years ago

My code above is correct, please read it again: I moved the invdirXY from the calculation of safeLengthD into the sqrt (making it invdirXY ** 2) and then into curv, where it cancels with one invdirXY. And replacing a multiplication with a division smells hand-wavy to me.

agheata commented 2 years ago

The formula you quote:

// Curvature in the XY plane
double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * (dirXY + 1.e-30) / momentumMag;

is wrong, because in master it reads like this which divides by dirXY.

hahnjo commented 2 years ago

Now you're saying your own code from https://github.com/apt-sim/AdePT/issues/145#issuecomment-1032445431 is wrong.

master has

double momentumXYMag = momentumMag * sqrt((1. - direction[2]) * (1. + direction[2]));
double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) / (momentumXYMag + 1.0e-30);
double safeLength = sqrt(2 * gEpsilonDeflect / curv);

which we agree is wrong because it makes tiny steps for large direction[2].

You write in https://github.com/apt-sim/AdePT/issues/145#issuecomment-1032445431 that we should instead do

double dirXY = sqrt((1. - direction[2]) * (1. + direction[2]));
double invdirXY = 1. / (dirXY + 1.e-30);
// Curvature in the XY plane
double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * invdirXY / momentumMag;
double safeLengthD = invdirXY * sqrt(2 * gEpsilonDeflect / curv);

which I claim is equivalent to

double dirXY = sqrt((1. - direction[2]) * (1. + direction[2]));
// Curvature in the XY plane
double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * (dirXY + 1.e-30) / momentumMag;
double safeLengthD = sqrt(2 * gEpsilonDeflect / curv);

(note that I changed both curv and safeLengthD, getting rid of using dirXY twice). Do we agree on that?

If yes, then

Put differently, you're proposing to replace division by dirXY with a multiplication?

agheata commented 2 years ago

I probably did not get enough coffee today... I fail to see how in your translation of my comment 145 which reads

double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * invdirXY / momentumMag;

is equivalent to:

double curv = std::fabs(ConstBzFieldStepper::kB2C * charge * BzValue) * (dirXY + 1.e-30) / momentumMag;

I mean the final formula you get may be correct, but you are using a different formula for curv which does not represent a curvature.

hahnjo commented 2 years ago

The value of curv is not, but getting the right value of safeLength should be sufficient, no? Maybe I'm missing something here...

agheata commented 2 years ago

In the method as implemented now curv is not reused, so your replacement is sufficient (I didn't get this fast enough due to missing coffee...), but as soon as we will want to optimize based on the helix geometric properties, all results will be wrong. For example, to implement some cut we may want to use the total length of the helix done in one full turn: const double turn_step = 2. * M_PI * invdirXY / curv; and this will be wrong. So IMO we should not alter variables having a property that we might reuse.

hahnjo commented 2 years ago

Yes, that would need to become const double turn_step = 2. * M_PI * dirXY / curv;. As you write regarding stability, I think it's not desirable that the code you originally proposed multiplies very large values with very tiny values, only saved from 0 and infinity by the addition of 1e-30. The code I provided avoids this, but was mostly meant as an illustration what is effectively changed.

agheata commented 2 years ago

Sure we agree on that, the implementation should be both numerically stable and try to minimize the number of square roots and divisions. But we should try to put this in since as you correctly observed the version in the master is flawed.

hahnjo commented 2 years ago

Regarding

The really important and worrying issue is that particles can get stuck on boundaries during field propagation, if they curl such that they re-enter the volume that they just exited within one chord step.

I said yesterday that Geant4 doesn't have special code to handle this, and eventually field propagation is right here (both in Geant4 and AdePT): The case that I studied in detail curls such that it penetrates the next volume less than the miss distance. So it "ok" to relocate to the volume "behind" and then continue stepping from there as the end of the next chord step lies within. This relocation works flawlessly in Geant4, but seems to fail for this particular scenario in AdePT. However, that seems to be very rare (it only happens with my "fixes" to make the boundary crossing more correct, and only with the BVHNavigator, not the LoopNavigator...) and when (partly) fixing it by not pushing and considering Inside() != kOutside instead of Contains(), I see no change in the overall results with TestEm3. So my current conclusion would be that there is a potential problem, but yet again not the full explanation of the reduced energy deposit...

Edit: Or my partial fix for the problem isn't sufficient. I'm currently dumping all particles that are killed by the "looping" detection, and all of them are stuck on volume boundaries, with energies (maximum a few MeV, in many cases less) and directions (very low x component) that look exactly like they curl back into the volume they come from. To be continued...

hahnjo commented 2 years ago

Okay, I was confused yesterday because I did not reproduce the real problem, which involves physics. Here are two additional steps of a stuck electron, with additional verbosity output:

pos: (-5.700000, -0.826631, 0.754389)
dir: (0.000988, 0.127977, -0.991777)
volume ID: 75
geometricalStepLengthFromPhysics = 0.023182
chord end position: (-5.699987, -0.823664, 0.731398)
geometryStepLength = 0.000000
onBoundary: true, volume ID: 74
after relocation, volume ID: 76

step 2:
pos: (-5.700000, -0.826631, 0.754389)
dir: (0.000988, 0.127977, -0.991777)
volume ID: 76
geometricalStepLengthFromPhysics = 0.121848
chord end position: (-5.700163, -0.811038, 0.633543)
geometryStepLength = 0.000000
onBoundary: true, volume ID: 74
after relocation, volume ID: 75

Because the volumes are filled with different materials, the geometricalStepLengthFromPhysics is different and consequently the chord end positions in both cases end up in the "other" volume, making the navigator toggle between the two. I think that Geant4 remembers the last attempted chord step length to get the same chord end position in this case, but maybe @jonapost can confirm?

apt-sim / AdePT

Tracks get stuck on boundaries in field propagation #145