Two step gap clustering

feigaodm commented 7 years ago

In order not to split discrete small S2s from deep in the TPC, we designed a clustering algorithm that merges any nearby hits. When S1s are close to S2 signals, as in events at the top of the TPC, or when S1s are accompanied by PMT afterpulses and/or gate photo-ionisation S2s, there is a high chance that pax will merge those signals into a larger peak and miss the S1s we are interested in.

This PR aims to solve this problem by applying the gap-size clustering twice:

1. We first apply a tight gap-size clustering and only merge hits separated by gaps smaller than 100 ns (cut value tuned to separate S1s from gate photo-ionisation while keeping single-electron S2s unsplit). We try to identify the S1-like signals and mark them differently. In the same way, we isolate lone hits and small coincidence signals. All other signals are considered S2 candidates.
2. For the remaining S2 candidate peaks, we apply a large gap-size clustering and merge all nearby S2s, so the treatment of deep S2 signals remains unchanged.
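The two-pass idea can be sketched in a few lines of Python. This is a minimal illustration, not the code in this PR: hits are reduced to bare times in ns (real pax hits have left and right edges), `gap_cluster` and `two_step_gap_cluster` are hypothetical names, and the toy `toy_is_s2_candidate` rule (at least three hits spanning at least 100 ns) only stands in for the actual S1 / lone-hit / coincidence classification. The 100 ns and 2.5 us defaults are illustrative values taken from numbers quoted in this thread.

```python
from typing import Callable, List, Tuple

Cluster = List[float]  # hit times in ns (simplification: no hit widths)


def gap_cluster(times: List[float], max_gap: float) -> List[Cluster]:
    """Start a new cluster whenever the gap to the previous hit exceeds max_gap."""
    clusters: List[Cluster] = []
    for t in sorted(times):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def two_step_gap_cluster(
    times: List[float],
    is_s2_candidate: Callable[[Cluster], bool],
    tight_gap: float = 100.0,
    loose_gap: float = 2500.0,
) -> Tuple[List[Cluster], List[Cluster]]:
    """Return (isolated clusters, merged S2 clusters)."""
    isolated: List[Cluster] = []
    s2_hits: List[float] = []
    for cluster in gap_cluster(times, tight_gap):  # pass 1: tight clustering
        if is_s2_candidate(cluster):
            s2_hits.extend(cluster)
        else:
            isolated.append(cluster)  # S1-like, lone hit or small coincidence
    # Pass 2: loose clustering applied to the S2-candidate hits only.
    return isolated, gap_cluster(s2_hits, loose_gap)


def toy_is_s2_candidate(cluster: Cluster) -> bool:
    # Hypothetical stand-in for the real classification step.
    return len(cluster) >= 3 and cluster[-1] - cluster[0] >= 100.0


# A narrow S1, two fragments of one S2, and a distant lone hit (made-up times).
hits = [0, 10, 20, 30, 500, 560, 640, 700, 1500, 1550, 1620, 1690, 9000]
one_pass = gap_cluster(hits, 2500.0)
isolated, s2s = two_step_gap_cluster(hits, toy_is_s2_candidate)
print(len(one_pass), isolated, s2s)
```

In this toy input a single loose pass swallows the S1 into the S2 cluster, while the two-pass version keeps the S1 and the lone hit separate and still merges the two S2 fragments.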

Here is one example WF to validate this change.

1. S1 and photo-ionisation signals can be separated clearly.

[image: s1_photo-ionisation]

2. Deep S2s are unaffected.

[image: screen shot 2017-07-14 at 11 48 18 am]

We can probably tune the parameter to improve the performance further, so we'd like to merge it to pax_head for more checks.

Credit to (Joey, Jelle and Fei)

JelleAalbers commented 7 years ago

Nice, thanks for taking this up. Interesting that it finds an 'unknown' in the middle of an S2... I guess this looked sufficiently like an S1 to be split off. I suppose you'd want to run a few tests with peakfindertest before merging to quantify the performance (especially to show quantitatively that deep S2s are still ok). Maybe @jhowl01?

Also it might be worth checking it doesn't do weird things at very high energy, where S2s have very long 'single hits' in each channel.

feigaodm commented 7 years ago

Good point about the unknown peak in the middle of the S2s. I checked it in detail: the first clustering's gap size is a little too small, so two hits from single-electron S2s get split off. I guess I have to tune this parameter further.

[image: screen shot 2017-07-14 at 12 16 01 pm]

Not sure how to check high energy S2s yet, will simulate such WFs and see what happens:)

JosephJHowlett commented 7 years ago

@JelleAalbers @feigaodm I'm away until Tuesday and can run the pftest then, unless this is needed sooner

@feigaodm was this unknown called an "s2 candidate" at the classification stage? If so, shouldn't it already be merged on the second iteration? It doesn't look like it has a fast rise time to me, but maybe I'm not understanding.

feigaodm commented 7 years ago

In the update I also rejected lone hits and peaks with two-fold coincidence in the S2 clustering, so the splitting of the single-electron S2 left part of the signal classified as unknown. I fixed this by setting s1_gap_size=150ns, so single electrons are no longer split in that WF.
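A toy illustration of why the tight gap size matters once small coincidences are rejected. This is a sketch under stated assumptions, not the plugin code: hits are hypothetical `(time_ns, channel)` pairs, `s2_candidates` and `n_channels` are made-up helper names, and "lone hit or two-fold coincidence" is modelled simply as fewer than three distinct channels.

```python
from typing import List, Tuple

Hit = Tuple[float, int]  # (time in ns, PMT channel); simplified hit model


def gap_cluster(hits: List[Hit], max_gap: float) -> List[List[Hit]]:
    """Cluster hits by time; a gap larger than max_gap starts a new cluster."""
    clusters: List[List[Hit]] = []
    for hit in sorted(hits):
        if clusters and hit[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def n_channels(cluster: List[Hit]) -> int:
    """Number of distinct PMT channels contributing to the cluster."""
    return len({channel for _, channel in cluster})


def s2_candidates(hits: List[Hit], tight_gap: float, min_channels: int = 3) -> List[List[Hit]]:
    """Drop lone hits and two-fold coincidences after the tight clustering."""
    return [c for c in gap_cluster(hits, tight_gap) if n_channels(c) >= min_channels]


# Made-up single-electron S2 with one 120 ns gap before its last two hits.
single_electron = [(0, 1), (60, 2), (130, 3), (200, 4), (320, 5), (380, 6)]
for gap in (100.0, 150.0):
    kept = s2_candidates(single_electron, gap)
    print(gap, [len(c) for c in kept])
```

With a 100 ns gap the last two hits form a two-fold coincidence and are rejected, so the S2 keeps only four of its six hits; with 150 ns the signal stays in one piece.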

feigaodm commented 7 years ago

This should fix Issue (#594, #540, #542, #543, #544, #545)

feigaodm commented 7 years ago

One of the CU summer students (Malcolm Wells) here checked the effect on S1 peak-finding efficiency using @JelleAalbers's framework. The new twostepgap + classification/clustering has no observable impact on S1 efficiencies. If anything, the efficiency improves slightly because, with the tighter clustering, S1s are less likely to be mis-identified as single-electron S2s. Here are the plots:

[images: current_pax_results, 2step_pax_results, current_pax_results-3]

Also, in the update I only isolated small peaks after the first clustering, so it shouldn't affect high energy S2 signals at all. Any suggestions to test it before merging it?

JelleAalbers commented 7 years ago

Nice! But the main test is to quantify what happens to low-energy deep S2s, things like SmallS2s, SmallDeepS2s or TenElectronS2s (or a similar test) in PeakfinderTest.

JosephJHowlett commented 7 years ago

I finally ran a couple of the tests @JelleAalbers suggested with PeakFinderTest. Here I'll report the numerical outcomes where the efficiency distributions were flat.

SmallS2s

I limited this test to simulating single electrons. The master branch had two categories - those single electron S2s found as a single peak and those split to multiple S2s. Twostepgap did better in this regard, but many cases occurred where one or two hits were stripped off the end of the S2, and called a lone_hit or unknown. This is occurring because the first GapSizeClustering iteration is chopping up the S2, and only S2-like segments are considered for the second iteration. Here I show this effect as a function of the first gap size parameter.

[image: fraction_merged_as_par_varied]
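The kind of scan behind this figure can be mimicked with a toy calculation. The sketch below uses made-up hit times rather than simulation output, and `is_unsplit` / `fraction_unsplit` are hypothetical helper names; it only shows how the fraction of signals that survive the first clustering in one piece depends on the first gap size.

```python
from typing import Dict, List, Sequence


def is_unsplit(times: Sequence[float], max_gap: float) -> bool:
    """True if no gap between consecutive hits exceeds max_gap."""
    ordered = sorted(times)
    return all(b - a <= max_gap for a, b in zip(ordered, ordered[1:]))


def fraction_unsplit(events: List[Sequence[float]], max_gap: float) -> float:
    """Fraction of events that the tight clustering keeps as a single cluster."""
    return sum(is_unsplit(e, max_gap) for e in events) / len(events)


# Made-up single-electron hit patterns (ns), with largest gaps of 80, 120, 180 and 230 ns.
toy_events = [
    [0, 80, 160, 240],
    [0, 90, 210, 280],
    [0, 60, 240, 300],
    [0, 70, 140, 370],
]
scan: Dict[int, float] = {gap: fraction_unsplit(toy_events, gap) for gap in (100, 150, 200, 250)}
print(scan)
```

A larger first gap size keeps more of these toy signals intact, at the price of resolving close S1s less well, which is the trade-off being tuned here.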

SmallDeepS2s

Here I used the instructions as Jelle left them, simulating 1-100 electrons at a time, 95cm below the gate. Again, a lot of chopping unknowns and lone_hits off the end of signals occurred, but in terms of splitting the single S2 into multiple S2s, twostepgap did a bit better than pax. The question is whether we're OK with splitting off these 1-2 hit peaks.

I also found one event out of 2500 in which an S1 was split off from the signal. This was fixed by increasing the first max gap size to 250 ns from 200 ns.

| Outcome | Master branch (%) | twostepgap (%) | twostepgap, 250 ns (%) |
| --- | --- | --- | --- |
| found | 98.64 | 82.68 | 89.56 |
| split to s2s | 1.36 | 0.76 | 0.80 |
| split and misid as s1 | 0 | 0.04 | 0 |
| chopped to unknown/lone_hit | 0 | 16.52 | 9.64 |

S1S2Close

I also checked on merging of S1s and S2s at low drift times using PeakFinderTest. As expected, twostepgap improves on the current pax clustering at resolving these close-together peaks, although when the simulation is run with afterpulses on this improvement is less than expected. I'm looking into this - plots forthcoming.

sanderbreur commented 7 years ago

Hey Joey,

Nice work testing this.

Do I read correctly that both new algorithms chop 10-17% of real deep S2s into unknowns?

Because that would be a big problem in my opinion. Given the low elife/field combination, losing deep low-energy S2s would affect our dark matter search. By how much, I find hard to estimate.

S



JelleAalbers commented 7 years ago

Thanks for the tests Joey, looking forward to the plots. Is 'twostepgap' here without running the natural breaks & local minimum algorithms? I think we want them all on in the end, they have different specializations. For example local minimum is best for splitting high-energy S2s, and naturalbreaks should be good at close S1s.

@sanderbreur Perhaps it's a bit confusing, but 'chopped' means that the peak was found and correctly classified, but some part of it got chopped off and labeled unknown or lone hit. So it would mean introducing some more S2 nonlinearity. How serious that is depends on how large a part is split off.
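To put a number on "how large a part is split off", one can compute the area fraction carried by the chopped hits. The sketch below is only an illustration with made-up hit areas; `chopped_fraction` is a hypothetical helper and assumes the hit areas are listed in time order, with the chopped hits at the end of the peak.

```python
import math
from typing import Sequence


def chopped_fraction(hit_areas: Sequence[float], n_chopped: int) -> float:
    """Fraction of the total peak area carried by the last n_chopped hits."""
    if not 0 <= n_chopped <= len(hit_areas):
        raise ValueError("n_chopped must be between 0 and the number of hits")
    total = sum(hit_areas)
    if total <= 0:
        raise ValueError("total area must be positive")
    # Slice from an explicit index so that n_chopped == 0 gives an empty tail.
    tail = hit_areas[len(hit_areas) - n_chopped:]
    return sum(tail) / total


# Made-up peaks: 20 and 100 hits of 1 pe each, both losing their last 2 hits.
small_s2 = [1.0] * 20
larger_s2 = [1.0] * 100
print(chopped_fraction(small_s2, 2), chopped_fraction(larger_s2, 2))
```

In this toy picture, losing two hits costs 10% of the area of a 20-hit signal but only 2% of a 100-hit one, so the induced nonlinearity would matter most for the smallest S2s.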

JosephJHowlett commented 7 years ago

Hey guys,

@JelleAalbers yes, all plugins are run in the same way. "twostepgap" here just means switching between the master branch and the branch in this pull request, which only modifies the GapSizeClustering plugin and keeps all others.

@sanderbreur Thanks Jelle for clarifying, and I just want to add that in all cases Fei and I looked at, the "chopped" part was just 1-2 hits at the end of the peak. Still 10% is a big number (I think we're leaning toward 250ns over 200ns).

feigaodm commented 7 years ago

Hi @JelleAalbers and @sanderbreur ,

Thanks for the feedback! As @jhowl01 explained, the two-step clustering is not as bad as the numbers suggest. In the worst scenario, we will have some more S1s identified from single-electron S2s. But please note that this is on the order of ~1 out of 2000, probably smaller than the mis-identification rate of single-electron S2s, so I don't think it's a big problem at all.

The reason the new clustering performs better at merging deep S2s is that we can now make the gap size for S2-like signals larger (2.5 us instead of 2 us). All other steps, such as the other clustering algorithms, are unchanged, so I don't expect any change in high-energy calibrations.

If we can separate PMT afterpulses from the S1s using this method, I suspect the energy resolution will improve a little (maybe the ER band width as well). But we have to check the effect on calibration data.

Once this PR is merged, we can process some calibration (Rn220) and background data to see whether it helps. And I hope we can see some more top populations in the detector.

JosephJHowlett commented 7 years ago

Hi all,

Basically I re-ran the simulation of 4000 S1s and S2s down to 1cm below the gate, with areas according to the ER band mean, below 2000pe in cS1. I did this with and without PMT afterpulses and photo-ionization, for both the master and the twostepgap branches. Shown are the fractions of S1s having each outcome after processing. The relevant quantity to merging of S1s and S2s is the percentage of S1s merged with another peak and classified as S2, which is shown in the figures.

[image: merging_fraction_ap_off]

[image: merging_fraction_ap_on]

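| Outcome | Master, APs off (%) | twostepgap, APs off (%) | Master, APs on (%) | twostepgap, APs on (%) |
| --- | --- | --- | --- | --- |
| Found | 98.80 | 98.58 | 71.25 | 77.05 |
| Merged to S2 | 0.70 | 0.93 | 18.05 | 12.80 |
| Mis-Ided as S2 | 0.50 | 0.42 | 9.53 | 9.03 |
| Un-Classified | 0.0 | 0.0 | 1.15 | 1.07 |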

There is definitely some improvement in merging, although maybe less than we would expect. An example waveform where twostepgap still fails to resolve the peaks is shown below.

It also seems clear that this merging (as well as the mis-identification) depends on the inclusion of afterpulses. The waveform below suggests this may be due to gate photo-ionization.

[image: screen shot 2017-07-24 at 10 02 31 am]

sanderbreur commented 7 years ago

Good example, yes without any after pulsing or photoionization (of the mesh) we would have expected splitting to happen more often.

I am slowly starting to wonder if our approach to distinguishing between the low-energy deep S2s and these shallow S1-S2 pairs should be changed. Can't we, e.g., use the number of contributing PMT channels to our advantage here? It seems that timing just doesn't suffice, and we can all split these events easily by eye. This cannot be said for the deep and shallow low-energy S2s, which do need the 'gentle' splitting.

Another option is to process these shallow events ourselves with just a very hard splitting algorithm. We wait for reprocessing to be done, exclude all normal events, process just the left-over events (~300/run) with our own splitting algorithm, and add these events later to the full data of SR1. I am a bit scared about changing efficiencies etc. here.

Any other ideas?


JosephJHowlett commented 7 years ago

Hi all,

Fei and I were looking deeper into why twostepgap wasn't vastly improving the S1-S2 resolution, and found that the initial clustering was resolving these S1s, but they were often mis-classified as S2 candidates and subsequently merged with the main S2. Fei fixed this by updating the classification step to be identical to pax's usual classification (see most recent commit). See results below:

Updated S1S2Close Comparison

[image: corrected_branch_comparison]

The S1s and S2s are now almost always resolved. This should give us some new top events, or at the least kill our best guess at where they went.

| Outcome | Master, APs on (%) | twostepgap, APs on (%) |
| --- | --- | --- |
| Found | 71.25 | 88.85 |
| Merged to S2 | 18.05 | 0.15 |
| Mis-Ided as S2 | 9.53 | 9.93 |
| Un-Classified | 1.15 | 1.05 |
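For a quick read of the table, the shift per outcome can be computed directly from the quoted percentages. This is plain arithmetic on the numbers above; `relative_change` is just an illustrative helper name.

```python
def relative_change(before: float, after: float) -> float:
    """Change of `after` with respect to `before`, as a fraction of `before`."""
    return (after - before) / before


# Percentages of simulated S1s per outcome, afterpulses on (from the table above).
master = {"Found": 71.25, "Merged to S2": 18.05, "Mis-Ided as S2": 9.53, "Un-Classified": 1.15}
two_step = {"Found": 88.85, "Merged to S2": 0.15, "Mis-Ided as S2": 9.93, "Un-Classified": 1.05}

for outcome in master:
    delta = two_step[outcome] - master[outcome]  # change in percentage points
    rel = relative_change(master[outcome], two_step[outcome])
    print(f"{outcome:15s} {delta:+6.2f} points ({rel:+.1%})")
```

The merged fraction drops by 17.90 percentage points while the found fraction rises by 17.60, so nearly all of the previously merged S1s now end up as found S1s.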

sanderbreur commented 7 years ago

Wow this is a great improvement!

Do these two algorithms now also work equally well on the deep low-energy S2s? Or is that still a problem?


JosephJHowlett commented 7 years ago

@sanderbreur you asked the right question, I just reprocessed the SmallDeepS2s data and it looks good:

| Outcome | Master branch (%) | twostepgap (%) |
| --- | --- | --- |
| found | 98.64 | 82.24 |
| split to s2s | 1.36 | 1.04 |
| split and misid as s1 | 0 | 0.04 |
| chopped to unknown/lone_hit | 0 | 16.68 |

I think this is expected, since most of the deep-s2 fragments after the tight clustering have a small rise time and are merged by the second, loose clustering. Those with a short rise are usually lone hits (and 1-2% are two-fold coincidences).

JelleAalbers commented 7 years ago

That looks like a nice improvement in the merging, but you might have opened a can of worms here...

If BuildPeaks is after RejectNoiseHits, the latter doesn't do anything anymore (as it relies on a rough clustering being already applied), so you will have lost noisy channel mitigation. I think you also get into trouble if you would use the order [SumWaveform, BuildPeaks, RejectNoiseHits], I don't remember exactly why (maybe a rare crash case, or some properties end up inconsistent, but there's a comment in XENON1T.ini saying you can't do it :-), maybe there is a more detailed comment somewhere else).
There is now quite a bit of duplication of code between the classification / peak properties computation plugins and the clustering plugin -- though I see you've tried to import where you could -- which would make it harder to maintain and optimize. Also keep in mind pax isn't used only by XENON1T.

Are you sure you want this extra complication? As Sander mentioned if you just want to see the top population once we can process with different settings (eg. the old natural breaks settings) first. Or perhaps there is a smaller modification of the mini-classification that would get you most of the way there.

feigaodm commented 7 years ago

@JelleAalbers The problem is that the raw classification doesn't work well, as shown in Joey's plot. This is because the rise time based on max_index and left (or left_central) fluctuates too much. That's why we spent the time and effort to tune this. From the results, I think it's worth the effort and should be implemented in Xe1T. As I expected, I think it can not only solve the missing top population issue but also help to improve the S1 resolution (because we can get rid of PMT afterpulses and gate PI).

I understand there could be some potential issues, but I think we should proceed with it and solve issues as they come up. About the order [SumWaveform, BuildPeaks, RejectNoiseHits]: I did think about it and came up with the current solution. I changed the order because I ran into a problem when calculating the peak rise time from the peak. Maybe we can modify that part so that we don't need to call sumWF to calculate the rise time; do you have any suggestions for how to do this? Or it would be good if you could modify it directly.

The simple approach @sanderbreur mentioned won't work well without a lot of effort. At the top of the TPC, S1s aren't too different from S2s. We tried disabling some of the other clustering algorithms to test the effect; it looks like the effect of the other two is limited.

JosephJHowlett commented 7 years ago

@JelleAalbers maybe this is silly, but is it possible to split this new algorithm down the middle, do the initial GapSizeClustering with our small (200ns) threshold, then continue down the line as usual, but after SumWaveform add a classify step, then replace the later classification with an S2 merging within our larger threshold? I say this because we know the second GapSizeClustering will only merge S2s into larger S2s, so in principle the second classification step adds no new information. Or is there at least some other way to incorporate one or more plugins that retain the essence of this algorithm without hurting the flow of pax?

For sure Fei's and my immediate goal was to replace the current GapSizeClustering with a two-step algorithm with minimal changes that could resolve these kinds of events. We're testing some processing on background data now to see if new events appear, maybe this result can add to our discussion.

feigaodm commented 7 years ago

@JelleAalbers I modified the plugins and now use the order [SumWaveform, BuildPeaks, RejectNoiseHits] in the configuration files. The branch has been used to process ~17 hours of background data without bugs as far as I can tell, and the results look promising for solving the missing top populations. Please let us know if you think there is anything else we should test. Thanks. Btw, this update doesn't increase the processing time.

feigaodm commented 7 years ago

This fixed Issues (#594, #540, #542, #543, #544, #545)

XENON1T / pax