Dead cells and missing cathodes

cherylepatrick commented 5 years ago

I'm labelling this a question for now, but I suspect it's going to turn into a request.

We've found some shorts on the cathodes in the tracker, and I want to understand how the reconstruction will handle them. While we're at it, we should think about dead cells, too.

What I'd like to be able to do (but feel free to suggest alternatives) is:

Be able to configure a list of dead anodes and have the simulation not output any signals from them
Be able to configure a list of shorted cathodes and have the simulation not generate any signal on those cathodes (could be 1 or both cathodes in a cell)
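A minimal sketch of what such a configuration could look like, assuming cells are identified by (side, layer, column) triples as used later in this thread. The names `BadChannels`, `anode_fires` and `live_cathodes` are hypothetical and are not part of Falaise.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Cell = Tuple[int, int, int]  # (side, layer, column); hypothetical identifier


@dataclass
class BadChannels:
    """Hypothetical container for dead anodes and shorted cathodes."""
    dead_anodes: Set[Cell] = field(default_factory=set)
    # Maps a cell to its shorted cathodes: "top", "bottom" or both.
    shorted_cathodes: Dict[Cell, Set[str]] = field(default_factory=dict)

    def anode_fires(self, cell: Cell) -> bool:
        """A dead anode never produces a signal."""
        return cell not in self.dead_anodes

    def live_cathodes(self, cell: Cell) -> Set[str]:
        """Cathodes of this cell that can still produce a signal."""
        return {"top", "bottom"} - self.shorted_cathodes.get(cell, set())


bad = BadChannels(
    dead_anodes={(0, 3, 56)},
    shorted_cathodes={(1, 0, 10): {"top"}, (1, 0, 11): {"top", "bottom"}},
)
```

A simulation or filtering step could consult `anode_fires` and `live_cathodes` before writing out each signal.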

We should be able to do some amount of reconstruction even without cathode signals, but we will need to be smart and I don't know if we are.

In the longer term, I think it's likely that we'll need to tell the reconstruction where our bad channels are so that the tracking/clustering algorithms know to treat those cells differently than the rest, but for now I just want to see what the simulation does and how well or otherwise the reconstruction handles it.

I guess a related question is whether anybody already has any idea what the reconstruction does with a dead cell and/or missing cathode signal! That's what I really want to know the answer to. Thanks!

yramachers commented 5 years ago

Dead anodes or cathodes would correspond to long-term run conditions and would be in the conditions DB, maybe as a dead readout channel configuration.

On the reconstruction: Each time the concept of a 'neighbour' is being used, a dead channel would be a problem. I don't know whether CAT uses that concept, I suspect it does. The simplest way around that challenge is to always(!) have dead channels set as active (fired geiger) and only indicate that these are dead with an additional flag (valid or dead). The drawback with that approach arises if an algorithm uses more information from a cell, such as the radius, which doesn't exist for a dead cell.

The easiest way to see what existing reconstruction modules do with dead cells is to use the ideal event generator rather than the full simulation and manually remove single cells from the single lines or helices it generates. If you create ten single helices or lines you can remove random cells from each event while they are still in ROOT files and then translate them to brio files (filling the CD data bank) and run the reconstruction as normal on brio files.

For full simulation files, I guess the easiest way would be to make your own mockgeiger.. calibration module, and remove a random cell from simulated events before they are put into the CD data bank.

drbenmorgan commented 5 years ago

Assigned to you both for further discussion and implementation. My own feeling is that the "filtering" approach works best since this is independent of the simulation and won't waste CPU cycles.

cherylepatrick commented 5 years ago

Thanks, Yorck! I'm going to add Matteo to this as I think it's a good project for him to investigate.

Using the ideal event generator sounds like a good idea. I haven't used it yet but maybe this is a good time to start - especially if it means we can "edit" events in ROOT. @yramachers - can you remind us where to find it/how to use it?

The radius measurements have uncertainties on them, so I wonder if we can do something for a dead channel like give it a huge (whole cell) uncertainty on the radius. For a double missing cathode, we might need a 100% uncertainty on the z position, though I think Dave said there was a way of getting some kind of cathode information from the anode signal, so maybe we can do better in the long term. Maybe if I look at the mock calibration, I can play with doing something like that.

yramachers commented 5 years ago

The ideal event generator is on the SuperNEMO-DBD page as SN-IEgenerator package. The Readme should explain how to work it.

Keeping a dead cell switched on artificially could mean to declare it as radius 0 and radius error as 22 mm. The zero radius should always be taken as minimum permitted, finite radius, and prevent any tangent point calculation with that error other than the whole cell becoming one tangent point with a big error. I think CAT uses tangent points but I don't know (was in Federico's thesis). If not then at least it will be seen as a neighbour and connections across the inactive cell become possible in order to form clusters.
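A minimal sketch of this "always on" dead cell, under stated assumptions: the `Hit` class, the `is_dead` flag and the 0.1 mm minimum radius are hypothetical choices for illustration, and only the zero radius and the 22 mm error come from the comment above.

```python
from dataclasses import dataclass

CELL_RADIUS_MM = 22.0  # radius error quoted in the comment above
MIN_RADIUS_MM = 0.1    # hypothetical "minimum permitted, finite radius"


@dataclass
class Hit:
    side: int
    layer: int
    column: int
    radius_mm: float
    radius_error_mm: float
    is_dead: bool = False  # hypothetical flag marking an artificial hit


def make_dead_cell_hit(side, layer, column):
    """Artificial hit for a dead cell: zero radius, whole-cell error."""
    return Hit(side, layer, column, 0.0, CELL_RADIUS_MM, is_dead=True)


def effective_radius(hit):
    """Radius handed to downstream code: never exactly zero."""
    return max(hit.radius_mm, MIN_RADIUS_MM)
```

With the flag in place, a fitter could skip tangent-point calculations for `is_dead` hits while the clustering still sees them as neighbours.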

The z-coordinate should not feature in neighbour searches since it is continuous rather than discrete (no grid). The safest way to artificially set it is to take the collection of neighbours and determine a mean z-value from them, an interpolation. The movement should be linear in z hence an interpolation is as good as it gets and should not introduce artificial jumps in z. That could for instance upset a cluster algorithm, who knows.
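The interpolation described here could be sketched as below. This assumes hits are keyed by (side, layer, column) and that "neighbours" means the fired cells directly adjacent on the grid; both are illustrative assumptions, not the Falaise data model.

```python
from statistics import mean


def interpolate_z(dead_cell, hits):
    """Mean z of the fired cells adjacent to dead_cell on the same side.

    dead_cell is a (side, layer, column) triple; hits maps such triples
    to z in mm. Returns None when no neighbouring cell fired.
    """
    side, layer, column = dead_cell
    neighbour_z = [
        z for (s, l, c), z in hits.items()
        if s == side
        and (l, c) != (layer, column)
        and abs(l - layer) <= 1
        and abs(c - column) <= 1
    ]
    return mean(neighbour_z) if neighbour_z else None


hits = {(0, 2, 10): 100.0, (0, 4, 10): 140.0, (0, 7, 30): -500.0}
z = interpolate_z((0, 3, 10), hits)  # uses only the two adjacent hits
```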

drbenmorgan commented 5 years ago

Pull Request #154 will cover the topic of "how to implement a service", which is the currently identified mechanism for transmitting dead cell info through Falaise.

There are a couple of additional implementation details to address:

As we can have "zombie" cells, we may need a way to identify these later in the pipeline. Could be an additional "isZombie" attribute/accessor in snedm::TrackerHit. Other cross-referencing mechanisms probably aren't easily supportable.
@yramachers highlights the aspect of a "RadiusPolicy", that is, how to define r/delta_r for a zombie hit.
Indirect: Work in #149 highlighted some missing types and interfaces for working with tracker cell coordinates and geom_ids. Not required for this work, but will make it easier to implement.

I'll also add @pfranchini to this discussion.

pfranchini commented 4 years ago

As suggested, we are going to start with the idealized event generator, removing entire cells from the tracker and deleting the corresponding hits of the generated tracks, as in the example

[images: before, after]

These simulations will be reconstructed with Falaise, and in parallel Matteo will run his CNN on the same samples, to evaluate the efficiency vs number of dead cells with the two different algorithms.

Next steps will be

moving to the full Falaise chain (removing hits from the simulated brio files)
evaluating the inefficiency related to losses in sole cathode information
having a Falaise mechanism for the dead cells (service, db, ...).

cherylepatrick commented 4 years ago

Wonderful! It would be good to get an idea as soon as possible of how efficiencies change with the SuperNEMO reconstruction if we remove all the cells that are currently shorting. I can point you at the list if you don't have it. Are we ready to do that now with the ideal event generator?

pfranchini commented 4 years ago

Nearly there; if you provide the list, I will do that early next week.

cherylepatrick commented 4 years ago

D'you need a particular format? I can give you an excel spreadsheet...

pfranchini commented 4 years ago

Right now it is (side, layer, column). Thanks Cheryl!

pfranchini commented 4 years ago

Here is a quick analysis using YR's generator:

simulated 10k single tracks (lines, as in toyillumination.py)
killed n random full cells in the tracker and removed the corresponding hits
reconstructed with Falaise (official-2.0.0.conf without the "Mock digitize and mock calibrate")
calculated the efficiency as number of single reconstructed tracks / number of generated tracks
(since the hits are removed at random, the killing procedure needs to be run several times for each data point to assign a proper error)
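The procedure above can be sketched as follows. This is not the script used for the study: events are assumed to be plain lists of (side, layer, column) hits, the 2 x 9 x 113 layout is an assumption, and `reconstruct` is a hypothetical stand-in for running Falaise that returns the number of tracks found.

```python
import random
from statistics import mean, stdev

N_SIDES, N_LAYERS, N_COLUMNS = 2, 9, 113  # assumed tracker layout


def random_dead_cells(n, rng):
    """Pick n distinct cells to kill."""
    all_cells = [(s, l, c) for s in range(N_SIDES)
                 for l in range(N_LAYERS) for c in range(N_COLUMNS)]
    return set(rng.sample(all_cells, n))


def kill_hits(event, dead):
    """Drop every hit whose (side, layer, column) is a dead cell."""
    return [hit for hit in event if hit not in dead]


def efficiency(track_counts):
    """Fraction of events reconstructed as exactly one track."""
    return sum(1 for n in track_counts if n == 1) / len(track_counts)


def efficiency_with_error(events, n_dead, reconstruct, n_trials=10, seed=1):
    """Repeat the random killing to estimate the spread of the efficiency."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        dead = random_dead_cells(n_dead, rng)
        counts = [reconstruct(kill_hits(event, dead)) for event in events]
        values.append(efficiency(counts))
    return mean(values), stdev(values)
```

Repeating the killing with different random cell sets is what turns a single efficiency value into a data point with an error bar.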

We can see that, as already shown by Matteo, the efficiency is pretty low even for super-idealized tracks. Killing hits easily tricks the reconstruction into splitting tracks. The low efficiency should not be an artefact of this analysis, because even for some Falaise MC with the classical pipeline the efficiency (as defined above) is not any better. This study will be reproduced by Matteo using his CNN.

If you have the specific list of real problematic cells, I can evaluate that specific loss.

cherylepatrick commented 4 years ago

Will get you that list of cells later today.

I notice that with up to about 500 cells the efficiency is actually going up... but I am a bit concerned that that might be due to the definition of efficiency. If you've simulated two tracks, kill a cell in the middle of one of them, and then count the tracks, you might well end up with 3 tracks (1 for the one you didn't split, and two for the one track with a hole in the middle that has now become two tracks). Would be interesting to see what Falaise comes up with in terms of "particles".

pfranchini commented 4 years ago

The errors might be quite large, so it probably goes down within the error (it might have randomly killed hits that do not affect many tracks). I agree with you: once I kill one hit or two, Falaise reconstructs two tracks, and I suppose that should not be the case, but the process is probably more sophisticated than this. I will try to think in terms of particles.

pfranchini commented 4 years ago

Would be interesting to see what Falaise comes up with in terms of "particles".

One original track (four hits killed), five particles reconstructed.

cherylepatrick commented 4 years ago

Yikes, that's not so good.

yramachers commented 4 years ago

Looks to me that the algorithm does what it is supposed to do. I know it is some time ago (see my comment from July above in the thread), but this is an effect of the reconstruction relying on nearest neighbours to form clusters. Therefore, a dead cell should be set as active by default rather than switched off, carry a flag marking it as dead, and be assigned a default radius of zero and maximum error (for the track fit stage, if at all required for fitting); see the discussion from the summer in this thread. The alternative would be to process reconstruction results in a module that can merge broken structures, where permissible, into particles. Here, such a module would merge the five clusters after the fitting by recognizing that all five describe the same structure, i.e. a line with consistent slopes and intercepts within errors. I never got 'round to writing that module, but in principle I would need it by construction for use with the image segmentation clustering algorithm.
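A rough sketch of such a merging step for straight lines, assuming each fitted cluster is summarised by a slope and an intercept with errors. The `LineFit` class, the 3-sigma threshold and the greedy grouping are illustrative assumptions; no such module exists in this thread.

```python
from dataclasses import dataclass


@dataclass
class LineFit:
    slope: float
    slope_err: float
    intercept: float
    intercept_err: float


def compatible(a, b, n_sigma=3.0):
    """True if slopes and intercepts agree within n_sigma combined errors."""
    slope_tol = n_sigma * (a.slope_err ** 2 + b.slope_err ** 2) ** 0.5
    icpt_tol = n_sigma * (a.intercept_err ** 2 + b.intercept_err ** 2) ** 0.5
    return (abs(a.slope - b.slope) <= slope_tol
            and abs(a.intercept - b.intercept) <= icpt_tol)


def merge_fits(fits, n_sigma=3.0):
    """Greedily group fits compatible with the first member of a group."""
    groups = []
    for fit in fits:
        for group in groups:
            if compatible(group[0], fit, n_sigma):
                group.append(fit)
                break
        else:
            groups.append([fit])
    return groups
```

Each resulting group would then be treated as one particle candidate; helices would need a different compatibility test.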

cherylepatrick commented 4 years ago

I agree with @yramachers - but I am curious to see whether Matteo's machine learning thing can figure this out for itself. I'd be very impressed if it could.

pfranchini commented 4 years ago

I do agree. A dead cell is not the same as a cell without a hit. In principle the algorithm could try firing up dead cells to see if by any chance this makes collinear tracks into a single one.

I think we agree that the nearest-neighbours concept is a bit too short-sighted. Including just the second-nearest neighbours would absorb most of the artificial features caused by dead cells.
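The effect of widening the neighbourhood can be illustrated with a toy clustering on the (layer, column) grid. This is not CAT or any Falaise algorithm, just a connected-component search where `reach` is the assumed maximum grid distance between linked cells.

```python
def cluster(cells, reach=1):
    """Group (layer, column) cells connected within `reach` grid steps."""
    remaining = set(cells)
    clusters = []
    while remaining:
        stack = [remaining.pop()]
        group = set(stack)
        while stack:
            layer, column = stack.pop()
            linked = {c for c in remaining
                      if abs(c[0] - layer) <= reach
                      and abs(c[1] - column) <= reach}
            remaining -= linked
            group |= linked
            stack.extend(linked)
        clusters.append(group)
    return clusters


# A straight track crossing all nine layers, with a dead cell in layer 4.
track = [(layer, 20) for layer in range(9) if layer != 4]
n_nearest = len(cluster(track, reach=1))  # track is split at the gap
n_second = len(cluster(track, reach=2))   # gap is bridged
```

The wider reach bridges single dead cells, at the price of also linking unrelated hits that happen to sit two cells apart.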

Still, the current algorithm gives 80% for pure lines without any dead cells (however that is defined), so there is some room for improvement at this level too.

Since I do not really know how to tackle that problem, I will start with something I know, like setting the radius to zero for hits corresponding to dead cells.

cherylepatrick commented 4 years ago

Hi @pfranchini - I've updated http://nile.hep.utexas.edu/cgi-bin/DocDB/ut-nemo/private/ShowDocument?docid=4943 to show which cells have any kind of short (top, bottom or both). The cells you need are in the blue section at the bottom of each sheet (one sheet per C section). Let me know if it isn't clear.

pfranchini commented 4 years ago

Thanks Cheryl.

pfranchini commented 4 years ago

list_of_shorts.xlsx

pfranchini commented 4 years ago

I had to reconsider my definition of efficiency: it now counts single tracks that also have a hit on the foil and one on the calorimeter, since this is how they are simulated by YR's generator. If I keep all the reconstructed single tracks, tracks that are split in two in the absence of dead cells might become a single track after losing hits to dead cells. This would artificially enhance the efficiency.

So, in absence of dead cells

Efficiency: 77.87 %
Zero tracks: 0.07 %
More than one track: 22.06 %

Killing all the hits corresponding to cells with shorts (367)

Efficiency: 75.24 %
Zero tracks: 3.49 %
More than one track: 21.27 %

In comparison, a randomized suppression of 367 cells would give

Efficiency: 72.99 ± 1.20 %
Zero tracks: 1.28 ± 0.32 %
More than one track: 22.48 ± 1.13 %

If we had a perfect track reconstruction that stitched together all the broken tracks, the only loss would come from the tracks not reconstructed at all. So if I have to quote a single number, I would say that the loss of efficiency from treating all those shorts as fully dead cells would be ~3.4%, which does not sound too bad in a scenario with 18% of the tracker off. I am pretty sure this argument also translates in terms of the number of reconstructed particles.
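The two numbers quoted here can be cross-checked from figures in this thread. The total cell count is an assumption: 113 cells per layer is mentioned elsewhere in the thread, while 9 layers on each of 2 sides is assumed here.

```python
N_CELLS = 2 * 9 * 113  # assumed: 2 sides x 9 layers x 113 cells per layer
N_SHORTED = 367        # cells with shorts, from the list above

dead_fraction = N_SHORTED / N_CELLS

# Zero-track fraction with the shorted cells killed, minus the baseline.
zero_track_loss = 3.49 - 0.07  # percentage points

print(f"fraction of tracker off: {dead_fraction:.1%}")
print(f"loss from unreconstructed tracks: {zero_track_loss:.2f} points")
```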

I suppose some effort should go into improving the algorithm itself, since the dead cells seem to be of second-order importance, but this is beyond the scope of this issue.

cherylepatrick commented 4 years ago

Hi Paolo, I'm pretty surprised by this and I'm wondering if I understand correctly. Are you saying that all those dead cells don't make a difference and that almost all of the tracks reconstruct back to a vertex on the foil, despite a pretty large percentage of the first layer of cells being dead? Can you explain what you did to get it to do that? It doesn't seem consistent with what you showed before with the 1 track (w dead cells) being split into 5. Sorry if I am being slow here.

cherylepatrick commented 4 years ago

Or are you marking dead cells now as always on rather than always off?

pfranchini commented 4 years ago

Cells are off and the corresponding hits are removed. It is basically the green curve above, and we already start from 78%. I guess the reconstruction is clever enough to get back to the foil even if cells are missing in the first layer, but for some other tracks we still get more than one track 20% of the time (as in the event display example). Apparently, if we randomly kill 3/4 of the detector we still get 30% of straight ideal tracks reconstructed. Probably less if we want a full foil-to-calo track, which was not considered before (I should redo the plot).

pfranchini commented 4 years ago

Of course Falaise does not know a thing about dead cells in the geometry.

For counting the tracks I am using reco.track_count from the Sensitivity module, which might be a source of uncertainty too... I need to look at it.

cherylepatrick commented 4 years ago

The reconstruction isn't necessarily being clever if it goes back to the foil with the first layer missing! What if it was a radon-in-the-tracker event from Bi214 on the tracker wires? It can't just go assuming it's from the foil. It's doing TOO well and I am suspicious!

pfranchini commented 4 years ago

You are right to be. The SensitivityModule sometimes has reco_vertices_on_foil==1 even when that is apparently not true. I need to do more debugging here, or just parse the brio file.

pfranchini commented 4 years ago

@cherylepatrick, this was apparently done on purpose, but it is not clear to me why:

trackDetails.cpp

if (track_.has_vertices()) // There isn't any time ordering to the vertices so check them all
{
  for (unsigned int iVertex=0; iVertex<track_.get_vertices().size();++iVertex)
  {
    const geomtools::blur_spot & vertex = track_.get_vertices().at(iVertex).get();
    if (snemo::datamodel::particle_track::vertex_is_on_source_foil(vertex) || snemo::datamodel::particle_track::vertex_is_on_wire(vertex) )
    {
      vertexOnFoil_ = true; // On wire OR foil - just not calo to calo gammas
    }
  }
}

cherylepatrick commented 4 years ago

Huh, why on earth did I do that? I clearly did it intentionally. But I cannot at all remember why. My guess is that it was something to do with the possibility of a track not being extended back to the foil, and that we looked at that in a different way. But I have to admit it makes very little sense to me right now. As far as I am concerned, you can feel free to change it - just check where it gets used and that there's no obvious place where it needs to allow the other. To be honest I am baffled why I thought this was a smart idea.

cherylepatrick commented 4 years ago

Actually @pfranchini be careful before doing that - it looks like it is used in the gamma track length/internal external probabilities for the 1e1gamma channel to check when you have a gamma and an electron from a common vertex, and that needs to include a common vertex on the wires. I'm not saying that we don't want to make a change, but I think the change needs to be more sophisticated than just changing the definition of that variable. Probably we need a separate check for an actual vertex on the foil, and that var should be renamed vertexInTracker or something like that.
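The separation suggested here could look like the sketch below. It is written in Python purely for illustration and does not use the Falaise API: the `Vertex` class and its boolean fields are hypothetical stand-ins for the checks in the C++ snippet above.

```python
from dataclasses import dataclass


@dataclass
class Vertex:
    on_source_foil: bool = False
    on_wire: bool = False
    on_calorimeter: bool = False


def vertex_flags(vertices):
    """Return (on_foil, in_tracker), computed separately.

    on_foil is true only for a genuine foil vertex; in_tracker keeps the
    existing "on wire OR foil" meaning under a clearer name.
    """
    on_foil = any(v.on_source_foil for v in vertices)
    in_tracker = any(v.on_source_foil or v.on_wire for v in vertices)
    return on_foil, in_tracker
```

Code that needs a common vertex anywhere in the tracker would keep using the second flag, while efficiency studies would use the first.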

pfranchini commented 4 years ago

I think the definition is correct and only the name is misleading, in particular as defined in the README (or at least it was for me). I would expect the new reco.vertices_on_foil (maybe a reco.foil_hit_count) to count all the possible vertices without any topology assumption; same for reco.calorimeter_hit_count.

cherylepatrick commented 4 years ago

Yeah, I think I am inclined to agree. The readme definitely seems misleading. Sorry about that.

pfranchini commented 4 years ago

No worries, thanks for that. I might have a look, since it can be useful for these sorts of basic checks.

cherylepatrick commented 4 years ago

Yeah it would be really helpful actually, there is a lot of stuff in there that hasn't had much scrutiny from anyone except me, and I haven't looked at it in a while... I suspect there are a few things that were weirdly specific to analyses that were in progress at the time I wrote it, and actually doesn't make much sense in a more general context. Another pair of expert eyes would be a big help.

pfranchini commented 4 years ago

After the fix to the variable I was using, I still define efficiency as single tracks that also have a hit on the foil and one on the calorimeter, since this is how they are simulated by YR's generator. Otherwise they are classified as short, even if reconstructed as single tracks.

So, in absence of dead cells:

Efficiency: 76.27 %
Zero tracks: 0.07 %
More than one track: 21.98 %
Short tracks: 1.68 %

Killing all the hits corresponding to cells with shorts (367) [image: tracker_DC]:

Efficiency: 43.57 %
Zero tracks: 3.49 %
More than one track: 16.93 %
Short tracks: 36.01 %

In comparison, a randomized suppression of 367 cells would give [image: tracker_DC_random]:

Efficiency: 62.20 ± 2.06 %
Zero tracks: 1.21 ± 0.37 %
More than one track: 22.65 ± 1.25 %
Short tracks: 13.94 ± 1.94 %

As you were expecting, @cherylepatrick, removing the hits near the foil compromises the efficiency, as defined here, by ~31%. Of course any further analysis or better reconstruction could easily overcome this.
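The four categories used in these tables can be written down explicitly. This is a sketch of the definitions as described in this comment, assuming each event is summarised by its track count and two booleans; the function and argument names are hypothetical, not SensitivityModule variables.

```python
from collections import Counter


def classify(n_tracks, has_foil_vertex, has_calo_hit):
    """Assign one event to exactly one category."""
    if n_tracks == 0:
        return "zero tracks"
    if n_tracks > 1:
        return "more than one track"
    if has_foil_vertex and has_calo_hit:
        return "efficient"  # single track running from foil to calorimeter
    return "short track"


def fractions(events):
    """Percentage of events in each category."""
    counts = Counter(classify(*event) for event in events)
    return {name: 100.0 * n / len(events) for name, n in counts.items()}
```

Because every event falls in exactly one category, the four percentages sum to 100, as they do in the tables above.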

pfranchini commented 4 years ago

[image: efficiency]

cherylepatrick commented 4 years ago

Hi @pfranchini and thanks for doing this, it's very interesting. I have a few questions:

Does the yellow line mean that it reconstructed a single particle track, but either the foil hit, calorimeter hit, or both were missing? If the calorimeter hit isn't associated, it's a lost cause, but if it's just not marked as a foil hit, there are things we can do, so it's interesting to know exactly what happened with those.

Have you seen any evidence of a track being reconstructed and successfully "jumping" a dead cell?

Finally - this is with the dead cells just reporting no hit? How hard is it to make them always on, and what difference would that make? (I assume that's not TOO bad with the IE generator, but we might need help to integrate it into Falaise?)

Another good thing to test is how well we would be able to reconstruct two tracks from a common vertex (bb-like topology). At some point we need to care about BiPo events as well, as we will lose a lot of background-suppression power from the alpha veto.

Do we have a plan for testing with Falaise? What do we need to do to start doing that?

pfranchini commented 4 years ago

Does the yellow line mean that it reconstructed a single particle track, but either the foil hit, calorimeter hit, or both were missing? If the calorimeter hit isn't associated, it's a lost cause, but if it's just not marked as a foil hit, there are things we can do, so it's interesting to know exactly what happened with those.

In this case it is always a missing foil hit, since there is always a calorimeter hit from the generator. Of course, if Falaise knew that there was a dead cell in the way when extrapolating the track to the foil, it could assume that the vertex was on the foil. This requires some intelligence in the reconstruction.

Have you seen any evidence of a track being reconstructed and successfully "jumping" a dead cell?

Yes, sometimes. I have tried killing a full layer of 113 cells in both the trackers. This splits all the tracks in two, but there are still many tracks being reconstructed from foil to calo (as one can easily confirm with flvisualize)

Efficiency: 68.7 % (<-- foil to calo full track)
Zero tracks: 0.9 %
More than one track: 29.2 %
Short tracks: 1.2 %

Finally - this is with the dead cells just reporting no hit? How hard is it to make them always on, and what difference would that make? (I assume that's not TOO bad with the IE generator, but we might need help to integrate it into Falaise?)

This needs some coding with the IE generator and my cell killer. Would you, as suggested, always set these hits with radius=0?

Another good thing to test is how well we would be able to reconstruct two tracks from a common vertex (bb-like topology). At some point we need to care about BiPo events as well, as we will lose a lot of background-suppression power from the alpha veto.

The IE generator should be able to do this. At this point we should probably move to a topology reconstruction efficiency rather than counting tracks. I guess we can use your Sensitivity module to do this.

Do we have a plan for testing with Falaise? What do we need to do to start doing that?

a module can remove hits from the simulation; this would be exactly equivalent to what we are doing now and can serve for the studies in the meantime
need to find the courage to face the Falaise service as @drbenmorgan suggested: did we agree how we want this done? I assume even more courage is required to wrestle with the reconstruction to make it absorb the concept of dead cell (e.g. jumping over a dead cell to find a neighbour)...

pfranchini commented 4 years ago

Finally - this is with the dead cells just reporting no hit? How hard is it to make them always on, and what difference would that make? (I assume that's not TOO bad with the IE generator, but we might need help to integrate it into Falaise?)

Even hits that all have zero radius can make it into a track

so of course, if we keep the hits corresponding to dead cells, we recover some of the efficiency loss [image: efficiency_radius]

in terms of the tracker shorts we get

Efficiency: 66.1 %
Zero tracks: 0.6 %
More than one track: 28.5 %
Short tracks: 4.8 %

cherylepatrick commented 4 years ago

That's great! But I'm actually kind of surprised that we don't get a lot of reconstructed tracks connecting the dead cells. Maybe the (bizarre) "striped" nature of the shorts is doing us a favour because they don't connect up. Are you allowing x-wall / gamma veto calorimeter hits? When it's finding a track, is it the RIGHT track? It'd be good to see some event displays just to check it's doing what we hope it's doing.

pfranchini commented 4 years ago

The study with ALL the dead cells left on is in progress. In the results above, only the dead cells lying on real tracks are left on, with null radius (so random connections between dead cells could not happen). I guess that could be compatible with having only a partial cell reading.

I need a bit more fiddling to find out, as you said, whether the RIGHT track is reconstructed each time.

cherylepatrick commented 4 years ago

Ah, I see what you mean! In that case, I'm not surprised it made things better. I'm really curious how it will be with all the dead cells on.

By the partial cell reading, d'you mean no cathode, but the anode is OK? If we only have 1 cathode, we should be able to reconstruct a position just fine (don't know if Falaise currently does that though). If we have no cathodes, we can reconstruct a radius but not a z position (along the wires) - or rather, we get the z position to a choice of two locations, symmetrical about the middle of the detector. But from what Dave and Xavier found out in the testing, it sounds like maybe it isn't safe to turn on the cells at all if they have a cathode short, so I think for now we can just assume the whole cell is busted.

(And that's important actually - what are you doing about the z position for these always-on cells? I'm not sure CAT is great at utilising the z information but we ought to check. Maybe put it halfway up the detector with a +/-1.5 metre uncertainty? Come to think of it... are we better doing the same for the radii and putting them at 11mm (half a radius, if I remember the cell size correctly) with an 11mm uncertainty?)

pfranchini commented 4 years ago

@yramachers I am being lazy now, but do you have a simple set of equations to convert (side, row, column) into actual positions (wirex, wirey)? I just want to be fully consistent with your generator. Thanks.

pfranchini commented 4 years ago

I got this that should work:

(+/-(grid_layer*d+offsetx), grid_column*d+offsety)

yramachers commented 4 years ago

Yes, that is what is being used in the generator. There the offsets are offsetx = 53.0 mm for wire plane 0 (closest to the foil) and offsety = -2464.0 mm for the bottom wire row, with the origin sitting at the centre of the demonstrator.
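Putting the formula and the offsets together gives the sketch below. The cell pitch d = 44 mm is an assumption (it is not stated in this exchange, though it matches the 22 mm cell radius used earlier), and so is the choice of which side gets the negative x sign.

```python
CELL_PITCH_MM = 44.0   # assumed cell spacing d
OFFSET_X_MM = 53.0     # wire plane 0, closest to the foil
OFFSET_Y_MM = -2464.0  # bottom wire row


def wire_position(side, layer, column, sign_for_side=(-1.0, 1.0)):
    """Convert (side, layer, column) to (wirex, wirey) in mm.

    sign_for_side maps side 0 and side 1 to the sign of x; which side
    is negative is an assumption made for this sketch.
    """
    x = sign_for_side[side] * (layer * CELL_PITCH_MM + OFFSET_X_MM)
    y = column * CELL_PITCH_MM + OFFSET_Y_MM
    return x, y
```

With these values the 113 columns run symmetrically from -2464 mm to +2464 mm, which is consistent with the origin being at the centre of the demonstrator.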

cherylepatrick commented 4 years ago

Completely coincidentally, I was looking through some old emails and found something I had written suggesting that for delayed hits, Xavier did what I mentioned above - set the radius to half the cell radius with an uncertainty of the same amount.

pfranchini commented 4 years ago

This makes absolute sense statistically, so the relative error is 100%.

pfranchini commented 4 years ago

The efficiency for V shaped double tracks (originated from one single foil vertex, both ending up in a calorimeter module) is around 35.3 % since a lot got split as expected

pfranchini commented 4 years ago

As proposed in the Analysis meeting of Feb. 27, I have created two Falaise modules

https://github.com/SuperNEMO-DBD/SuperNEMO-DeadCellsModule
https://github.com/pfranchini/SuperNEMO-LazarusModule

The first, given a calibrated data file, kills a set of hits corresponding to dead cells (randomly generated or read from a file). The second one pretends to resuscitate "some" hits according to different methods, reading a defined list of dead cells:

ALL hits corresponding to dead cells, or
hits corresponding to dead cells NEAR a real hit (assigning the z position and uncertainty of the neighbour), or
hits corresponding to dead cells BETWEEN two real hits that are not adjacent to each other (assigning the average z position and uncertainty of the neighbours).
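As an illustration of the BETWEEN idea only (this is not the LazarusModule code), the sketch below revives a dead cell when the cells directly below and above it in the same column both fired. Restricting the search to the layer direction and keying hits by (side, layer, column) are simplifying assumptions.

```python
def between_candidates(hits, dead_cells):
    """Dead cells sandwiched between two real hits in the same column.

    hits maps (side, layer, column) -> (z_mm, z_error_mm). Returns the
    revived cells with the average z and average uncertainty of the two
    neighbouring hits.
    """
    revived = {}
    for side, layer, column in dead_cells:
        below = hits.get((side, layer - 1, column))
        above = hits.get((side, layer + 1, column))
        if below is not None and above is not None:
            revived[(side, layer, column)] = (
                (below[0] + above[0]) / 2.0,
                (below[1] + above[1]) / 2.0,
            )
    return revived


hits = {(0, 2, 10): (100.0, 10.0), (0, 4, 10): (140.0, 20.0)}
revived = between_candidates(hits, {(0, 3, 10), (0, 5, 10)})
```

Only the cell at layer 3 is revived here; the one at layer 5 has a real hit on one side only, which is the case the NEAR method would cover.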

I have now been testing the BETWEEN method on 2nbb (se82_2nubb) and I see a marginal effect, in particular for a high number (>1000) of dead cells. I see that the other methods create even more confusion for an already very poor reconstruction (it starts with a nominal efficiency of ~30%...).

(My definition of efficiency corresponds to the fraction of reconstructed double tracks - not even considering the signs).

Killing hits --> resurrecting

SuperNEMO-DBD / Falaise

Dead cells and missing cathodes #142