ANNIEsoft / ToolAnalysis

Other
10 stars 53 forks source link

CNNImage PMT mapping by lookup #255

Closed s4294967296 closed 4 months ago

s4294967296 commented 8 months ago

Improve consistency of the CNNImage tool by adding a "Static" mode that positions PMTs based on a static mapping / lookup table.

"Static" is now the preferred mode to use the tool with. In the previous (and currently still maintained) "PMT-Wise" mode, the position of the PMTs in the resulting CNNImage is re-calculated at each execution of the event loop. This method is susceptible to minor changes in the PMT positions, see for example the correction factors in the new CNNImage::OrderPMTPositions method. The step toward a fixed mapping will therefore increase stability and grant continuity.

As a further step, some code cleanup has begun and compartmentalization of the code into smaller methods is also under way. The goal of adding further methods is to decrease clutter and increase readability.

The README file and config files of respectively the UserTools/CNNImage and configfiles/CNNImage has been changed to reflect some of the changes.

The mapping is generated / explained with a few plots in the added python notebook, which uses the PMTPositions_Scan.txt. The lines within those plots indicate the binning borders.

s4294967296 commented 8 months ago

Relevant commit / changes: 9717d1e

marc1uk commented 8 months ago

hi Steven. From what I can see there are two things this PR does wrt the CNNImage Tool. One I agree with, the other I don't.

  1. The current implementation gets the PMT positions from the DataModel ANNIEGeometry, which I believe is the correct place to get all detector related information. Your PR replaces that with a local config file. If we start having each Tool read PMT information from separate config files, that redundancy quickly leads to divergence where different Tools are using different PMT geometries, so I'm afraid I don't agree with that.
  2. The current implementation then finds the corresponding bin entry by re-running the conversion from PMT position floats to CNN image position, and then uses FindBin to map to a histogram bin. That seems pretty silly, and your method of just keeping a map of detectorkey to histogram bin is the correct fix.

Would you be able to do some merge of the two, using the ANNIEGeometry to calculate the mapping based on the DataModel, but also recording the detectorkey->histogram bin map, so that it doesn't need to be re-run for every hit on every execute call?

While you're at it, I do have some questions about that old method. For example; why is the std::erase(std::unique(...)) needed? Surelyt there shouldn't be multiple PMTs at the same x and y locations? I'm also never a fan of floating point checks with an arbitrary tolerance like if (y_pmt[detkey] >= max_y-0.001). That's currently being used to identify Top/Bottom/Barrel PMTs, but that information is directly available from the Detector object via GetTankLocation*.

The OD skip check could also be done in the first for loop where information is copied out of the ANNIEGeometry class, rather than in the second one. In fact for that matter, copying the information out of the ANNIEGeometry class into a set of arrays is also redundant - why not just merge the loops and use the data where it is....? :neutral_face: I never understand why people do that.

Unrelated, but while you're at it there's a lot of new and delete calls in Execute. Histograms can be Reset' - there's generally no need to delete and re-create them.

Let me know what you think. Marcus

s4294967296 commented 8 months ago

Hello Marcus,

thank you for your thorough response, you raise very valid points.

First, let me state why this (or a similar) change to a lookup table for the PMT positions -> 2D matrix mapping is necessary, and let me give some background to the different mapping modes.

In general, this change is motivated by the tool breaking when used with newer WCSim files. I think what is happening is that this PR #197 updated the PMT positions, which very slightly differ from the PMT positions used in WCSim (see here). For example, see the first coordinate of PMT 332:

ToolAnalysis:
332,332,Bottom,0,-0.6434134,-1.6939623,2.00051707,0,1,0,LUX,E2,ON,6,1,6,0,0,2,11,1,-999,-999,-999,-9999,ok

WCSim:
332 0   -64.341342  -169.39623  200.051707  0   1   0.00E+00    0

I have calculated the average (absolute) difference of the first PMT coordinate to be about 1.124...e-06 cm. I assume this difference emerges from having one set of data points in units of cm, the other set in units of m. This has led the CNNImage tool to output different mappings, depending on what kinds of MC / Data are used.

In general, the following requirements should be fulfilled by the "PMT-wise" mapping mode:

  1. It should be the same for MC / Data
  2. Every PMT should correspond to exactly 1 position in the 2D mapping
  3. The PMTs should be aligned corresponding to their position within the tank, with the tank caps being compressed into the 2 upper- and lowermost rows, respectively. The top/bottom PMTs are a bit special here, since they are on the same "layer", but should be filled into two rows at the top and bottom of the CNNImage, with the "inner ring" of PMTs being placed in the uppermose and lowermost rows
  4. The mapping should be consistent and independent of subtle changes (like the unit conversion mentioned above) to the geometry files
  5. The mapping should be as small as possible, while having each vertical "layer" of PMTs fill a new row in the resulting CNNImage

So far, the "Geometric" mapping mode fulfills points 1, 3, 4 well, however fails at 2+5 (the "Geometric" mapping bins all PMTs according to their polar angle and vertical position); depending on how small the output size is chosen, many PMTs will overlap, especially the more crowded PMTs at the tank top and bottom. To have no overlapping PMTs, a rather large mapping has to be chosen (greater than 20x20), which introduces a lot of pixels that are always stuck at 0 in the resulting CNNImage, and which will introduce many 0-rows. For reference: the current CNN models developed by me (and the inherited work by Michael N. and David M.) use a 16x10 matrix.

The "PMT-wise" mapping mode aims to fulfill all those points, by calculating the unique polar angle/vertical position bins for all PMTs. This however is flawed. As you have correctly pointed out, position conversions such as if (y_pmt[detkey] >= max_y-0.001) or if (fabs(x-0.34)<0.001) x = 0.32 etc. are poor style and a direct result of PMTs, that are supposed to share some coordinate, slightly differing. For example, see how the y positions of the 20 bottom PMTs before PR #197 (ToolAnalysis) used to all be equal to -1.77609, but have been changed to now to each have one of the following values: -1.6939623, -1.6852488, -1.6906278, -1.6956896, -1.6986855, -1.6849436.

I also agree that it is not ideal to have this static mapping of PMTs. I personally would also much prefer a way of calculating the positions from the geometry files that is consistent, easily readable, does not use arbitrary floating point operations, and is consistent across MC / Data. The way it is handled now, introducing a change to the PMT positions in either Data or MC will break the tool again, and its code will have to be edited, again. Which is essentially the same as changing the new PMT mapping files. (Maybe even just hard-coding the PMT mappings into the CNNImage.cpp or CNNImage.h source code files would be a better choice in this specific case, so the configs are consistent across all ToolChains?)

However I am not experienced with the way we handle the geometry files. This is just what I could piece together from my very limited knowledge. I would be very grateful for any input on how to handle this in a consistent matter.

I am currently working on cleaning up the rest of the code as well. I am also somewhat confused by some design choices in the code. The erase, delete and "double mapping" calls you mention are probably redundant.

Given that the tool is currently completely broken, I would still be in favour of merging this PR, and additionally, opening an Issue tracker to keep track of progress. At least then the tool is usable again, for further development. I plan on integrating the output of the CNNImage in a boost store, so that it can soon be used with the CNN RingCounting, a (future) CNN particle ID (e-/mu-), and eventually their classification outputs with the EventSelector tool.

Thank you! -Daniel

marc1uk commented 4 months ago

Alright, so #197 was about putting the positions of the PMT based on real-world laser scan results, rather than idealised MC positions. I don't think we're going to revert that, we should correctly map the positions as best we can.

As you have correctly pointed out, position conversions such as if (y_pmt[detkey] >= max_y-0.001) or if (fabs(x-0.34)<0.001) x = 0.32 etc. are poor style and a direct result of PMTs, that are supposed to share some coordinate, slightly differing.

I think this is a bit of an unrelated point actually - but i don't think i agree those kinds of checks are a result of the positions not being ideal. Even if they were all identical, the equivalent check would be

if(y_pmt[detkey] == -1.77609){ ... }

or whatever, and i still wouldn't like it. For identifying which PMTs are in the caps and barrels, those are clear-cut categorizations in the PMT class that can easily be checked without magic numbers or arbitrary tolerances:

if(anniegeom->GetDetector(detkey)->GetDetectorLocation()=="TopCap"){ ... }

That won't be affected by slight (or large) changes to PMT positions.

Besides, I think if you're reducing the PMT positions to a grid of 16x10, I feel like it should be possible to derive some binning method that absorbs those difference - that's pretty darn coarse.

And I think that's somewhat the key point here. I don't pretend to fully understand what you're doing, but it seems like your CNN is using a grid of PMT psuedo-positions, derived from the real positions but with some mapping to compress it down to a few pixels without overlap or too much whitespace. Previously some code was written that produced that grid from an actual map of PMT positions, but it relied on specifics of the PMT positions that you didn't want reflected in the output. When the PMT positions slightly changed, the transformation basically broke. The fact is, your CNN doesn't really care about the actual PMT positions, it just cares about that reduced grid. So, I would say:

  1. ideal option is of course to write a more robust transformation function that takes the actual PMT positions and outputs a suitable grid. It would be insensitive to changes that you don't care about, such as those introduced by PR#197, but would produce a suitable mapping should something more drastic change. That may be a bit of unnecessary work, so a quicker alternative would be...
  2. if you're going to read anything from a config file, make it the CNN grid. Otherwise you're just passing in indirect, misleading information and doing unnecessary work transforming that into something else.
  3. Perhaps an intermediate goal that might be easier than writing an arbitrary function to generate a suitable grid from a PMT position map would be to check that the passed CNN grid is "suitable" for the PMT position map in the datamodel. Trying to protect against people not updating this config file and then applying a CNN grid to a modified ANNIE phase VII with a different PMT layout for which it doesn't work.... dunno how feasible that is, but sanity checks are always good.

I'm not familiar enough with the code to know how practical any of this is. At the end of the day, my personal concerns are just that:

  1. it doesn't break outside code
  2. preferably, try not to be confusing, misleading, or redundant (e.g. adding a second PMT position map that isn't the true one). If it meets those criteria, even if 2. is just met by a suitable comment in the code/README, i'm happy to merge it.
s4294967296 commented 4 months ago

The reason the mapping completely broke in the first place by such minuscule coordinate changes is that the "PMT-wise" mapping mode assigns each unique combination of (physical) x/y/z coordinates an unique coordinate in the 2D-mapping space (I don't think this was entirely clear from the stuff I outlined above..).

I have now done away with the external config files, see PR #260, and in general made some more improvements. The binning now only relies on the following hard-coded bin values (see CNNImage::StaticPMTBinning):

const std::vector<double> vertical_bins = {-180.0, -160.0, -125.0, -75.0, -30.0, 10.0, 30, 75.0, 130.0};
std::vector<double> phi_bins;
for (int i = 0; i < 17; i++) {
  phi_bins.push_back(i * 2 * TMath::Pi() / 16); //16 bins for 16 PMTs next to eachother
}

I think PR #260 now fulfills all the requirements, and is a better solution overall. Thanks again for the feedback, please let me know what you think.