Closed s4294967296 closed 4 months ago
Relevant commit / changes: 9717d1e
hi Steven. From what I can see there are two things this PR does wrt the CNNImage Tool. One I agree with, the other I don't.
Would you be able to do some merge of the two, using the ANNIEGeometry to calculate the mapping based on the DataModel, but also recording the detectorkey->histogram bin map, so that it doesn't need to be re-run for every hit on every execute call?
While you're at it, I do have some questions about that old method. For example; why is the std::erase(std::unique(...))
needed? Surelyt there shouldn't be multiple PMTs at the same x and y locations?
I'm also never a fan of floating point checks with an arbitrary tolerance like if (y_pmt[detkey] >= max_y-0.001)
. That's currently being used to identify Top/Bottom/Barrel PMTs, but that information is directly available from the Detector object via GetTankLocation*.
The OD skip check could also be done in the first for loop where information is copied out of the ANNIEGeometry class, rather than in the second one. In fact for that matter, copying the information out of the ANNIEGeometry class into a set of arrays is also redundant - why not just merge the loops and use the data where it is....? :neutral_face: I never understand why people do that.
Unrelated, but while you're at it there's a lot of new
and delete
calls in Execute. Histograms can be Reset
' - there's generally no need to delete and re-create them.
Let me know what you think. Marcus
Hello Marcus,
thank you for your thorough response, you raise very valid points.
First, let me state why this (or a similar) change to a lookup table for the PMT positions -> 2D matrix mapping is necessary, and let me give some background to the different mapping modes.
In general, this change is motivated by the tool breaking when used with newer WCSim files. I think what is happening is that this PR #197 updated the PMT positions, which very slightly differ from the PMT positions used in WCSim (see here). For example, see the first coordinate of PMT 332:
ToolAnalysis:
332,332,Bottom,0,-0.6434134,-1.6939623,2.00051707,0,1,0,LUX,E2,ON,6,1,6,0,0,2,11,1,-999,-999,-999,-9999,ok
WCSim:
332 0 -64.341342 -169.39623 200.051707 0 1 0.00E+00 0
I have calculated the average (absolute) difference of the first PMT coordinate to be about 1.124...e-06 cm. I assume this difference emerges from having one set of data points in units of cm, the other set in units of m. This has led the CNNImage tool to output different mappings, depending on what kinds of MC / Data are used.
In general, the following requirements should be fulfilled by the "PMT-wise" mapping mode:
So far, the "Geometric" mapping mode fulfills points 1, 3, 4 well, however fails at 2+5 (the "Geometric" mapping bins all PMTs according to their polar angle and vertical position); depending on how small the output size is chosen, many PMTs will overlap, especially the more crowded PMTs at the tank top and bottom. To have no overlapping PMTs, a rather large mapping has to be chosen (greater than 20x20), which introduces a lot of pixels that are always stuck at 0 in the resulting CNNImage, and which will introduce many 0-rows. For reference: the current CNN models developed by me (and the inherited work by Michael N. and David M.) use a 16x10 matrix.
The "PMT-wise" mapping mode aims to fulfill all those points, by calculating the unique polar angle/vertical position bins for all PMTs. This however is flawed. As you have correctly pointed out, position conversions such as if (y_pmt[detkey] >= max_y-0.001)
or if (fabs(x-0.34)<0.001) x = 0.32
etc. are poor style and a direct result of PMTs, that are supposed to share some coordinate, slightly differing. For example, see how the y positions of the 20 bottom PMTs before PR #197 (ToolAnalysis) used to all be equal to -1.77609, but have been changed to now to each have one of the following values: -1.6939623, -1.6852488, -1.6906278, -1.6956896, -1.6986855, -1.6849436.
I also agree that it is not ideal to have this static mapping of PMTs. I personally would also much prefer a way of calculating the positions from the geometry files that is consistent, easily readable, does not use arbitrary floating point operations, and is consistent across MC / Data. The way it is handled now, introducing a change to the PMT positions in either Data or MC will break the tool again, and its code will have to be edited, again. Which is essentially the same as changing the new PMT mapping files. (Maybe even just hard-coding the PMT mappings into the CNNImage.cpp or CNNImage.h source code files would be a better choice in this specific case, so the configs are consistent across all ToolChains?)
However I am not experienced with the way we handle the geometry files. This is just what I could piece together from my very limited knowledge. I would be very grateful for any input on how to handle this in a consistent matter.
I am currently working on cleaning up the rest of the code as well. I am also somewhat confused by some design choices in the code. The erase, delete and "double mapping" calls you mention are probably redundant.
Given that the tool is currently completely broken, I would still be in favour of merging this PR, and additionally, opening an Issue tracker to keep track of progress. At least then the tool is usable again, for further development. I plan on integrating the output of the CNNImage in a boost store, so that it can soon be used with the CNN RingCounting, a (future) CNN particle ID (e-/mu-), and eventually their classification outputs with the EventSelector tool.
Thank you! -Daniel
Alright, so #197 was about putting the positions of the PMT based on real-world laser scan results, rather than idealised MC positions. I don't think we're going to revert that, we should correctly map the positions as best we can.
As you have correctly pointed out, position conversions such as if (y_pmt[detkey] >= max_y-0.001) or if (fabs(x-0.34)<0.001) x = 0.32 etc. are poor style and a direct result of PMTs, that are supposed to share some coordinate, slightly differing.
I think this is a bit of an unrelated point actually - but i don't think i agree those kinds of checks are a result of the positions not being ideal. Even if they were all identical, the equivalent check would be
if(y_pmt[detkey] == -1.77609){ ... }
or whatever, and i still wouldn't like it. For identifying which PMTs are in the caps and barrels, those are clear-cut categorizations in the PMT class that can easily be checked without magic numbers or arbitrary tolerances:
if(anniegeom->GetDetector(detkey)->GetDetectorLocation()=="TopCap"){ ... }
That won't be affected by slight (or large) changes to PMT positions.
Besides, I think if you're reducing the PMT positions to a grid of 16x10, I feel like it should be possible to derive some binning method that absorbs those difference - that's pretty darn coarse.
And I think that's somewhat the key point here. I don't pretend to fully understand what you're doing, but it seems like your CNN is using a grid of PMT psuedo-positions, derived from the real positions but with some mapping to compress it down to a few pixels without overlap or too much whitespace. Previously some code was written that produced that grid from an actual map of PMT positions, but it relied on specifics of the PMT positions that you didn't want reflected in the output. When the PMT positions slightly changed, the transformation basically broke. The fact is, your CNN doesn't really care about the actual PMT positions, it just cares about that reduced grid. So, I would say:
I'm not familiar enough with the code to know how practical any of this is. At the end of the day, my personal concerns are just that:
The reason the mapping completely broke in the first place by such minuscule coordinate changes is that the "PMT-wise" mapping mode assigns each unique combination of (physical) x/y/z coordinates an unique coordinate in the 2D-mapping space (I don't think this was entirely clear from the stuff I outlined above..).
I have now done away with the external config files, see PR #260, and in general made some more improvements. The binning now only relies on the following hard-coded bin values (see CNNImage::StaticPMTBinning):
const std::vector<double> vertical_bins = {-180.0, -160.0, -125.0, -75.0, -30.0, 10.0, 30, 75.0, 130.0};
std::vector<double> phi_bins;
for (int i = 0; i < 17; i++) {
phi_bins.push_back(i * 2 * TMath::Pi() / 16); //16 bins for 16 PMTs next to eachother
}
I think PR #260 now fulfills all the requirements, and is a better solution overall. Thanks again for the feedback, please let me know what you think.
Improve consistency of the CNNImage tool by adding a "Static" mode that positions PMTs based on a static mapping / lookup table.
"Static" is now the preferred mode to use the tool with. In the previous (and currently still maintained) "PMT-Wise" mode, the position of the PMTs in the resulting CNNImage is re-calculated at each execution of the event loop. This method is susceptible to minor changes in the PMT positions, see for example the correction factors in the new CNNImage::OrderPMTPositions method. The step toward a fixed mapping will therefore increase stability and grant continuity.
As a further step, some code cleanup has begun and compartmentalization of the code into smaller methods is also under way. The goal of adding further methods is to decrease clutter and increase readability.
The README file and config files of respectively the UserTools/CNNImage and configfiles/CNNImage has been changed to reflect some of the changes.
The mapping is generated / explained with a few plots in the added python notebook, which uses the PMTPositions_Scan.txt. The lines within those plots indicate the binning borders.