SegmentLinking / TrackLooper

Apache License 2.0
5 stars 14 forks source link

Move Maps to CSV Based Method #374

Closed GNiendorf closed 8 months ago

GNiendorf commented 8 months ago

This PR is associated with the corresponding PR on LSTGeometry: https://github.com/SegmentLinking/LSTGeometry/pull/10

  1. Deleted all files in the current /data/ directory.
  2. Made a new README in the /data/ directory, and put the new module maps in the /OT800_IT615_pt0.8/ directory in /data/.
  3. Moved any imports in the code (in trkCore.cc and LST.cc) to the new map's directory.
  4. Improved the loading functions in TiltedGeometry.cc and EndcapGeometry.cc by getting rid of variables which were imported but never used, and made the import functions more robust to prevent similar issues described in #375.
  5. Got rid of the if statement below, since I checked and confirmed it is never used and therefore not needed anymore (i.e. All endcap detid's are present in the endcap map now, as they should be). https://github.com/SegmentLinking/TrackLooper/blob/1b7eccfb36338dd2766e51c001482a3a9ce827d9/SDL/Hit.h#L256-L258
  6. Updated pix_tot and endcap_size in Constants.h as required by the new maps. endcap_size went down by 1 because in the previous maps there was a comment with the column names at the top of the file being incorrectly loaded in as actual data. pix_tot changed slightly because of new CSV based geometry leading to slightly different pixel maps.
  7. Made naming improvement changes by changing the badly name "slope" variables to "dxdy" or something similar just like we have "drdz" variables.
GNiendorf commented 8 months ago

/run standalone

github-actions[bot] commented 8 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

GNiendorf commented 8 months ago

I think there is an error on the master branch. The code expects that the centroid values are attached to the endcap orientation file, see below.

https://github.com/SegmentLinking/TrackLooper/blob/78d8d24c796d798353e17be68ee01a2cecbf15a0/SDL/EndcapGeometry.cc#L51-L62

But if you look at the file the master branch is using,

https://github.com/SegmentLinking/TrackLooper/blob/78d8d24c796d798353e17be68ee01a2cecbf15a0/SDL/LST.cc#L18

https://github.com/SegmentLinking/TrackLooper/blob/78d8d24c796d798353e17be68ee01a2cecbf15a0/data/endcap_orientation_data_CMSSW_12_2_0_pre2.txt#L1-L3

It doesn't have the centroid values appended and so it is filling those values with 0's. The centroid phi's are used in the endcap map, so those values are used by the algorithm. @VourMa I'll fix this in this PR.

GNiendorf commented 8 months ago

/run cmssw

github-actions[bot] commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

GNiendorf commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

Looks like the master branch failed? @ariostas

ariostas commented 8 months ago

Huh that's weird. I'm just gonna rerun the job and see if it works.

github-actions[bot] commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

ariostas commented 8 months ago

I'm not sure what's happening. I'll look into it.

ariostas commented 8 months ago

Oh it's because since https://github.com/SegmentLinking/cmssw/pull/16 CMSSW uses the LST headers, including Constants.h. So, since some numbers changed there, it would have to be recompiled. I'll implement that since it might be good to always recompile the LST part between PRs and master runs. I'll let you know when it's done.

slava77 commented 8 months ago

was there a good reason to put new txt files in data/output ? why not directly in data/ ? output is not a particularly meaningful additional layer.

slava77 commented 8 months ago

was there a good reason to put new txt files in data/output ? why not directly in data/ ? output is not a particularly meaningful additional layer.

also, with 2M lines proposed to be added, perhaps it's time to think of migrating to binary files from plain text for numbers.

GNiendorf commented 8 months ago

was there a good reason to put new txt files in data/output ? why not directly in data/ ? output is not a particularly meaningful additional layer.

This and other stuff will get cleaned up in the next commit. Still a draft PR for now.

GNiendorf commented 8 months ago

All CSV, This PR: Here All CSV, but with old endcap file (i.e. with bug above): Here Comparison plots: Here

I'm just showing in the plots linked above that almost all of the performance differences we see are coming from fixing the bug in #375. The difference in switching from hit-based to CSV is small and only present in a couple of bins, as I've shown before.

GNiendorf commented 8 months ago

I removed all of the old geometry files in the /data/ folder and moved the new CSV files from /output/ to /OT800_IT615_pt0.8/ with a new README file in /data/.

I'll look at @slava77's request now of moving from .txt files to binary files.

github-actions[bot] commented 8 months ago

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

GNiendorf commented 8 months ago

/run standalone

github-actions[bot] commented 8 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

GNiendorf commented 8 months ago

/run standalone

github-actions[bot] commented 8 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

ariostas commented 8 months ago

/run cmssw CMSSW_13_3_0_pre3_LST_X

Just making sure that the CMSSW workflow works well for 100 events.

Also, just leaving an extra reminder to squash and merge since there are some very large intermediate diffs.

github-actions[bot] commented 8 months ago

The PR was built and ran successfully with CMSSW. Here are some plots.

OOTB All Tracks
Efficiency and fake rate vs pT, eta, and phi

The full set of validation and comparison plots can be found here.

GNiendorf commented 8 months ago

/run cmssw CMSSW_13_3_0_pre3_LST_X

Just making sure that the CMSSW workflow works well for 100 events.

Also, just leaving an extra reminder to squash and merge since there are some very large intermediate diffs.

Did you get rid of the comparison to master? Or is that in a different command.

github-actions[bot] commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

ariostas commented 8 months ago

Did you get rid of the comparison to master? Or is that in a different command.

Now everything is switched to CMSSW_14_1_0_pre0, so I had to specify the older branch CMSSW_13_3_0_pre3_LST_X. When running with a non-default branch it doesn't run comparisons with master since there might be many things that changed so pretty much everything would have to be done again.

VourMa commented 8 months ago

@GNiendorf While I am starting to look into the PR, could you please update the PR description with the actual changes that go into this PR? It is a bit hard to understand when there are 53 files changed and issues and fixes are mentioned in a incoherent way across O(30) comments. Thank you.

GNiendorf commented 8 months ago

Summary of changes @VourMa:

  1. Deleted all files in the current /data/ directory.
  2. Made a new README in the /data/ directory, and put the new module maps in the /OT800_IT615_pt0.8/ directory in /data/.
  3. Moved any imports in the code (in trkCore.cc and LST.cc) to the new map's directory.
  4. Improved the loading functions in TiltedGeometry.cc and EndcapGeometry.cc by getting rid of variables which were imported but never used, and made the import functions more robust to prevent similar issues described in #375.
  5. Got rid of the if statement below, since I checked and confirmed it is never used and therefore not needed anymore (i.e. All endcap detid's are present in the endcap map now, as they should be). https://github.com/SegmentLinking/TrackLooper/blob/1b7eccfb36338dd2766e51c001482a3a9ce827d9/SDL/Hit.h#L256-L258
  6. Updated pix_tot and endcap_size in Constants.h as required by the new maps. endcap_size went down by 1 because in the previous maps there was a comment with the column names at the top of the file being incorrectly loaded in as actual data. pix_tot changed slightly because of new CSV based geometry leading to slightly different pixel maps.
  7. Made naming improvement changes by changing the badly name "slope" variables to "dxdy" or something similar just like we have "drdz" variables.
GNiendorf commented 8 months ago

Thanks, this looks good as well. Apart from the comments, I have a couple of general questions:

  1. What is the plan on replacing the .txt with binary files? Is it something for this PR?
  2. My very naive expectation would be no changes in physics performance, so I can't say it is clear to me why validation plots show differences. Could you please explain?

We talked about binary files in one of the weekly meetings and came to the conclusion that it should be left for a different PR in the future. Also for number 2, the performance changes are coming from the bug I fixed in #375, where the endcap maps expected that the centroid phi position was being loaded but it wasn't, so those values were all being set to 0's.

GNiendorf commented 8 months ago

/run standalone /run cmssw

github-actions[bot] commented 8 months ago

The PR was built and ran successfully in standalone mode. Here are some of the comparison plots.

Efficiency vs pT comparison Efficiency vs eta comparison
Fake rate vs pT comparison Fake rate vs eta comparison
Duplicate rate vs pT comparison Duplicate rate vs eta comparison

The full set of validation and comparison plots can be found here.

github-actions[bot] commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

github-actions[bot] commented 8 months ago

There was a problem while building and running with CMSSW. The logs can be found here.

ariostas commented 8 months ago

A fatal system signal has occurred: segmentation violation ./run.sh: line 76: 4133 Segmentation fault (core dumped) cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py

I'm not sure why there's a segfault. I'll take a look.

GNiendorf commented 8 months ago

@VourMa I thought there was another commit I had to make, but it looks like this PR is ready to be merged in as well (unless I missed something).