SegmentLinking / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1 stars 2 forks source link

Extensive Code Duplication #96

Open GNiendorf opened 1 week ago

GNiendorf commented 1 week ago

To simplify the code, we should remove the extensive use of code duplication (copy-paste).

Here is a gist showing some of the current duplication, automatically generated with PMD. Not all of these are meaningful duplications, but it catches many obvious repeats:

https://gist.github.com/GNiendorf/fd4fe952f38149b5bfd1269a7039370e

This was generated with the following:

1. Download PMD

wget https://github.com/pmd/pmd/releases/download/pmd_releases%2F7.5.0/pmd-dist-7.5.0-bin.zip

2. Extract the PMD archive

unzip pmd-dist-7.5.0-bin.zip -d pmd-dist

3. Navigate to the PMD bin directory

cd pmd-dist/pmd-bin-7.5.0/bin

4. Create a list of .cc and .h files in specific directories

find \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/interface/alpaka \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/interface \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/src/alpaka \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/src \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/standalone/bin \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/standalone/code/core \ CMSSW_14_1_0_pre3/src/RecoTracker/LSTCore/standalone/efficiency/src \ -type f ( -name ".cc" -o -name ".h" ) > file_list.txt

5. Run CPD to generate the duplication report

./pmd cpd --minimum-tokens 100 --file-list file_list.txt --language cpp --format text > duplication_report.txt

GNiendorf commented 1 week ago

Some examples below from the report:

Constants repeated in Segment.h

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Segment.h#L205-L213

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Segment.h#L235-L243

GNiendorf commented 1 week ago

Duplicate functions in Quintuplet.h and PixelQuintuplet.h https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Quintuplet.h#L1015-L1099

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelQuintuplet.h#L430-L514

GNiendorf commented 1 week ago

Triple duplication of chi-squared function. @YonsiG do you know if there is supposed to be any difference between these three, or can they be merged?

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelQuintuplet.h#L385-L428

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Quintuplet.h#L1200-L1243

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelTriplet.h#L325-L368

GNiendorf commented 1 week ago

runDeltaBetaIterations duplication.

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelTriplet.h#L1074-L1178

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Quintuplet.h#L1245-L1349

GNiendorf commented 1 week ago

See PR #97

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelQuintuplet.h#L852-L878

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Quintuplet.h#L532-L565

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelTriplet.h#L726-L754

GNiendorf commented 1 week ago

PR #98 addresses some of the above, with a 60% speedup on GPU for MD creation.

YonsiG commented 1 week ago

Triple duplication of chi-squared function. @YonsiG do you know if there is supposed to be any difference between these three, or can they be merged?

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelQuintuplet.h#L385-L428

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/Quintuplet.h#L1200-L1243

https://github.com/SegmentLinking/cmssw/blob/ff8446c8a375a0a535e627ad007d330950f49abb/RecoTracker/LSTCore/src/alpaka/PixelTriplet.h#L325-L368

Hi Gavin, for these chi-square function, they are doing chi2 based on x-y plane and radius calculations. I think those can be merged, just the input number of MDs need to be changed.