Mu2e / Offline

Offline software for the Mu2e experiment
Apache License 2.0
8 stars 81 forks source link

changed sort fnction implementation and added protections in the code #1262

Closed gianipez closed 4 months ago

gianipez commented 4 months ago

The fixes were tested on mu2ebuild01 running n>60 times the following command: mu2e -n 100 -c Production/Validation/nightly/ceMix_14.fcl

tagging: @rlcee @matthewstortini @kutschke

FNALbuild commented 4 months ago

Hi @gianipez, You have proposed changes to files in these packages:

which require these tests: build.

@Mu2e/fnalbuild-users, @Mu2e/write have access to CI actions on main.

:hourglass: The following tests have been triggered for 5cc51b37fdc3cb9845ba295c292c8f93976863f2: build (Build queue is empty)

About FNALbuild. Code review on Mu2e/Offline.

FNALbuild commented 4 months ago

:sunny: The build tests passed at 5cc51b37fdc3cb9845ba295c292c8f93976863f2.

Test Result Details
test with :white_check_mark: Command did not list any other PRs to include
merge :white_check_mark: Merged 5cc51b37fdc3cb9845ba295c292c8f93976863f2 at cdb6f2841aa95c752be8066862c80b2d69ecdf7e
build (prof) :white_check_mark: Log file. Build time: 21 min 47 sec
ceSimReco :white_check_mark: Log file.
g4test_03MT :white_check_mark: Log file.
transportOnly :white_check_mark: Log file.
POT :white_check_mark: Log file.
g4study :white_check_mark: Log file.
cosmicSimReco :white_check_mark: Log file.
cosmicOffSpill :white_check_mark: Log file.
ceSteps :white_check_mark: Log file.
ceDigi :white_check_mark: Log file.
muDauSteps :white_check_mark: Log file.
ceMix :white_check_mark: Log file.
rootOverlaps :white_check_mark: Log file.
g4surfaceCheck :white_check_mark: Log file.
FIXME, TODO :white_check_mark: TODO (0) FIXME (0) in 1 files
clang-tidy :large_orange_diamond: 0 errors 438 warnings
whitespace check :white_check_mark: no whitespace errors found

N.B. These results were obtained from a build of this Pull Request at 5cc51b37fdc3cb9845ba295c292c8f93976863f2 after being merged into the base branch at cdb6f2841aa95c752be8066862c80b2d69ecdf7e.

For more information, please check the job page here. Build artifacts are deleted after 5 days. If this is not desired, select Keep this build forever on the job page.

kutschke commented 4 months ago

I am not aware of any imminent plans for new PRs that will require ceMix validation to be working. So we can leave this open until Matthew is back and has time to complete the work. In addition to the requests on the review so far, I would like:

  1. Understand why the problem occurred. Without that understanding the PR might just be moving the problem around by reordering memory, not fixing it.
  2. Test using all of the 20 jobs in Production/Validation/nightly/ceMix_nn.fcl, for nn=00 ... 19.
kutschke commented 4 months ago

and a 3rd item. Requests that will come up during the ongoing review.

kutschke commented 4 months ago

i will look into if the reserve for _tcHits is necessary later and make another PR if it's not. for now, the changes Giani made fix the bug.

If adding this makes the problem go away, it is almost certainly just masking the true problem.

matthewstortini commented 4 months ago

I agree with Rob about understanding exactly why we were seeing the error we were seeing. I will be back Tuesday and can look into it more.

In the meantime, I guess there are a couple options. We could just leave the PR open. Or we could approve it in case another PR comes through that needs the ceMix validation to work properly. The former may be the cleaner way of doing things. The latter may be better in some ways given that I don't know how long it will take to understand the exact root of this issue.

Either way, I'll try to understand what is going on next week. What is done with this particular PR can be left up to the experts.

kutschke commented 4 months ago

We plan to leave the PR open. If there is a need to validate something ceMix related in the next few days we will deal with it when it comes up.

gianipez commented 4 months ago

Matthew will take over this fix in a separate PR. Closing this one