Closed: Dr15Jones closed this issue 5 years ago
A new Issue was created by @Dr15Jones Chris Jones.
@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
assign dqm
New categories assigned: dqm
@jfernan2,@andrius-k,@schneiml,@kmaeshima you have been requested to review this Pull request/Issue and eventually sign? Thanks
Hi @Dr15Jones, which IB is crashing and could you please provide a link to the Jenkins page or any reference that we could look into?
The failure has appeared in several CMSSW_10_4_ROOT6_X builds, the latest of which was CMSSW_10_4_X_2018-12-10-2300, which can be seen on the IB dashboard.
The 137.8 is the new multi-run harvesting Workflow. Good to see that it caught something. My crystal ball guess is something that was illegal/does not make sense even in the current production release, and was caught now, probably in some subsystem module that does not expect multi-run harvesting to happen.
However, I can't reproduce that currently; ROOT crashes on initialization on lxplus7 in the CMSSW_10_4_ROOT6_X_2018-12-10-2300
IB:
cmsRun: /build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc700/lcg/root/6.15.01/root-6.15.01/interpreter/llvm/src/tools/clang/include/clang/Serialization/Module.h:72: clang::serialization::InputFile::InputFile(const clang::FileEntry*, bool, bool): Assertion `!(isOverridden && isOutOfDate) && "an overridden cannot be out-of-date"' failed.
Any hints?
@pcanal any idea what this failure could be?
/build/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/slc7_amd64_gcc700/lcg/root/6.15.01/root-6.15.01/interpreter/llvm/src/tools/clang/include/clang/Serialization/Module.h:72: clang::serialization::InputFile::InputFile(const clang::FileEntry*, bool, bool): Assertion `!(isOverridden && isOutOfDate) && "an overridden cannot be out-of-date"' failed.
@smuzaffar what is the environment you use to run the ROOT6 IBs?
@Dr15Jones sorta. This indicates that some of the header files that are part of the ROOT PCH files have been updated since ROOT was built, i.e. likely some system headers.
@Dr15Jones, we use docker to build and run the ROOT6 IBs. All of the 10.4.X IBs now run under docker (as nearly all of them are slc7 based).
@pcanal @smuzaffar This seems extremely bad. It implies that ROOT6 master can only run on the machine on which it was compiled. We need to determine which differences between the docker container and lxplus7 are causing the problem and make ROOT not care about them (since such differences are bound to happen on grid sites as well).
adding @yamaguchi1024 as we talked of similar issues a week ago in the context of root modules..
We have opened a JIRA ticket, https://sft.its.cern.ch/jira/browse/ROOT-9843, about this. On lxplus7 it is due to a glibc version update, but in our docker container we still have the old glibc:
[cmsbuild@050d8f6b1a80 build]$ rpm -qa | grep glibc
glibc-devel-2.17-222.el7.x86_64
glibc-common-2.17-222.el7.x86_64
glibc-2.17-222.el7.x86_64
glibc-headers-2.17-222.el7.x86_64
Shahzad, could you bisect?
Is it possible that we build ROOT on a system with glibc version 2.17-260.el7 and then deploy it on 2.17-222.el7, or vice versa?
If that’s the case I’d expect that the error is unclear but correct. ROOT would store a zip of the header files of glibc and then find out that some of them have changed.
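To make the suspected mechanism concrete, here is a minimal Python sketch of a "has this header changed since build time?" check. It is only an illustration: the assumption that the check compares recorded file size and modification time, and the helper names `snapshot` and `out_of_date`, are hypothetical and not taken from ROOT or clang.

```python
import os

def snapshot(paths):
    """Record (size, mtime_ns) for each header, as one might at build time."""
    record = {}
    for path in paths:
        st = os.stat(path)
        record[path] = (st.st_size, st.st_mtime_ns)
    return record

def out_of_date(recorded):
    """Return the headers whose current size or mtime differs from the record."""
    stale = []
    for path, expected in recorded.items():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # A header that has disappeared also counts as changed.
            stale.append(path)
            continue
        if (st.st_size, st.st_mtime_ns) != expected:
            stale.append(path)
    return sorted(stale)
```

Under this assumption, a glibc-headers update between build and run would make some system headers show up in `out_of_date(...)`, which is the situation the assertion complains about.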
@vgvassilev, since we run under docker, it is not possible that we pick up a different glibc version. Anyway, yesterday we updated ROOT master to bcd447b (commits from 18 Dec), and we also use -DLLVM_BUILD_TYPE=Release, but this workflow/test still fails with the same error. Both the ROOT and the CMSSW builds were done (under docker) on the same machine where the test was run, so there is no chance that the glibc version could have changed.
Hi all,
I asked Axel about this histogram merge issue with a link to this test failure, and got the following reply:
Yes, but that's fairly old on ROOT's side; this was changed months ago. It happens when histograms have text labels, i.e. {"cat1": 12, "cat2": 13}. We can merge two of these histograms by simply creating the superset of the bin labels and then adding the values for each label. But in ROOT it's allowed to have {"cat1": 12, "cat1": 13}, i.e. repeated bin labels. Merging that will be weird: we will be combining these labels, and likely that's not what the user expected, because they created two bins with the same label. So they need to make a conscious decision. I.e. this is not a bug in ROOT; this is likely a design issue on their side, with whoever fills that histogram.
I think Lorenzo and Axel are the people responsible for histograms, so you can discuss this issue with them. Let me know if I can help as well.
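The label-based merge Axel describes can be sketched in a few lines of plain Python. This is not ROOT's implementation: histograms are modelled here as lists of (label, count) pairs, the function name `merge_labelled` is hypothetical, and raising an error on repeated labels is an assumption made for illustration.

```python
from collections import Counter

def merge_labelled(hist_a, hist_b):
    """Merge two label-binned histograms given as lists of (label, count).

    The result holds the superset of labels with counts summed per label.
    A histogram that repeats a label is rejected, because it is unclear
    whether its duplicate bins should be combined or kept apart.
    """
    merged = {}
    for hist in (hist_a, hist_b):
        label_counts = Counter(label for label, _ in hist)
        dupes = sorted(l for l, n in label_counts.items() if n > 1)
        if dupes:
            raise ValueError(f"repeated bin labels: {dupes}")
        for label, count in hist:
            merged[label] = merged.get(label, 0) + count
    return merged
```

With inputs like [("cat1", 12), ("cat2", 13)] and [("cat2", 1), ("cat3", 5)] the merge is unambiguous; an input such as [("cat1", 12), ("cat1", 13)] is the case where whoever fills the histogram has to decide what the two bins mean.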
+1
Fixed by the PR above
This issue is fully signed and ready to be closed.
In the ROOT6 IB, we are periodically seeing workflows (e.g. 137.8) failing in the DQM Harvest step with a new ROOT error message