Open Hexlord opened 2 months ago
Hello.
I've faced several issues related exclusively to gcc (see cmake/jet_live_setup.cmake, there's -falign-functions=16 added specifically for gcc), but I've had no chance to investigate them deeper. I believe there's some gcc option that will make gcc behave the same way as clang, so I'd suggest trying to find one. Unfortunately I don't have time to investigate this myself, but I would be thankful if you shared your findings here.
And also, are you sure you've implemented the ILiveListener::onLog function? There could be some logs related to unknown relocations and other failures.
Talking about gcc flags I believe you need something that will change either symbols visibility or symbols placement (segment, section) or even both
Hello, yep, I have noticed the gcc specifics and dug in a little more. Running
g++ -g -fPIC -c main.cpp && g++ -g -shared -o main.so main.o && objdump -g main.so | grep St
for
struct St1 {
    int field = 0;
};
St1 St1Instance;
namespace {
struct St2 {
    int field = 0;
};
St2 St2Instance;
}
int main() {
    return 0;
}
correctly returns the debug symbols
sasha@sasha:~/gcc_bugs/symbols$ g++ -g -fPIC -c main.cpp && g++ -g -shared -o main.so main.o && objdump -g main.so | grep St
<2f> DW_AT_name : St1
<4b> DW_AT_name : (indirect string, offset: 0xad): St1Instance
<66> DW_AT_name : St2
<7b> DW_AT_name : (indirect string, offset: 0x0): St2Instance
Line Number Statements:
0x00000000 53743249 6e737461 6e636500 474e5520 St2Instance.GNU
0x000000a0 6e006669 656c6400 6d61696e 00537431 n.field.main.St1
I am not sure if I am looking at the correct thing, as I have little idea about all these things, but I don't think it is a gcc regression after the mentioned commit.
ILiveListener::onLog is indeed implemented. It does spam some logs about dep files not being found, but it always did that (it does that for weird dependency headers); there is nothing new at the moment of, or after, the hot reload of a problematic file (one with a global variable in an anonymous namespace).
I will try to add debug traps into jet-live to see if it ever finds my global symbol (I'll give it a unique name) when I get the time! Please do let me know if there are more ways I could debug this.
You should look at symbols defined in .bss and .data sections, not debug symbols. For more context you can look into the DefaultSymbolsFilter::shouldTransferElfSymbol function. You can use objdump or any other tool that shows the content of sections in ELF binaries.
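To make the "look at .bss and .data symbols" advice concrete, here is a minimal sketch that splits one line of objdump -t output into flags, section and name, and keeps only data objects in .bss or .data. The names SymbolRow, parse_symbol_line and is_data_object are made up for illustration; this is not jet-live code and it does not reproduce what DefaultSymbolsFilter::shouldTransferElfSymbol actually checks. It assumes the whitespace-separated layout seen in the listings in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One row of `objdump -t` output, reduced to the fields of interest here.
struct SymbolRow {
    std::string flags;    // e.g. "lO" (local object) or "gO" (global object)
    std::string section;  // e.g. ".bss", ".data", ".text"
    std::string name;     // mangled symbol name
};

// Parses a line shaped like
//   "<address> <flag chars...> <section> <size> <name>".
// Returns false for lines that do not match this shape (headers, blanks).
bool parse_symbol_line(const std::string& line, SymbolRow& out) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t;) tok.push_back(t);
    if (tok.size() < 4) return false;

    SymbolRow row;
    std::size_t i = 1;  // tok[0] is the address
    // Flag characters precede the section, which starts with '.' or '*'.
    while (i < tok.size() && tok[i][0] != '.' && tok[i][0] != '*') {
        row.flags += tok[i++];
    }
    if (i + 2 >= tok.size()) return false;  // need section, size and name
    row.section = tok[i];
    row.name = tok[i + 2];
    out = row;
    return true;
}

// True for data objects (flag 'O') that live in .bss or .data.
bool is_data_object(const SymbolRow& row) {
    const bool in_data = row.section == ".bss" || row.section == ".data";
    return in_data && row.flags.find('O') != std::string::npos;
}

// Usage example on a line taken from a listing in this thread.
void demo_parse() {
    SymbolRow row;
    const bool ok = parse_symbol_line(
        "0000000000000000 g O .bss 0000000000000010 _ZN2SE5Impll14IterationNodesE", row);
    assert(ok && is_data_object(row));
}
```

Piping objdump -t output through a filter like this makes it easier to compare the gcc and clang object files line by line.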
You should look at symbols defined in .bss and .data sections, not debug symbols.
Ah, right, g++ -g -fPIC -Wl,-export-dynamic -MD -c main.cpp && g++ -g -Wl,-export-dynamic -MD -shared -o main.so main.o && objdump -t main.so
yields about the same result as clang for the following code (both globals in .bss):
Also, as for logging: my bad, there actually were logs. I had not seen them because it takes another jetLive->update() to display them (after the hot reload, the logs of that reload are still unprocessed events), and I could not previously reach that call because of a segfault immediately after the hot reload. Now I call update() twice in a row and see the logs, and there are a bunch of WTFs:
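The "logs only show up on the next update()" behaviour described above can be pictured with a toy model. This is only a sketch of the general pattern (events queued during one update and delivered at the start of the next); ToyListener and ToyLive are hypothetical names and this is not jet-live's actual implementation.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical listener: collects messages instead of printing them.
struct ToyListener {
    std::vector<std::string> received;
    void onLog(const std::string& message) { received.push_back(message); }
};

// Toy model, NOT jet-live: update() first delivers events queued earlier,
// then does work that may queue new events for the following update().
class ToyLive {
public:
    explicit ToyLive(ToyListener& listener)
        : listener_(listener), reloadRequested_(false) {}

    void requestReload() { reloadRequested_ = true; }

    void update() {
        // 1. Deliver whatever was queued before this call.
        while (!pending_.empty()) {
            listener_.onLog(pending_.front());
            pending_.pop_front();
        }
        // 2. Do the reload; its log stays queued until the next update().
        if (reloadRequested_) {
            reloadRequested_ = false;
            pending_.push_back("reload finished");
        }
    }

private:
    ToyListener& listener_;
    std::deque<std::string> pending_;
    bool reloadRequested_;
};

// Usage example: the reload log arrives only on the second update().
void demo_two_updates() {
    ToyListener listener;
    ToyLive live(listener);
    live.requestReload();
    live.update();
    assert(listener.received.empty());
    live.update();
    assert(listener.received.size() == 1);
}
```

Under this model, a crash right after the first update() would hide the reload logs entirely, which matches the symptom described above.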
I hacked in some code to try to look up the symbols in other sections:
And it seems like it did something (might be something wrong), but it was futile:
Lastly I tried this: objdump -t /home/sasha/se/cmake-build-debug/Core/CMakeFiles/SECore.dir/Modules/Render/Render2D.cpp.o | grep StateRender2D
And noticed an abnormality (no .bss entry for my global variable in the anonymous namespace):
0000000000000000 l F .text 00000000000000b7 _ZN2SE12_GLOBAL__N_120FRenderStateRender2DaSEOS1_
00000000000034e0 l F .text 00000000000000d4 _ZN2SE12_GLOBAL__N_120FRenderStateRender2DD2Ev
00000000000034e0 l F .text 00000000000000d4 _ZN2SE12_GLOBAL__N_120FRenderStateRender2DD1Ev
0000000000000000 l O .data 0000000000000058 _ZN2SE12_GLOBAL__N_113StateRender2DE
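The mangled names in these listings are easier to compare once demangled; the _GLOBAL__N_1 component is how the Itanium C++ ABI encodes an anonymous namespace. A minimal sketch using abi::__cxa_demangle from <cxxabi.h> (available with GCC and Clang; assuming one of those toolchains) follows. The helper name demangle is chosen here for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangles an Itanium-ABI symbol name.
// Returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer returned by __cxa_demangle is malloc'ed
    return result;
}

// Usage example on the .data symbol from the listing above.
void demo_demangle() {
    const std::string name = demangle("_ZN2SE12_GLOBAL__N_113StateRender2DE");
    assert(name == "SE::(anonymous namespace)::StateRender2D");
}
```

The same result is available on the command line by piping objdump output through c++filt, or by passing -C to objdump.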
If we do the same for my other global variable, which is outside of the anonymous namespace in the same file:
objdump -t /home/sasha/se/cmake-build-debug/Core/CMakeFiles/SECore.dir/Modules/Render/Render2D.cpp.o | grep IterationNodes
Then there is a .bss entry for it:
0000000000000000 g O .bss 0000000000000010 _ZN2SE5Impll14IterationNodesE
So this could be a gcc issue where it does not export the global variable into .bss. However, I have not been able to come up with a minimal reproducible example (when I try to assemble one, the variable in the anonymous namespace does get exported). It could potentially be a compiler bug; I shall investigate it further. Please note that moving the variable outside of the anonymous namespace fixes the issue, even though it still does not appear in .bss:
objdump -t /home/sasha/se/cmake-build-debug/Core/CMakeFiles/SECore.dir/Modules/Render/Render2D.cpp.o | grep StateRender2D
0000000000000000 l d .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_ 0000000000000000 .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_
0000000000000000 l d .text._ZN2SE5Impll20FRenderStateRender2DD2Ev 0000000000000000 .text._ZN2SE5Impll20FRenderStateRender2DD2Ev
0000000000000000 l .group 0000000000000000 _ZN2SE5Impll20FRenderStateRender2DD5Ev
0000000000000000 w F .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_ 00000000000000b7 _ZN2SE5Impll20FRenderStateRender2DaSEOS1_
0000000000000000 w F .text._ZN2SE5Impll20FRenderStateRender2DD2Ev 00000000000000d4 _ZN2SE5Impll20FRenderStateRender2DD2Ev
0000000000000000 w F .text._ZN2SE5Impll20FRenderStateRender2DD2Ev 00000000000000d4 _ZN2SE5Impll20FRenderStateRender2DD1Ev
0000000000000000 g O .data 0000000000000058 _ZN2SE5Impll13StateRender2DE
However, the Jet-Live output changes from [JetLive Debug]: Done, relocated: 8/8 to [JetLive Debug]: Done, relocated: 0/0, as if it no longer even tries to relocate (which might be correct, idk), and there is much more [JetLive Debug]: Relocation UNKNOWN is not possible in PIC code spam as well:
This is the objdump output for the clang-built file with the struct in an anonymous namespace (with clang it always works):
0000000000000000 l F .text 0000000000000305 _ZN2SE12_GLOBAL__N_120FRenderStateRender2DD2Ev
0000000000000000 l O .data 0000000000000058 _ZN2SE12_GLOBAL__N_113StateRender2DE
0000000000023280 l F .text 00000000000002eb _ZN2SE12_GLOBAL__N_120FRenderStateRender2DaSEOS1_
And this is for clang with the struct outside of the anonymous namespace:
0000000000000000 l d .text._ZN2SE5Impll20FRenderStateRender2DD2Ev 0000000000000000 .text._ZN2SE5Impll20FRenderStateRender2DD2Ev
0000000000000000 l d .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_ 0000000000000000 .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_
0000000000000000 w F .text._ZN2SE5Impll20FRenderStateRender2DD2Ev 0000000000000305 _ZN2SE5Impll20FRenderStateRender2DD2Ev
0000000000000000 g O .data 0000000000000058 _ZN2SE5Impll13StateRender2DE
0000000000000000 w F .text._ZN2SE5Impll20FRenderStateRender2DaSEOS1_ 00000000000002eb _ZN2SE5Impll20FRenderStateRender2DaSEOS1_
Finally the code file in question, though it is of no interest
Thanks a lot for the heads-up on the .bss section! I also tried playing around with shouldTransferElfSymbol (changing it here and there), but to no avail. I think if the symbol is not in the .bss section, there is nothing we can do after that.
Thank you for the thorough investigation! I haven't touched this code for a couple of years, so I may be missing something, but as far as I remember, global variables (globals, statics, in and out of anonymous namespaces, etc.) can be located in either the .bss or the .data section. Most likely globals are put in .data and statics in .bss, but I'm not sure how gcc treats globals in anonymous namespaces.
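For reference, on typical ELF toolchains the choice between .bss and .data generally follows the initializer rather than the linkage: zero-initialized objects tend to land in .bss and objects with a non-zero initializer in .data, while an anonymous namespace only makes the symbol local. The sketch below is a small file to compile with both gcc and clang and inspect with objdump -t; the section comments describe the usual unoptimized outcome and are an expectation to verify, not a guarantee. All names are made up for illustration.

```cpp
#include <cassert>

// Zero-initialized, external linkage: usually a global ('g') symbol in .bss.
int zeroGlobal = 0;

// Non-zero initializer, external linkage: usually a global symbol in .data.
int nonZeroGlobal = 42;

namespace {
// Internal linkage via the anonymous namespace: usually local ('l') symbols.
// The section is still expected to follow the initializer.
int zeroAnon = 0;     // usually .bss
int nonZeroAnon = 7;  // usually .data
}

// Reads every variable so that an unoptimized build keeps all four objects.
int sum_all() {
    return zeroGlobal + nonZeroGlobal + zeroAnon + nonZeroAnon;
}

// Sanity check of the values only; section placement has to be checked
// with objdump -t on the compiled object file.
void demo_sum() {
    assert(sum_all() == 49);
}
```

This would be consistent with the listings in this thread, where the 0x58-byte StateRender2D object sits in .data under both compilers while the zero-initialized St1Instance and St2Instance end up in .bss.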
The last thing you could try is to compile the same sample code with gcc and with clang and check how each looks under objdump and what the differences are. I'm almost sure that this is some gcc specific, but following common sense I assume that if there's a global variable (specifically in an anonymous namespace), it has to be stored somewhere. If gcc stores it in neither the .data nor the .bss section, it must store it somewhere else, and if you manage to find this place (the section name), you can adjust the code of jet-live and make it work as it should.
You can also try to run the test suite with both gcc and clang and check the difference between the reports; this could probably help with the investigation as well.
I'll let you know if I eventually come up with more ideas.
In my case there are no static variables, unless they are qualified as such automatically when they are not used outside of the CU. I have little idea, so I will keep on the lookout for both sections then.
I will take a look at the test suite! Thank you for the quick response.
Yeah, I have updated my comment; in fact, both clang and gcc produce an identical .data symbol for the case of the anonymous namespace:
0000000000000000 l O .data 0000000000000058 _ZN2SE12_GLOBAL__N_113StateRender2DE
So the problem must be somewhere else (like the section format / whatever)
What worries me is that with clang, for both the case of the global inside the anonymous namespace and outside of it, jet-live outputs
[JetLive Debug]: Done, relocated: 0/0
But with gcc, for the case of the global inside the anonymous namespace, it outputs a different string (for the case outside of the anonymous namespace the output is the same):
[JetLive Debug]: Done, relocated: 8/8
Given the .data symbol for the global variable is identical, it probably means that jet-live tries to relocate something that it should not relocate?
I will try to simplify the code file until nothing remains but the repro, extract it into a separate app, and step through jet-live to determine exactly where the divergence happens. I think this will be the fastest way to figure it out without learning everything about the ELF format like you did when writing this very useful library ^^
it probably means that jet-live tries to relocate something that it should not relocate?
Probably yes, and probably there are some logic errors, so a minimal sample that reproduces the error would be helpful here.
Ok so not a real repro (it does not corrupt memory), but if you modify the example like this:
And output changes from
To
Then after a hot reload with gcc you will see 3x WTF outputs. This makes me wonder if jet-live confuses uses of anonymous global variables with their declarations, or something like that.
But after we add -mcmodel=large, we get
So basically setting this for the application fixes the issue
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcmodel=large")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcmodel=large")
I discovered that flag long ago when I was debugging my clang issue with jet-live; I think it was a compile error of some sort due to linking with big libraries. So the fix is to apply it to the gcc build as well, simple as that, even though I still have little understanding of what exactly goes wrong. Thanks again for the quick responses! I think I would have lost motivation fairly quickly in chasing that and would have just lived with it, if not for your interest. Hope everything is alright.
I could make a PR for it if you wish, but I am not sure if this is a true fix; it could just happen to fix it for my use cases.
That's interesting and could make sense. Thank you very much again, I'll try to investigate it more and probably create some tests that explicitly benefit from this flag.
It seems that gcc, unlike clang, does not export symbols from anonymous namespaces with -Wl,-export-dynamic, or does it in a way that jet-live does not recognize.
Would love to investigate what causes this, but there are no jet-live logs coming out related to this, so any pointers are appreciated!
Upd.: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31462 seems to be the old change in gcc that caused this