DFHack / dfhack

Memory hacking library for Dwarf Fortress and a set of tools that use it
Other
1.88k stars 478 forks source link

recent changes (to gcc?) are making some plugins ununloadable on linux #4301

Open ab9rf opened 9 months ago

ab9rf commented 9 months ago

During the course of developing the port of suspendmanager to C++ @chdoc ran into difficulty reloading the development build of the plugin from a running DF using dfhack's reload command. Further investigation determined that this is because gcc is tagging various symbols in the built plugin with the STB_GNU_UNIQUE tag, which triggers libdl to mark the DLL as ununloadable.

The reason why gcc does this are complicated. but experimentation shows that adding --fno-gnu-unique to the gcc compile flags for that plugin prevents gcc from adding that tag.

Examination of the most recent CI build shows that the buildingplan, autoclothing and debug have symbols with the STB_GNU_UNIQUE tag set. It may be appropriate to set this compilation option for buildingplan and autoclothing but it is my expectation that setting it on debug will cause debug to fail.

The solution for debug may be to move the implementation of DFHack::DebugManager::getInstance() from DebugManager.h to DebugManager.cpp and to make the static instance variable a local variable in the implementation of that function, the way it is for Core::getInstance(). This should remove the reference to a static variable in an inline method that causes gcc to force the use of STB_GNU_UNIQUE (by making the method in question not inline).

In general, any function that needs to reference a class-level static variable must not be declared inline in the header file that defines that class; doing so will cause any plugin that uses that header file to be ununloadable. For translation units that include g++ standard classes that use inline references to class-level statics in their definitions, the --fno-gnu-unique flag can be added to the compilation options for that plugin, but only after verifying (using nm -C) that all of the symbols marked as such in the resulting plugins are either const data or functions whose implementation does not depend on static data local to the library unit, for which aliasing will be harmless. If the symbol points to non-const data or to a function whose implementation in the library unit is in any way different from that in dfhack's core, --fno-gnu-unique cannot be used.

chdoc commented 9 months ago

This problem has the potential to become more severe in the future. When compiling DFHack on gcc-13.2 (specifically version 13.2.1_p20240113-r1 on Gentoo) there are a lot more plugins that have STB_GNU_UNIQUE symbols in them and are therefore tagged NODELTE:

$ for plug in *.plug.so; do if nm $plug | grep -q " u "; then echo "$plug (NODELETE)"; else echo $plug; fi; done
3dveins.plug.so (NODELETE)
RemoteFortressReader.plug.so (NODELETE)
add-spatter.plug.so (NODELETE)
autobutcher.plug.so (NODELETE)
autochop.plug.so (NODELETE)
autoclothing.plug.so (NODELETE)
autodump.plug.so (NODELETE)
autofarm.plug.so (NODELETE)
autolabor.plug.so (NODELETE)
autonestbox.plug.so
autoslab.plug.so (NODELETE)
blueprint.plug.so (NODELETE)
buildingplan.plug.so (NODELETE)
buildprobe.plug.so
burrow.plug.so (NODELETE)
changeitem.plug.so
changelayer.plug.so
changevein.plug.so
check-structures-sanity.plug.so (NODELETE)
cleanconst.plug.so
cleaners.plug.so
cleanowned.plug.so
color-dfhack-text.plug.so
counters.plug.so
createitem.plug.so
cursecheck.plug.so
cxxrandom.plug.so (NODELETE)
debug.plug.so (NODELETE)
deramp.plug.so
design.plug.so (NODELETE)
dig-now.plug.so (NODELETE)
dig.plug.so
dumpmats.plug.so
dwarfvet.plug.so
eventExample.plug.so
eventful.plug.so (NODELETE)
fastdwarf.plug.so
faststart.plug.so
filltraffic.plug.so
flows.plug.so
frozen.plug.so
getplants.plug.so
hotkeys.plug.so (NODELETE)
kittens.plug.so (NODELETE)
lair.plug.so
liquids.plug.so
logistics.plug.so (NODELETE)
luasocket.plug.so (NODELETE)
memview.plug.so
misery.plug.so
nestboxes.plug.so
onceExample.plug.so
orders.plug.so (NODELETE)
overlay.plug.so
pathable.plug.so
preserve-tombs.plug.so
probe.plug.so
prospector.plug.so (NODELETE)
ref-index.plug.so
regrass.plug.so
reveal.plug.so
seedwatch.plug.so (NODELETE)
showmood.plug.so
sort.plug.so
spectate.plug.so
stockcheck.plug.so
stockpiles.plug.so (NODELETE)
stocks.plug.so
strangemood.plug.so
stripcaged.plug.so
suspendmanager+.plug.so
tailor.plug.so (NODELETE)
tilesieve.plug.so
tiletypes.plug.so
tubefill.plug.so
tweak.plug.so (NODELETE)
vectors.plug.so
work-now.plug.so
xlsxreader.plug.so
zone.plug.so
myk002 commented 3 months ago

I just ran into this with preserve-rooms development. very annoying

myk002 commented 3 months ago

I'm not sure if there have been relevant changes, but I'm getting fewer NODELETE hits (50.13 / 51.01-beta21). This is the output I get from the command in the description:

$ for plug in *.plug.so; do if nm $plug | grep -q " u "; then echo "$plug (NODELETE)"; else echo $plug; fi; done
3dveins.plug.so
add-spatter.plug.so
aquifer.plug.so
autobutcher.plug.so
autochop.plug.so
autoclothing.plug.so (NODELETE)
autodump.plug.so
autofarm.plug.so
autolabor.plug.so
autonestbox.plug.so
autoslab.plug.so
blueprint.plug.so
buildingplan.plug.so (NODELETE)
burrow.plug.so
changeitem.plug.so
changelayer.plug.so
changevein.plug.so
cleanconst.plug.so
cleaners.plug.so
cleanowned.plug.so
createitem.plug.so
cursecheck.plug.so
cxxrandom.plug.so
debug.plug.so (NODELETE)
deramp.plug.so
design.plug.so
dig-now.plug.so
dig.plug.so
dwarfvet.plug.so
eventful.plug.so
fastdwarf.plug.so
faststart.plug.so
filltraffic.plug.so
fix-occupancy.plug.so
flows.plug.so
getplants.plug.so
hotkeys.plug.so
lair.plug.so
liquids.plug.so
logistics.plug.so
luasocket.plug.so
misery.plug.so
nestboxes.plug.so
orders.plug.so
overlay.plug.so
pathable.plug.so
pet-uncapper.plug.so
plant.plug.so
preserve-rooms.plug.so (NODELETE)
preserve-tombs.plug.so
probe.plug.so
prospector.plug.so
regrass.plug.so
RemoteFortressReader.plug.so
reveal.plug.so
seedwatch.plug.so
showmood.plug.so
skeleton.plug.so
sort.plug.so
spectate.plug.so
stockpiles.plug.so
stocks.plug.so
stonesense.plug.so
strangemood.plug.so
suspendmanager.plug.so
tailor.plug.so
tiletypes.plug.so
tubefill.plug.so
tweak.plug.so
work-now.plug.so
xlsxreader.plug.so
zone.plug.so
ab9rf commented 3 months ago

My guess is that you're using gcc 12, while @chdoc was using gcc 13

myk002 commented 3 months ago

It looks like I'm using:

$ gcc --version
gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614

but there have been some compiler updates recently that could have changed things

ab9rf commented 3 months ago

it's possible that the gcc team found a way to implement certain features without using static members in inline functions. it is essentially always possible to do this, after all

chdoc commented 3 months ago

I strongly doubt either explanation: I get significantly more NODELETE hits, basically the same as my initial post, using either GCC 12.4 or 13.3.1. I have no idea where this difference might come from.

lethosor commented 3 months ago

From https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html

On systems with recent GNU assembler and C library, the C++ compiler uses the STB_GNU_UNIQUE binding to make sure that definitions of template static data members and static local variables in inline functions are unique even in the presence of RTLD_LOCAL; this is necessary to avoid problems with a library used by two different RTLD_LOCAL plugins depending on a definition in one of them and therefore disagreeing with the other one about the binding of the symbol. But this causes dlclose to be ignored for affected DSOs; if your program relies on reinitialization of a DSO via dlclose and dlopen, you can use -fno-gnu-unique.

While we do use RTLD_LOCAL as of https://github.com/DFHack/dfhack/pull/558, we don't have any plugins linking to other plugins - so I'm not sure this STB_GNU_UNIQUE behavior is something we want/need anyway.

Is the concern with using -fno-gnu-unique that e.g. a static instance object could be modified by a plugin and core would see a different copy of the object without those modifications? (I'm a bit rusty on this issue, but that's what I'm gathering.) If so, I guess we do want the STB_GNU_UNIQUE behavior.

@myk002 what symbols are getting this tag in preserve-rooms?

ab9rf commented 3 months ago

Is the concern with using -fno-gnu-unique that e.g. a static instance object could be modified by a plugin and core would see a different copy of the object without those modifications? (I'm a bit rusty on this issue, but that's what I'm gathering.) If so, I guess we do want the STB_GNU_UNIQUE behavior.

This is the concern here. From what I've seen of the code we have that hits the situation, the only one that is problematic is in debug: DFHack::DebugManager::getInstance() references a class static variable in an inline function. With -fno-gnu-unique each plugin would refer to its own instance of this static rather than to the instance that's in the core, which would lead to at best inconsistent behavior. This problem can be avoided simply by making this particular method non-inline.

myk002 commented 3 months ago

@myk002 what symbols are getting this tag in preserve-rooms?

This is the culprit, defined in Persistence.h:

        const std::string & get_str() {
            static const std::string empty;
            return isValid() ? val() : empty;
        }
myk002 commented 3 months ago

Moving that function definition to the .cpp file clears up all the NODELETE hits other than debug.plug.so for me

chdoc commented 3 months ago

I'm very much puzzled by @myk002 getting significantly fewer unique symbol than I do. Even after the change introduced by #4901 , I'm still getting the following unique symbols:

$ for plug in *.plug.so; do echo "$plug"; nm $plug | grep " u "; done
3dveins.plug.so
000000000003a002 u _ZNSt6ranges6__cust9iter_moveE
000000000003a000 u _ZNSt8__detail11__synth3wayE
000000000003a001 u _ZSt19piecewise_construct
add-spatter.plug.so
0000000000016000 u _ZSt19piecewise_construct
aquifer.plug.so
0000000000023000 u _ZSt19piecewise_construct
autobutcher.plug.so
000000000004c000 u _ZNSt8__detail11__synth3wayE
000000000004c001 u _ZSt19piecewise_construct
autochop.plug.so
0000000000055000 u _ZSt19piecewise_construct
autoclothing.plug.so
0000000000035000 u _ZSt19piecewise_construct
autodump.plug.so
0000000000012000 u _ZSt19piecewise_construct
autofarm.plug.so
0000000000048000 u _ZSt19piecewise_construct
autonestbox.plug.so
autoslab.plug.so
0000000000016000 u _ZSt19piecewise_construct
blueprint.plug.so
0000000000062000 u _ZSt19piecewise_construct
burrow.plug.so
0000000000029000 u _ZSt19piecewise_construct
changeitem.plug.so
changelayer.plug.so
changevein.plug.so
cleanconst.plug.so
cleaners.plug.so
cleanowned.plug.so
createitem.plug.so
0000000000015682 u _ZNSt6ranges5views6filterE
0000000000015001 u _ZNSt6ranges6__cust3endE
0000000000015000 u _ZNSt6ranges6__cust5beginE
0000000000015002 u _ZNSt6ranges7find_ifE
cursecheck.plug.so
cxxrandom.plug.so
0000000000027000 u _ZSt19piecewise_construct
debug.plug.so
00000000001447b0 u _ZGVZN6DFHack12DebugManager11getInstanceEvE8instance
0000000000144818 u _ZGVZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0ELb0EEclEcE5__nul
0000000000144828 u _ZGVZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0ELb1EEclEcE5__nul
0000000000144838 u _ZGVZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb1ELb0EEclEcE5__nul
0000000000144848 u _ZGVZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb1ELb1EEclEcE5__nul
000000000010b000 u _ZNSt8__detail11__synth3wayE
000000000010b001 u _ZSt19piecewise_construct
0000000000144720 u _ZZN6DFHack12DebugManager11getInstanceEvE8instance
000000000013dd60 u _ZZNKSt7__cxx1112regex_traitsIcE16lookup_classnameIPKcEENS1_10_RegexMaskET_S6_bE12__classnames
0000000000144100 u _ZZNKSt7__cxx1112regex_traitsIcE18lookup_collatenameIPKcEENS_12basic_stringIcSt11char_traitsIcESaIcEEET_SA_E14__collatenames
0000000000144810 u _ZZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0ELb0EEclEcE5__nul
0000000000144820 u _ZZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb0ELb1EEclEcE5__nul
0000000000144830 u _ZZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb1ELb0EEclEcE5__nul
0000000000144840 u _ZZNKSt8__detail11_AnyMatcherINSt7__cxx1112regex_traitsIcEELb0ELb1ELb1EEclEcE5__nul
000000000010c0f0 u _ZZNKSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb0EE10_M_is_wordEcE3__s
000000000010c0f2 u _ZZNKSt8__detail9_ExecutorIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS5_9sub_matchISB_EEENS5_12regex_traitsIcEELb1EE10_M_is_wordEcE3__s
000000000010c0f4 u _ZZNKSt8__detail9_ExecutorIPKcSaINSt7__cxx119sub_matchIS2_EEENS3_12regex_traitsIcEELb0EE10_M_is_wordEcE3__s
000000000010c0f6 u _ZZNKSt8__detail9_ExecutorIPKcSaINSt7__cxx119sub_matchIS2_EEENS3_12regex_traitsIcEELb1EE10_M_is_wordEcE3__s
000000000010c010 u _ZZNSt19_Sp_make_shared_tag5_S_tiEvE5__tag
deramp.plug.so
design.plug.so
000000000001a000 u _ZNSt8__detail11__synth3wayE
000000000001a001 u _ZSt19piecewise_construct
dig-now.plug.so
0000000000041000 u _ZNSt8__detail11__synth3wayE
0000000000041001 u _ZSt19piecewise_construct
dig.plug.so
0000000000048000 u _ZSt19piecewise_construct
dwarfvet.plug.so
eventful.plug.so
0000000000028000 u _ZSt19piecewise_construct
fastdwarf.plug.so
faststart.plug.so
filltraffic.plug.so
fix-occupancy.plug.so
0000000000022000 u _ZNSt8__detail11__synth3wayE
0000000000022001 u _ZSt19piecewise_construct
flows.plug.so
getplants.plug.so
hotkeys.plug.so
0000000000023000 u _ZSt19piecewise_construct
infiniteSky.plug.so
lair.plug.so
liquids.plug.so
logistics.plug.so
000000000008f000 u _ZSt19piecewise_construct
luasocket.plug.so
0000000000018000 u _ZSt19piecewise_construct
misery.plug.so
nestboxes.plug.so
orders.plug.so
0000000000078000 u _ZSt19piecewise_construct
overlay.plug.so
0000000000070000 u _ZSt19piecewise_construct
pathable.plug.so
pet-uncapper.plug.so
000000000001e000 u _ZSt19piecewise_construct
plant.plug.so
0000000000022000 u _ZNSt8__detail11__synth3wayE
preserve-rooms.plug.so
000000000007c000 u _ZSt19piecewise_construct
preserve-tombs.plug.so
probe.plug.so
prospector.plug.so
0000000000031000 u _ZNSt8__detail11__synth3wayE
0000000000031001 u _ZSt19piecewise_construct
regrass.plug.so
reveal.plug.so
00000000000245d4 u _ZNSt6ranges5views6filterE
0000000000024001 u _ZNSt6ranges6__cust3endE
0000000000024000 u _ZNSt6ranges6__cust5beginE
0000000000024002 u _ZNSt6ranges7find_ifE
seedwatch.plug.so
0000000000043000 u _ZSt19piecewise_construct
showmood.plug.so
sort.plug.so
0000000000022000 u _ZSt19piecewise_construct
stocks.plug.so
strangemood.plug.so
suspendmanager.plug.so
tailor.plug.so
0000000000057a03 u _ZNSt6ranges5views6filterE
0000000000057003 u _ZNSt6ranges6__cust3endE
0000000000057002 u _ZNSt6ranges6__cust5beginE
0000000000057004 u _ZNSt6ranges7find_ifE
0000000000057000 u _ZNSt8__detail11__synth3wayE
0000000000057001 u _ZSt19piecewise_construct
tiletypes.plug.so
000000000003c000 u _ZNSt8__detail11__synth3wayE
tubefill.plug.so
work-now.plug.so
xlsxreader.plug.so
zone.plug.so

This is using

$ gcc --version
gcc (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614

At least, this is the compiler that I have selected in VSCode. :thinking:

ab9rf commented 3 months ago

it may be that it's the version of libstdc++ that matters, not the version of the compiler. these aren't strictly coupled with gcc, as i recall

piecewise_construct is a component of libstdc++'s implementation of std::tuple, if i recall correctly

chdoc commented 3 months ago

I may have found a partial explanation: I was still using a debug build. When I changed the build mode to release, the differences outside of debug.plug.so boil down to my compiler making use of _ZSt19piecewise_construct.

Why this one remains, I have no idea though.

it may be that it's the version of libstdc++ that matters, not the version of the compiler. these aren't strictly coupled with gcc, as i recall

I think they are. At least Gentoo does not seem to ship the library as a separate package.

lethosor commented 3 months ago

Debian/Ubuntu have GCC and libstdc++ split. GCC depends (indirectly) on libstdc++, but different gcc-# major version packages depend on the same libstdc++ package.

It is interesting that both of you are seeing different results with what appear to be the same compiler.

At least, this is the compiler that I have selected in VSCode. 🤔

Does cmake report that it's using that compiler?

I wonder if this could be because you have a debug copy of libstdc++ or something like that.