eggheads / eggdrop

The Eggdrop IRC Bot
GNU General Public License v2.0

REHASH: duplicate chanfile saves & freezes #520

Closed ghost closed 2 years ago

ghost commented 6 years ago

REHASH causes one bot I run to freeze. (As this seems to be a disputed term, let me clarify now, before I'm asked yet again: it. stops. working. and. does. nothing.)

Other bots on the same box do not. Recompiling does not resolve it. I have reproduced it with NO SCRIPTS LOADED (raw bot).

Walkthrough:

* EGGDROP v1.8.3
* TCL v8.6.8
* $::raw-logs = 1 (so additional console data can be seen; see bug report #519)

[2018-05-20 18:17:28 -0800] [03:17:37] #Domino# console +*
[2018-05-20 18:17:28 -0800] Set your console to #bots: mpjkcoblrxsdwvtuhg12345678 (msgs, public, joins, kicks/modes, cmds, misc, bots, linked bot msgs, raw, files, server, debug, wallops, server output, botnet incoming, botnet outgoing, incoming share traffic, outgoing share traffic, level 1, level 2, level 3, level 4, level 5, level 6, level 7, level 8).

I've been told to alter the config file and create log files to demonstrate the information requested by Geo. Apparently, pasting the console output is useless. [STMelon]

I won't; ".console" exists for a reason.

[2018-05-20 18:17:37 -0800] .tcl rehash
[2018-05-20 18:17:37 -0800] [03:17:46] Writing user file...
[2018-05-20 18:17:38 -0800] [03:17:46] Writing channel file...
[2018-05-20 18:17:38 -0800] [03:17:46] Rehashing...
[2018-05-20 18:17:38 -0800] [TCL (1ms)]
[2018-05-20 18:17:38 -0800] [03:17:47] Writing channel file...

... and that's it. The bot STOPS there.

[2018-05-20 18:18:01 -0800] .tcl rehash
<bot does not respond>

Three questions:

(1) What does the code make the bot do after the second "Writing channel file..."? (I want to know where to troubleshoot, if it's within my ability.)
(2) Why does it write the chanfile TWICE?
(3) Why does it return from the Tcl command and then perform another chanfile write after it has returned, as if that write were a background-triggered task?

Could it be a similar issue to the UPDATE bug I filed years ago? (http://cvs.eggheads.org/viewvc/eggdrop1.6/doc/Changes1.6?view=markup#l100)

Another user has VERIFIED seeing this issue on Freenode #Eggdrop:
[2018-05-20 18:26:26 -0800] <MrBoss> i got the same return, writing channel file twice

(This user has another bug filed: #518.)

michaelortmann commented 6 years ago

I don't know why it writes the chanfile twice, but it probably does so by setting do_restart = -2 before the Tcl command ends. mainloop() in main.c will pick up do_restart = -2 and call rehash() in chanprog.c, which probably writes the chanfile.

tcldcc.c:

    static int tcl_rehash STDVAR
    {
      [...]
      putlog(LOG_MISC, "*", USERF_REHASHING);
      do_restart = -2;
      [...]
    }

main.c:

    int mainloop(int toplevel)
    {
      [...]
      if (do_restart == -2)
        rehash();
      [...]
    }

chanprog.c:

    void rehash()
    {
      [...]
      chanprog();
      [...]
    }

chanprog.c:

    void chanprog()
    {
      [...]
      call_hook(HOOK_REHASH);
      [...]
    }

This hook probably calls channels_rehash() in channels.c, which calls write_channels().

ghost commented 6 years ago

Is anyone doing anything to correct this? If not, I'll close it. No reason to have 200 issues open.

@michaelortmann: this may be the patch required to repair another issue (which I can't find), where "update" was broken in the sense that if you called "update" during a rehash/restart/cold start, the bot would re-enter the do_restart loop, but *not* at the global level (it would nest down a stack level), causing global-variable references to fail [fatal error]. Check the error log for... let's see...

This explains the reason the "UPDATE" was removed from the boot-up code. From the Eggdrop 1.6.21 release notes (http://cvs.eggheads.org/viewvc/eggdrop1.6/doc/Changes1.6?view=markup):

    - do_restart is now reset before actually performing a rehash or restart to
      ensure it doesn't try to do it again infinitely.
      Found by: Domino / Patch by: thommey

I'm Domino.

michaelortmann commented 6 years ago

I can't reproduce the bug on my machine. I see some loops in userrec.c:clear_userlist(); maybe it hangs there or in delignore(). If anyone can reproduce it, can you please check for ignores and remove them?

Maybe this issue is related to a string truncation in users.c:delignore(): strncpyz(temp, ign, sizeof temp); (sizeof temp is 256, but I was able to set an ignore (ign) with a strlen of 319).

If anyone can reproduce it, can you please check whether the bug goes away when changing the sharing status of the bot?

Another idea about the cause of this bug: Eggdrop uses fopen() for the userfile/chanfile/etc. in combination with chmod() to set file permissions. This is open to a race condition. fchmod() would be better and easy to implement. The best solution would probably be a bigger patch using open(..., ..., umask) instead.

vanosg commented 2 years ago

Closing; the user is not around to confirm a fix.