DFHack / dfhack

Memory hacking library for Dwarf Fortress and a set of tools that use it
Other
1.86k stars 470 forks source link

Lag diagnostic feature? #1151

Open rrauenza opened 7 years ago

rrauenza commented 7 years ago

We're having periodic lags in our fortress, and it reminds me of a case where autochop went crazy and created too many jobs. Then autounsuspend was waking up too often, walking the huge job list, and dragging the FPS way down: https://github.com/DFHack/dfhack/issues/1076

To diagnose the current periodic hang (like 10-20 secs), I've tried disabling some dfhack plugins and/or adding text when they start or stop. We haven't narrowed it down yet.

What would be more convenient, though, is if there was a way to hook into dfhack at a higher level such that any periodic jobs/processes emit what plugin is running and its elapsed time (and cpu time?!).

Is there a central place to do this in the dfhack code? What is the model for running external lua and ruby? It seems like there might be a mutex somewhere that gets locked before something is run ... that might be a good spot to add performance reporting.

With some pointers I might be able to figure it out and make a pull request.

Thanks!

lethosor commented 7 years ago

Have you tried disabling everything to make sure it's a DFHack thing? (unload -all to be sure.) Sometimes in large worlds, DF itself can be the cause of behavior like that.

For plugins, try timing individual plugins in PluginManager::OnUpdate() in PluginManager.cpp. For scripts, getting names of timers might be harder, but check out run_timers() in LuaTools.cpp. (I'm not sure about Ruby, but it should be in ruby/ruby.cpp in any case.)

rrauenza commented 7 years ago

Oohhh.. I was afraid it was in the .cpp. What's required to develop the c++ components? Does it require a microsoft dev environment?

We shall try unload -all first.

lethosor commented 7 years ago

You could potentially use some Lua trickery to profile Lua timers, but that wouldn't help if it's not a Lua issue. I can see if it's possible if you like.

The requirements for compiling DFHack are documented in Compile.rst.

rrauenza commented 7 years ago

We unloaded all the plugins and we still seem to get a 20-30 second lag every... well we need to measure it, but too frequently. FPS is around 17. 260 citizens, default embark size.

I also installed the Community Visual Studio 2017 and tried to run a profile .... which isn't that useful without symbols, but all the time during the hang was in the dwarffortress.exe, not in any dlls -- which I assume dfhack mostly exists in?

So... yeah, I don't think its dfhack.

I still wonder, though, if a command might be useful that dumps the cumulative cpu/wall clock time of each plugin. Maybe I'll try getting this compiled on Linux, which seems a bit simpler, and see if I can get something working --- but don't let that stop anyone else from also giving it a go :)

lethosor commented 7 years ago

Nearly all DFHack code should be running in SDL.dll (on Windows) or plugin DLLs. It's worth noting that DF calls out to that, so I don't know for sure how your profile would handle that.

A profiling command could be useful, although there would need to be a way to enable/disable the data collection too.

You could also try disabling DFHack temporarily (by renaming SDL.dll to SDL-something.dll and SDLreal.dll to SDL.dll) if you really want to rule out DFHack.