CitiesSkylinesMods / TMPE

Cities: Skylines Traffic Manager: President Edition
https://steamcommunity.com/sharedfiles/filedetails/?id=1637663252
MIT License
571 stars 85 forks source link

Improve code readability by simplifying debug logging #349

Closed dymanoid closed 5 years ago

dymanoid commented 5 years ago

In the current code, there are a lot of such constructs:

#if DEBUG
    if (debug)
        Log._Debug("Whatever");
#endif

They are sometimes repeated very often, after each line (e.g. JunctionRestrictionsManager.cs). It's almost impossible to read and to understand code like that.

Suggested improvements:

After those improvements, the logging code becomes as simple as that:

        Log._Debug("Whatever");

It will make the code much more readable and won't affect performance.

originalfoo commented 5 years ago

And in cases where conditional declarations are required, for a single line, it's much cleaner to do:

[Conditional("DEBUG")]

I'll have a go at tidying up JunctionRestrictionsManager.cs

EDIT: But first need to work out what this means:

bool debug = GlobalConfig.Instance.Debug.Switches[11];
originalfoo commented 5 years ago

Ok, seems the switches toggle individual bits of the debug trace:

https://github.com/krzychu124/Cities-Skylines-Traffic-Manager-President-Edition/blob/master/TLM/TLM/State/ConfigData/Debug.cs#L23

Here's what I propose - add a switchId overload for the _Debug() method:

[Conditional("DEBUG")]
public static void _Debug(string s) {
        LogToFile(s, LogLevel.Debug);
}

[Conditional("DEBUG")]
public static void _Debug(uint switchId, string s) {
    if (GlobalConfig.Instance.Debug.Switches[switchId])
        LogToFile(s, LogLevel.Debug);
}

Using the overloaded method:

[Conditional("DEBUG")]
Log._Debug(11, "Whatever"); // eg. only debug if switch 11 is set

Even then the 11 is a magic number until you go digging. Why not just define a load of named boolean constants or something?

originalfoo commented 5 years ago

Ugh, Log.cs in CSUtil.Commons module so no access to TrafficManager.State...

VictorPhilipp commented 5 years ago

The logic of the debug flag often includes other debug parameters like SegmentId, NodeId, VehicleId, etc.

Suggestion: Don't use magic numbers. Create a TMLog class in the TLM project with individual Debug... methods for each debug condition. Make TMLog wrap Log.

Most important: Please wait until the harmony migration is complete since almost every change to the manager classes is causing merge conflicts.

dymanoid commented 5 years ago

@aubergine10: JunctionRestrictionsManager.cs was just an example, those #ifdef constructs are scattered across the whole solution. It would be really nice to improve all the usages, not only that class.

And you don't need this:

[Conditional("DEBUG")]
Log._Debug(11, "Whatever");

You need only this:

Log._Debug(11, "Whatever");

The Log._Debug method already has the [Conditional("DEBUG")] attribute. So it will only be called if DEBUG is defined. No need of additional stuff. Please read about that attribute in the docs.

@VictorPhilipp, consolidating the logging is a great idea!

originalfoo commented 5 years ago

Assigned @kvakvs who's working on cleaning up the debug stuff.

originalfoo commented 5 years ago

So...

@kvakvs has been churning through the code in #430 to update debug logging amongst other things, and some issues have arisen...

Log._DebugIf( someLogSwitch, "someStr" );

This works great if "someStr" is just a nice simple string.

However, if string formatting is involved - it's pretty much ubiquitous in TM:PE ...

Log._DebugIf( someLogSwitch, $"some {complex()} {thing}" );

The formatting occurs even if someLogSwitch is false. During in-game testing, it turned out all that extra string formatting was causing massive lag. While performance isn't paramount during debugging, you still want at least some responsiveness to avoid making debugging an unwanted chore.

The first solution was to put the formatted string in a lambda function, as this allows an overloaded Log._DebugIf() method to first check someLogSwitch and only if that is true does it invoke the lambda function at which point the string formatting takes place:

Log._DebugIf( someLogSwitch, () => $"some {complex()} {thing}" );

@krzychu124 checked the IL and confirmed it's very little overhead to do that.

However, lambda functions don't capture in, out and ref params at the calling site. Such params are common in TM:PE (and will get more common considering we want to use more in and ref for performance reasons) and regularly feature in debug logging.

So we end up with stuff like this:

#if DEBUG
    var copyOfSomething = someRefParam;
    Log._DebugIf( someLogSwitch, () => $"bleh {copyOfSomething}" );
#endif 

That's adding extra memalloc and gc which isn't great.

Another alternative, assuming an if statement will be culled if its body is empty due to Log._Debug() being culled in RELEASE builds (I haven't checked), could be:

if ( someLogSwitch ) {
    Log._Debug( $"{someRefParam}" );
}

That avoids unnecessary string formats and also means we don't need to worry about uncapturable refs in lambdas, but if we use that pattern then why even both with the ._DebugIf() stuff? Also, if the if doesn't get culled when its body is empty, we'd be back to using #if ... #endif blocks. Ugh.

Another alternative is to make Log._Debug() ignore empty strings:

public static const string Ignore = string.Empty;

[Conditional("DEBUG")]
public static void _Debug(string s) {
    if (s != Ignore) {
        LogToFile(s, LogLevel.Debug);
    }
}

Then we use a conditional in the debug statements, like so:

Log._Debug( someLogSwitch ? ${whatever} : Log.Ignore );

No idea what it would be like in terms of performance. Also, @kvakvs mentioned conditionals get in the way of keyboard navigation.

What is the best option? It would be nice if we could settle on a pattern that works for the majority of cases. Ideally avoiding ugly #if .. #endif blocks. For performance I'm guessing the pattern with the if statement is best, but if it doesn't auto-cull when the body is empty that's a dead end.

kvakvs commented 5 years ago

By "getting in the way of keyboard navigation" I mean having a condition ? at the start of the log string, then followed by the format string, possibly multiline, and having to append a : "", in every log string. While the idea with ternary operator is sound and reasonable, still that is a lot of keyboard motions. I will probably do that instead of if (cond) Log.Debug

@aubergine10 Even if I am forced to use outer if or that ternary check in some locations, still in 150 more locations Log._DebugIf is usable (compared to ~900 locations with other forms of logging), so that's better.

dymanoid commented 5 years ago

Making a simple copy of a ref won't involve GC at all. For trivial types like int and for reference types like string or any other class there will be no performance loss at all and only negligible stack memory usage. For custom structs like Vehicle, a copy will be allocated on the stack. Heap memory traffic and GC - no, performance penalty - yes, due to moving large memory blocks.

And please, please, please, don't ever use in. Really, never! You'll ruin your performance. It's the most misunderstood feature of C#. Better not to use it without deep understanding what it does and when to use it.

kvakvs commented 5 years ago

One more note, i am suspecting that logging multiple string interpolations is going to create a very slow concatenating code. Might consider using String.Format or something.

                    $"ProcessItemPedBicycle called.\n" +
                    $"\tprevMaxSpeed={prevMaxSpeed}\n" +
                    $"\tprevLaneSpeed={prevLaneSpeed}\n" +
                    $"\tnextSegmentId={nextSegmentId}\n" +
                    $"\tnextNodeId={nextNodeId}\n" +
                    $"\tnextLaneIndex={nextLaneIndex}\n" +
                    $"\tnextLaneId={nextLaneId}\n" +
                    $"\tconnectOffset={connectOffset}\n" +
                    $"\tlaneSwitchOffset={laneSwitchOffset}"
originalfoo commented 5 years ago

I thought in was like a read-only ref?

krzychu124 commented 5 years ago

Yes it is (almost), but you have pass correct type, and create variable before passing that variable as method parameter 😉

originalfoo commented 5 years ago

For performance, would it be better to use ref albeit at the risk of a method changing something it shouldn't?

dymanoid commented 5 years ago

Rule of thumb: don't use in. If you still want to use it, see rule of thumb. Please read this SO question and the two most upvoted answers.

string.Format is even slower than string concatenation. Better to avoid both whenever possible.

DaEgi01 commented 5 years ago

@dymanoid I don't get why you advice against using "in". yeah, you have to keep in mind that its probably not a good idea to pass a mutable structs via in, because you kinda loose the perf benefit that you were probably after in the first place. and that the defensive copy is done implicitly without any warning kinda sucks I know. I hope there are or will be some VS extensions that can give you a warning in that case. you can pass readonly structs via "in" without any issues though. yeah, keeping that stuff in mind sucks, because we all make errors. but then, if you use structs you suppose to know the difference between value types and reference types anyway, since you can shoot yourself in the foot with structs too. and some ppl argue that its a bad idea to have mutable structs in the first place. so all in all IMO "in" is okayish, and as always you better know your shit if you use it. in the end you should profile anyway.

regarding string interpolation ($"bla {blup}") which is just syntactic sugar for string.Format... yeah, don't use it in your hotpath or tight loops. everywhere else it should be fine though. from a perf view priority should be StringBuilder >= String.Concat == string + string >> String.Format == string interpolation

kvakvs commented 5 years ago

Logs formatting performance is not important until you enable some of the logs, like pathfinding or AI logging via the Debug section in the XML config. But when you enable it, oh boy is it slow... Those interpolations +'ed together, that is THE problem. I am considering either one-line string interpolations where no concatenation happens, or verbatim interpolations with \n replaced with real new lines, or forming lists of args for stringbuilder.

DaEgi01 commented 5 years ago

I would bet that the effect from instantiating a new streamwriter and closing it again every time you want to log one line is much worse than the effect from string interpolation, because you waste all the buffering that is going on inside the streamwriter. my assumption is that writing to the file system without proper buffering even on an SSD should affect perf much more than a few strings that are GCd. its not mutually exclusive. you can have both. less strings to GC AND the buffering. instantiate the streamwriter once and keep it around. use stringbuilder to produce the longer strings.

kvakvs commented 5 years ago

I am considering a logging thread and sending (possibly nested) lists of format strings and args to it, so it will do the faster formatting and logging. This is something i'd not like to do in this ticket though, neither we really need it at this moment. Normally when you have everything disabled, logging activity is very minor. At this point this is about making logging look compact and nice to read, as most of the time it is disabled anyway.

dymanoid commented 5 years ago

@DaEgi01, I'm arguing against the in keyword because the only meaningful and not-performance-ruining usage of it is to pass readonly structs larger than 20 bytes. How many of them will you have in a real software, even high-perf? Most probably - none. And if you will, then this will be such an exotic edge case, that you won't need any suggestions from dymanoid or anyone else, because you're an extremely experienced high-perf dev in that case. The problem with the defensive copies is that they will be created on every method call of the struct, on every property access, even within the struct itself for the this reference, not on passing the struct as an argument. Can you imagine a performance disaster caused by this?

@kvakvs, take a look at how I implemented logging in SkyTools for Real Time. There is a buffering queue of messages and a dedicated thread that manages the queue processing and disk IO.

DaEgi01 commented 5 years ago

@dymanoid I don't understand why you don't see value in 20 bytes readonly structs and also I wonder why 20 bytes and not 4 bytes (x86) / 8 bytes (x64)?

And please, please, please, don't ever use in. Really, never! You'll ruin your performance. It's the most misunderstood feature of C#. Better not to use it without deep understanding what it does and when to use it.

but ok, after rereading your post I see now that your "never" is not a literal never but a "not without deep understanding" which I totally agree with.

dymanoid commented 5 years ago

Yes, by "never" I mean not literally "not a single time in your life time" but rather "you can really go without it, and if not, then you surely know what you are doing".

20 bytes for x86 and 24 bytes for x64 are the struct sizes where the jitter decides to switch from simple and fast CPU instructions (cheap copying) to a loop (expensive copying). Cheap copying is almost as fast as accessing by-reference.

originalfoo commented 5 years ago

Speaking as a newbie, I had no idea about the issues with in until Dymanoid pointed me at that SO page.

I would certainly like to try and get some Rosyln (or whatever) warning if in is used. Such warnings can always be disabled for individual uses should we encounter that rare fringe case of readonly struct that's > 2{0, 4} bytes.

Regarding buffering a queue for logging, I think Victor was doing something along those lines in #370 - maybe the SkyTools approach would work well there too?

kvakvs commented 5 years ago

@aubergine10 i did not experience TM:PE with debug logging enabled and i can't imagine how bad can it be. Surely that can be done, just maybe in a separate PR. More logs is usually less useful due to overwhelming noise to signal ratio.