fsprojects / FSharp.Formatting

F# tools for generating documentation (Markdown processor and F# code formatter)
https://fsprojects.github.io/FSharp.Formatting/
Other
462 stars 155 forks source link

Minor code clean-ups and performance improvements #906

Closed Thorium closed 3 months ago

Thorium commented 3 months ago

Code clean-up with minor performance optimizations:

No functional changes.

Thorium commented 3 months ago

FSharpLint can detect the null-check, but it's a problematic project because it seems to break on each new major .NET release (which are often) and it randomly crashes on many other cases. The others were manual.

Thorium commented 3 months ago

General guidelines about structs (and I don't have good references to point here):

Keep your structs small.

By small I mean a bool or guid or int or int64 or another struct, but not bunch of other classes or interface-implementations of any kind or big lists or anything like that.

Struct is read to memory at once and kept there, so it's not a pointer to somewhere else, and it makes the processing fast and footprint small. The limitation is function parameters, structs are copied and not sent as pointers, which means if you have big structs, then function calls go slow. Which you can avoid by making your structs as ByRef and so on, but then they are mutable and it goes as hack-hack rather than proper enterprise code. But that's why e.g. System.Text.Json is so fast, they use big structs byref and trust to take 100% out of your single core.

If you have a simple object having a few non-class fields, then it's better to be struct. If you have a 1000s of these small objects it will really improve the performance and memory footprint when processing them through.

Option class is not a struct, but there is a struct-corresponding ValueOption. ValueOption has IsSome and IsNone and typically the usage with match-clause will just work. And there is ValueOption.bind and so-on like Option.bind you've used to. But if you use lot of existing F# functions like List.tryPick they work with Some x and not ValueSome x, and there I'd go for usability over performance.

Tuple class is not a struct but, but there is a ValueTuple which is used instead of (a,b) like this: struct(a,b). That's a bit inconvenient to use.

So this is still good:

[<Struct>]
type MyType =
| MyOp of int
| MyOp2 of bool ValueOption 
| MyOp3 of struct(bool*bool) 

They also say you should performance-test, to verify your changes, because many times it depends on use-case if a harmless looking [<Struct>] type MyType = MyOp of int array is better to be a struct or not.

So for conservative analysis, I'd go for only non-data-carrying and/or only plain boolean/int/guid/struct-field types to be recommended to be structs.

nojaf commented 3 months ago

Thanks for all the insight!

Last question: is any of this noticeable? We don't have a benchmark so I'm not asking you to add that but did you run this before and after? Just curious.

Thorium commented 3 months ago

The comparisons are probably microseconds, so I expect it's not human noticeable in a single run of generating the docs, but if you have build-pipeline over every commit queuing generating docs, then even small changes can affect over time, to free up some CI workers faster.

nojaf commented 3 months ago

Sure, that is fair. I just wanted you to admit that you didn't check 😉. A valuable PR nonetheless.

Thorium commented 3 months ago

Here is quite detailed reference post of Structs: https://www.bartoszsypytkowski.com/writing-high-performance-f-code/