fsprojects / Fleece

Json mapper for F#
http://fsprojects.github.io/Fleece
Apache License 2.0
199 stars 31 forks source link

Compilation time degrades x3 to x4 times when consuming project targets net7.0 or net8.0 (rather than net6.0) #146

Closed DunetsNM closed 5 months ago

DunetsNM commented 6 months ago

I have a project with a lot of complex Fleece codecs.

Codecs have never been too fast to compile (due to heavy usage of SRTP) but after I tried to upgrade <TargetFramework> value from net6.0 to net7.0 or net8.0 it got worse.

The main suspect is SRTP and inlining which is heavily used in codecs. Did something change in newer F# versions that made it so much slower to compile on average?

gusty commented 5 months ago

Hi @DunetsNM That's really odd, I don't see why by simply switching to net7/8 compile time would increase. Some questions: Do you use SRTP in codecs ? Last time I saw your codebase most codecs were explicit except a few cases. Would it be possible to create a small repro?

@vzarytovskii do you have any idea of what could slow down in net7/8 ?

vzarytovskii commented 5 months ago

Yes, static abstracts introduction (both compiler feature and them being in BCL) slows it down significantly. Especially in traits heavy code. It now needs to check so many more things if they satisfy traits.

gusty commented 5 months ago

I see, I knew about that, but I didn't know the slow down was so bad (about 3x).

@DunetsNM if codecs in your codebase don't heavily use SRTPs (implicit codecs), it could be Fleece internals. If so, I can create a branch hardcoding some internal stuff, and therefore removing static resolution in the library. Last time I tried that, the improvement was really marginal, but maybe the situation is different now.

vzarytovskii commented 5 months ago

Please create issue over at compiler repo, with build instructions, I will extract profiles for us to see what exactly changed.

gusty commented 5 months ago

Thanks @vzarytovskii

@DunetsNM can you contact me by FSSF Slack so we discuss what's the best approach to follow?

DunetsNM commented 5 months ago

thanks for quick replies guys, I got sick today, will create a smaller repro when I get better / soon.

Do you use SRTP in codecs ? Last time I saw your codebase most codecs were explicit except a few cases.

Correct, nearly everything is explicit

DunetsNM commented 5 months ago

Extra compilation time is pretty obvious even in a simple flat list of types with codecs - although only x2, perhaps it gets worse if there's a lot of nesting?

Here's a dummy example:

slower_net7.zip

For better readability contents of the files:

Types.fs

module Example.Types

[<RequireQualifiedAccess>]
type DummyUnionType1 =
| CaseA of string
| CaseB of string
| CaseC of string

[<RequireQualifiedAccess>]
type DummyUnionType2 =
| CaseA of string
| CaseB of string
| CaseC of string

[<RequireQualifiedAccess>]
type DummyUnionType3 =
| CaseA of string
| CaseB of string
| CaseC of string

...

[<RequireQualifiedAccess>]
type DummyUnionType10 =
| CaseA of string
| CaseB of string
| CaseC of string

type DummyRecordType = {
    X1: DummyUnionType1
    X2: DummyUnionType2
    X3: DummyUnionType3
    X4: DummyUnionType4
    X5: DummyUnionType5
    X6: DummyUnionType6
    X7: DummyUnionType7
    X8: DummyUnionType8
    X9: DummyUnionType9
    X10: DummyUnionType10
}

open Fleece
open System.Reflection
open Microsoft.FSharp.Reflection
open FSharpPlus.Data
open FSharpPlus

let inline mergeUnionCases (fn: 'Union -> '``Codec/Decoder<PropertyList<'Encoding>, 'Union>``) : '``Codec/Decoder<PropertyList<'Encoding>, 'Union>`` =
    FSharpType.GetUnionCases (typeof<'Union>, BindingFlags.NonPublic ||| BindingFlags.Public)
    |> NonEmptySeq.ofArray
    |> NonEmptySeq.map (fun c -> FSharpValue.MakeUnion (c, Array.zeroCreate (Array.length (c.GetFields ())), BindingFlags.NonPublic ||| BindingFlags.Public) :?> 'Union)
    |> NonEmptySeq.map fn
    |> choice

type DummyUnionType1 with
    static member get_Codec () =
        function
        | CaseA _ -> CaseA <!> jreqWith Codecs.string "CaseA" (function (CaseA x) -> Some x | _ -> None)
        | CaseB _ -> CaseB <!> jreqWith Codecs.string "CaseB" (function (CaseB x) -> Some x | _ -> None)
        | CaseC _ -> CaseC <!> jreqWith Codecs.string "CaseC" (function (CaseC x) -> Some x | _ -> None)
        |> mergeUnionCases
        |> ofObjCodec

type DummyUnionType2 with
    static member get_Codec () =
        function
        | CaseA _ -> CaseA <!> jreqWith Codecs.string "CaseA" (function (CaseA x) -> Some x | _ -> None)
        | CaseB _ -> CaseB <!> jreqWith Codecs.string "CaseB" (function (CaseB x) -> Some x | _ -> None)
        | CaseC _ -> CaseC <!> jreqWith Codecs.string "CaseC" (function (CaseC x) -> Some x | _ -> None)
        |> mergeUnionCases
        |> ofObjCodec

...

type DummyUnionType10 with
    static member get_Codec () =
        function
        | CaseA _ -> CaseA <!> jreqWith Codecs.string "CaseA" (function (CaseA x) -> Some x | _ -> None)
        | CaseB _ -> CaseB <!> jreqWith Codecs.string "CaseB" (function (CaseB x) -> Some x | _ -> None)
        | CaseC _ -> CaseC <!> jreqWith Codecs.string "CaseC" (function (CaseC x) -> Some x | _ -> None)
        |> mergeUnionCases
        |> ofObjCodec

type DummyRecordType with
    static member get_Codec () = 
        codec {
            let! x1 = jreqWith (DummyUnionType1.get_Codec()) "X1" (fun x -> Some x.X1)
            and! x2 = jreqWith (DummyUnionType2.get_Codec()) "X2" (fun x -> Some x.X2)
            and! x3 = jreqWith (DummyUnionType3.get_Codec()) "X3" (fun x -> Some x.X3)
            and! x4 = jreqWith (DummyUnionType4.get_Codec()) "X4" (fun x -> Some x.X4)
            and! x5 = jreqWith (DummyUnionType5.get_Codec()) "X5" (fun x -> Some x.X5)
            and! x6 = jreqWith (DummyUnionType6.get_Codec()) "X6" (fun x -> Some x.X6)
            and! x7 = jreqWith (DummyUnionType7.get_Codec()) "X7" (fun x -> Some x.X7)
            and! x8 = jreqWith (DummyUnionType8.get_Codec()) "X8" (fun x -> Some x.X8)
            and! x9 = jreqWith (DummyUnionType9.get_Codec()) "X9" (fun x -> Some x.X9)
            and! x10 = jreqWith (DummyUnionType10.get_Codec()) "X10" (fun x -> Some x.X10)
            return {
                X1 = x1
                X2 = x2
                X3 = x3
                X4 = x4
                X5 = x5
                X6 = x6
                X7 = x7
                X8 = x8
                X9 = x9
                X10 = x10
            }
        }
        |> ofObjCodec

Net6Client.fsproj

<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <TargetFramework>net6.0</TargetFramework>
    </PropertyGroup>
    <ItemGroup>
        <PackageReference Include="Fleece.SystemTextJson" Version="0.10.0" />
    </ItemGroup>
    <ItemGroup>
        <Compile Include="Types.fs" />
    </ItemGroup>
</Project>

Net7Cilent.fsproj (one line diff)

<Project Sdk="Microsoft.NET.Sdk">
    <PropertyGroup>
        <TargetFramework>net7.0</TargetFramework>
    </PropertyGroup>
    <ItemGroup>
        <PackageReference Include="Fleece.SystemTextJson" Version="0.10.0" />
    </ItemGroup>
    <ItemGroup>
        <Compile Include="Types.fs" />
    </ItemGroup>
</Project>

global.json (SDK 7.0.300 performs similar)

{
  "sdk": {
    "version": "8.0.300",
    "rollForward": "latestMinor"
  }
}

If you run

dotnet build Net6Client.fsproj dotnet build Net7Client.fsproj

it takes ~3.5 sec for net6 vs ~7 sec for net7 on my machine, YMMV This example uses only explicit codecs passing, but implicit codecs didn't make any difference here (perhaps because of too simple / flat hierarchy).

DunetsNM commented 5 months ago

Also if you try to target Fleece.fsproj itself to net7.0 you will see proportional slowdown in compilation time.

So, if any SRTP-heavy code is affected, is there anything that can be done about it / apart from pinning to net6 ? If, in theory, we make a fork of Fleece that uses IWSAM as a main vehicle for compile-time codec resolution, will it make the library and client code any faster to compile?

I mean something like this:

type IHaveDefaultCodec<'ImplType
    when 'ImplType :> IHaveDefaultCodec<'ImplType>> =
    static abstract member Codec<'Encoding when 'Encoding :> Fleece.IEncoding and 'Encoding : (new: unit -> 'Encoding)> : unit -> Fleece.Codec<'Encoding, 'ImplType>

type OrderDirection =
| Ascending
| Descending
with
    static member get_Codec () =
        function
        | Ascending  -> konst Ascending <!> jreqWith Codecs.unit "Ascending" (function Ascending -> Some () | _ -> None)
        | Descending -> konst Descending <!> jreqWith Codecs.unit "Descending" (function Descending -> Some () | _ -> None)
        |> mergeUnionCases
        |> ofObjCodec

    interface IHaveDefaultCodec<OrderDirection> with
         static member Codec() = OrderDirection.get_Codec ()

I guess it depends on where the bottleneck is. Is it emitting of the inlined code (which can't be avoided in case of SRTP , but not required for IWSAM), or computationally-heavy type resolution.

gusty commented 5 months ago

@DunetsNM hope you feel better today. Thanks for the repro, I will have a look in order to find out where's the bottleneck on the code side.

Regarding IWSAMs, there's already #132 have you tried it? And there's even a nuget for it.

gusty commented 5 months ago

I see now, in your last code snippet you mean using IWSAMs for implicit codecs, one important limitation with that approach is retrofitting primitive types. That's not possible with IWSAMs unless you control the source code.

DunetsNM commented 5 months ago

@gusty yes I've played with #132 before (didn't switch because our code still requires AdHocEncoding), just double check that compilation speed difference is about the same. But it uses IWSAM only for IEncoding which is relatively small part of Fleece.

Also, you don't really need my examples, the Fleece library itself is affected:

FleeceNet6.fsproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFrameworks>net6.0</TargetFrameworks>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="Fleece.fs" />
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="FSharpPlus" Version="1.2.4" />
  </ItemGroup>
</Project>

FleeceNet7.fsproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFrameworks>net7.0</TargetFrameworks>
  </PropertyGroup>
  <ItemGroup>
    <Compile Include="Fleece.fs" />
  </ItemGroup>
  <ItemGroup>
    <PackageReference Include="FSharpPlus" Version="1.2.4" />
  </ItemGroup>
</Project>

Here dotnet build FleeceNet6.fsproj and dotnet build Fleece.Net7.fsproj take ~11 sec and ~25 sec respectively for me.

Let me try the same with FSharpPlus which Fleece uses a lot ...

DunetsNM commented 5 months ago

So for current FSharpPlus master build time

net6.0 1:46 net7.0 2:43 net8.0: 2:47

DunetsNM commented 5 months ago

@vzarytovskii I had no luck reproducing slowdown of SRTP / inline with dummy vanilla code, it's only Fleece and/or FSharpPlus projects where slowdown in build time is obvious. Dissecting it further to find the bottleneck is not an obvious task to me. Should that still be reported to F# compiler repo, or it rather belongs to those projects (as they take usage of SRTP to its own extremes / not your typical F# compiler usage) ?

Just in case here's my dumb inline-heavy SRTP code that compiles in about the same time for net6, net7 and net8 (~7 seconds in all cases)

SrtpJunk.zip

Contents of the source file:

module Example

type Record =
    { Number: int }
    static member Zero() = { Number = 0 }

let inline fmap1 (f: ^a -> ^b) (a: ^a list) =
    printfn "fmap1"
    List.map f a

let inline fmap2 f a =
    printfn "fmap2"
    fmap1 f a

let inline fmap3 f a =
    printfn "fmap3"
    fmap2 f a

...

let inline fmap99 f a =
    printfn "fmap99"
    fmap98 f a

let inline fmap100 f a =
    printfn "fmap100"
    fmap99 f a

let main () =
    fmap1 (fun x -> System.Int32.Parse(x)) ["1";"2";"3";"4"] |> ignore
    fmap1 (fun x -> System.Int32.Parse(x)) ["1";"2";"3";"4"] |> ignore
    fmap2 (fun x -> x.ToString()) [1;2;3;4] |> ignore
    fmap2 (fun x -> System.Int32.Parse(x)) ["1";"2";"3";"4"] |> ignore
...
    fmap100 (fun x -> x.ToString()) [1;2;3;4] |> ignore
    fmap100 (fun x -> System.Int32.Parse(x)) ["1";"2";"3";"4"] |> ignore
    0
vzarytovskii commented 5 months ago

Sure, still report it, but with project examples.

gusty commented 5 months ago

@DunetsNM thanks for all the repros. On my side I can confirm that definitely heavy use of SRTPs becomes almost 2x slower or more as from .net7

We can help @vzarytovskii to investigate and hopefully fix whatever is causing that bottleneck in the compiler but as I pointed at the beginning I am a bit surprised since I know your codebase and AFAIK it doesn't rely that much on heavy SRTP.

However after seeing your repro I found that there is a small detail that causes an unnecessary level of genericity, but good news, it's fixable.

The problem is this line:

open FSharpPlus

Well, not that line alone, but the issue is that codecs use the <!> operator, but F#+ does an autoopen of generic operatos when you open the namespace FSharpPlus.

This is something we're planning to change in v2 of the library, to avoid precisely situations like this. It is still not clear what would be the new namespace design (suggestions are welcome).

So, it's a bit tricky right now, since the code in the end routes to the same implementation but it takes longer to compile, so you don't notice that something breaks or so. But making sure FSharpPlus is not open at the codec definition should fix the long compilation issue.

If just a few functions are needed from FSharpPlus you can simply fully qualify it, for instance in your repro you can do:

let inline mergeUnionCases (fn: 'Union -> '``Codec/Decoder<PropertyList<'Encoding>, 'Union>``) : '``Codec/Decoder<PropertyList<'Encoding>, 'Union>`` =
    FSharpType.GetUnionCases (typeof<'Union>, BindingFlags.NonPublic ||| BindingFlags.Public)
    |> NonEmptySeq.ofArray
    |> NonEmptySeq.map (fun c -> FSharpValue.MakeUnion (c, Array.zeroCreate (Array.length (c.GetFields ())), BindingFlags.NonPublic ||| BindingFlags.Public) :?> 'Union)
    |> NonEmptySeq.map fn
    |> FSharpPlus.Operators.choice

... or create a small namespace in your project aliasing these functions.

Another alternative could be doing exactly the opposite, I mean, keep opening FSharpPlus but open later another namespace which shadows the <!> operator.

As said, we'll fix this in v2 of F#+ as it will be focusing more on less generic code by default. Most likely you'll have to explicitly open a "generic operators" namespace.

vzarytovskii commented 5 months ago

The fact that the change to target framework alone changes it, leads me to believe that change in the interface hierarchy causes it

gusty commented 5 months ago

@vzarytovskii I'll try to isolate the compiler issue so you have a smaller repro, but it doesn't seem to be easy. Still I'll give it a try.

gusty commented 5 months ago

@DunetsNM I had a typo in my previous response, I meant the <!> operator. The operator <|> could also be affected but it's not in your repro code. I edited my answer.

abelbraaksma commented 5 months ago

leads me to believe that change in the interface hierarchy causes it

@dsyme just made this commit to the IWSAM RFC (https://github.com/fsharp/fslang-design/commit/71cd2e04bc181264a6a68c731ff018369848d1f9).

For the specific case of IWSAMs, the presence of enormous generic interface lists can cause compiler slow-down. No one ever expected such lists of interfaces in .NET, this is a volcano of hidden complexity lying under the simple type double or decimal. [3/6/2024: as prophesized, an example of how this feature led to this in real life can be seen here]

This suggests that @vzarytovskii's comment is probably spot on indeed.

dsyme commented 5 months ago

This suggests that @vzarytovskii's comment is probably spot on indeed.

My comment was based on Vlad's analysis, and doesn't add new information.

abelbraaksma commented 5 months ago

Oh, oops 😆

gusty commented 5 months ago

@DunetsNM Could you confirm whether my suggestion fixed the issue or not?

DunetsNM commented 5 months ago

@gusty somehow missed that, will try shortly. I suppose you mean these suggestions?

making sure FSharpPlus is not open at the codec definition should fix the long compilation issue.

and

Another alternative could be doing exactly the opposite, I mean, keep opening FSharpPlus but open later another namespace which shadows the <!> operator.

DunetsNM commented 5 months ago

@gusty it helped !

The only change I had to make in my slower_net7.zip example was to remove open FSharpPlus and qualify choice to Fleece.Operators.choice

Now both net6 and net7 take ~3.5 sec to compile.

And in my prod project the problem was literally just in one open FSharpPlus which was used only for three occurrences of tryParse . Compilation time dropped instantly from ~2:30 to just ~00:30.

What else is interesting, this open FSharpPlus occurred very early in the compilation order in the module with some reusable types. If I declare a similar type in the module which comes last in compilation order then compilation speed deteriorates "only" to 1:40 seconds (still very bad). Scratch that, I think it was just bigger number of <!> and <* usages in first file over the other one (see next comment).

I'm not sure if we can single out FSharpPlus as the main culprit just yet. My project still has plenty of open FSharpPlus in other source files that don't use codecs and those still compile fast. Somehow it's a combination of both Fleece usages and open FSharpPlus in the same module. I need to try to to make as tiny module as possible that still compiles ridiculously slow.

It's also interesting whether compiler can help to catch such problem early and emit warning if some symbol resolution takes a very very long time.

DunetsNM commented 5 months ago

Well, not that line alone, but the issue is that codecs use the <!> operator, but F#+ does an autoopen of generic operatos when you open the namespace FSharpPlus.

This is something we're planning to https://github.com/fsprojects/FSharpPlus/issues/288 of the library, to avoid precisely situations like this. It is still not clear what would be the new namespace design (suggestions are welcome).

Ok besides the <!> operator my code also uses <* operator, they are both present both in FSharpPlus and in Fleece. Both are responsible. To bring back compilation speed to normal it's enough to define unambiguous operator such as let (<!!!>) f field = Fleece.Codec.map f field and use it instead of <!> (I was unable to quickly provide uniquely named alternative for Fleece's <* so just commented out those few usages), I can still keep open FSharpPlus

So it's definitely the naming clash between the two libraries, I can only guess what heavy lifting compiler performs while deciding to inline one operator over another. Somehow my hunch is that it's still the Fleece issue, and an operator name collision between FSharpPlus and a simpler library would not affect compilation time as much.

Maybe we should simply give Fleece operators unique names, or provide plain function alternatives?

gusty commented 5 months ago

We can summarize this issue as a combination of 4 things:

1 - F# ALWAYS treats let bound operators with higher priority than static member operators 2 - open FSharpPlus opens generic operators 3 - Fleece has a less generic definition of those same operators 4 - SRTP code degraded as from .net7

It is enough to remove one of the 4 facts and the issue is gone, but this is like aircraft accidents: not a single but a chain of events together which cause the issue.

@DunetsNM I think for a "medium term" solution, (see below) changing Fleece operators is not that great, it is good that they match, so it reflects they are effectively the same (only less generic), moreover many types in FSharpPlus.Data use the same approach: operators defined at the type which are the same as the generic ones, but of course less generics.

I can think of 3 actions:

Short term solution: As I suggested, maybe the last option is better than fully qualifying, what I mean you can keep open FSharpPlus where you have it now and define somewhere something like:

module FleeceCodecOperators = 
    open Fleece
    let (<!>) x y = (<!>) x y
    let (<|>) x y = (<|>) x y
    let (<*>) x y = (<*>) x y
    let (*>) x y = (*>) x y
    ... and so on

then right before defining your codecs you open FleeceCodecOperators, then the generic operators will be shadowed.

Medium term solution: Finish and release FSharpPlus v2, without autoopening operators. This would solve this issue, but it's not that great for other uses of F#+, because you will have to do something like open FSharpPlus.Operators.Generic but what if I want just a portion of the generic operators and not the rest?

Long term solution: Create a suggestion, RFC and implement something (maybe an attribute?) in the F# compiler which allows to define let-bound operators with lower priority than static members, as I can see this problem will affect code outside FSharpPlus in the long run. @vzarytovskii , @dsyme what do you think?

DunetsNM commented 5 months ago

Besides the clash of <!> and <* operators between Fleece and FSharpPlus the implicit codec resolution is also very expensive in net7 when nothing is known about the generic type at compile time i.e. jreq or equivalent jreqWith Fleece.Operators.defaultOf<_, 't> when 't is a truly generic type parameter where argument type can be anything (a primitive, unit, a tuple of other types, or a custom F# type with suitable static member Codec) can cost couple of seconds of extra compilation time for every invocation (!).

When 't is reasonably constrained (e.g. it implements some interface, which statically eliminates tuples and primitives etc.) then defaultCodec<_, 't> compiles fast - just like for a non-generic type.

Fortunately there's not so many generic types with codecs where truly unconstrained type parameter is required, however even when it's not required based on usages sometimes it's simply not enforced in type signature which makes defaultCodec a potential footgun.

To work around that I have a custom function that is less universal than defaultCodec (works only for types with static member Codec) but at least reliably compiles fast:

let inline codecFor<'Encoding, 'T when 'T : (static member Codec: Codec< 'Encoding, 'T>) > =
    (^T : (static member Codec: Codec< 'Encoding, ^T>) ())
gusty commented 5 months ago

I guess that would still need generic resolution for primitive types inside T.

Here there's no way around, so far the only mechanism available in F# to retrofit an interface-like is doing this kind of complex SRTP overloads, at least until FS-1043 is finished and released.

BTW: Can we close this issue, since the problem was identified and a solution (actually more than one) proposed?

We should continue this in FSharpPlus v2 and also open a new F# compiler issue (and maybe a suggestion for the operators scope), linking this one.

DunetsNM commented 5 months ago

yes I guess we can close it with "workaround exists"

gusty commented 5 months ago

Yes, we can conclude the main issue is the F# compiler degradation, you were lucky you were not relaying in generic code, only by accident.