dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License
3.87k stars 781 forks source link

Compiling causes OutOfMemoryException and IDE unresponsive/crash when pattern matching against record with many nested DU fields #4691

Open cmeeren opened 6 years ago

cmeeren commented 6 years ago

I expecience OutOfMemoryException when compiling code that pattern matches many cases on a record with many fields that are nested DU (e.g. int option option as shown in the repro below, but I originally experienced it with other DU types). Furthermore, the IDE/Intellisense becomes unresponsive and may crash.

Repro steps

Test.zip

Either try the test solution, or simply create a new .NET Standard library (others may produce the error too, but I haven't tested), paste the following code and compile/observe IDE behavior.

module Test

type ManyFields =
  {
    Field1: int option option
    Field2: int option option
    Field3: int option option
    Field4: int option option
    Field5: int option option
    Field6: int option option
    Field7: int option option
    Field8: int option option
    Field9: int option option
    Field10: int option option
    Field11: int option option
    Field12: int option option
    Field13: int option option
    Field14: int option option
    Field15: int option option
    Field16: int option option
    Field17: int option option
    Field18: int option option
    Field19: int option option
    Field20: int option option
  }

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | { Field11 = Some (Some i) } -> Some i
  | { Field12 = Some (Some i) } -> Some i
  | { Field13 = Some (Some i) } -> Some i
  | { Field14 = Some (Some i) } -> Some i
  | { Field15 = Some (Some i) } -> Some i
  | { Field16 = Some (Some i) } -> Some i
  | { Field17 = Some (Some i) } -> Some i
  | { Field18 = Some (Some i) } -> Some i
  | { Field19 = Some (Some i) } -> Some i
  | { Field20 = Some (Some i) } -> Some i
  | _ -> None

Expected behavior

The code compiles successfully.

Actual behavior

Compilation fails with the following output (the number of exceptions may vary):

1>------ Rebuild All started: Project: Test, Configuration: Debug Any CPU ------
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>Done building project "Test.fsproj" -- FAILED.
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

Known workarounds

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | _ -> match manyFields with
      | { Field11 = Some (Some i) } -> Some i
      | { Field12 = Some (Some i) } -> Some i
      | { Field13 = Some (Some i) } -> Some i
      | { Field14 = Some (Some i) } -> Some i
      | { Field15 = Some (Some i) } -> Some i
      | { Field16 = Some (Some i) } -> Some i
      | { Field17 = Some (Some i) } -> Some i
      | { Field18 = Some (Some i) } -> Some i
      | { Field19 = Some (Some i) } -> Some i
      | { Field20 = Some (Some i) } -> Some i
      | _ -> None

Related information

This SO answer by Tomas Petricek looks relevant. If it's the same underlying problem, then it has been around since at least 2012. About time it's fixed :)

Versions:

realvictorprm commented 6 years ago

@cmeeren Must it be an int option option?

(Is it reproducable with a simple int option too?)

cmeeren commented 6 years ago

Not reproducible with int option. Furthermore, if you match against Some i instead of Some (Some i), it also works fine.

This all seems to fit with the SO answer I linked to, which commented that when pattern matching, the compiler sometimes produces a decision tree that grows exponentially with the number of cases.

7sharp9 commented 6 years ago

Type inference also exhibits similar symptoms, when you compose a large expression you can cause a stack overflow (unless you add annotations )

On Fri, 6 Apr 2018 at 12:18, Christer van der Meeren < notifications@github.com> wrote:

Not reproducible with int option. Furthermore, if you match against Some i instead of Some (Some i), it also works fine.

This all seems to fit with the SO answer I linked to, which commented that when pattern matching, the compiler sometimes produces a decision tree that grows exponentially with the number of cases.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Microsoft/visualfsharp/issues/4691#issuecomment-379224054, or mute the thread https://github.com/notifications/unsubscribe-auth/AAj7yiPjNoF7239wqxi5l_tYki61Pfh4ks5tl08PgaJpZM4TJi8j .

abelbraaksma commented 6 years ago

What happens here, if you look at the compiled code, is that it gets compiled into nested if-statements for situations that are too difficult for the compiler to rewrite more efficiently (currently).

First I ran some numbers with your code as example (but Vitaliy's code from the SO question probably has the same characteristic), scenario: just this code in a single .NET Framework project, compiling for Release Build, but I assume .NET Standard and .NET Core would give the same results:

module Bloated =

    type ManyFields =
      {
        Field1: int option option   // 10kB
        Field2: int option option
        Field3: int option option
        Field4: int option option   // 12kB
        Field5: int option option
        Field6: int option option
        Field7: int option option
        Field8: int option option   // 24kB
        Field9: int option option   // 36kB
        Field10: int option option  // 60kB
        Field11: int option option  // 106kB
        Field12: int option option  // 200kB
        Field13: int option option  // 387kB, 2s
        Field14: int option option  // 761kB, 3.5s
        Field15: int option option  // 1509kB, 7s
        Field16: int option option  // 3005kB, 16s
        Field17: int option option  // 6MB, 35s -- ~800MB
        Field18: int option option  // 12MB, 1m -- ~1200MB
        Field19: int option option  // probably 24MB, -- after ~1500MB BOOM! CRASH!
      }

    let getFirstField manyFields =
      match manyFields with
      | { Field1 = Some (Some i) } -> Some i
      | { Field2 = Some (Some i) } -> Some i
      | { Field3 = Some (Some i) } -> Some i
      | { Field4 = Some (Some i) } -> Some i
      | { Field5 = Some (Some i) } -> Some i
      | { Field6 = Some (Some i) } -> Some i
      | { Field7 = Some (Some i) } -> Some i
      | { Field8 = Some (Some i) } -> Some i
      | { Field9 = Some (Some i) } -> Some i
      | { Field10 = Some (Some i) } -> Some i
      | { Field11 = Some (Some i) } -> Some i
      | { Field12 = Some (Some i) } -> Some i
      | { Field13 = Some (Some i) } -> Some i
      | { Field14 = Some (Some i) } -> Some i
      | { Field15 = Some (Some i) } -> Some i
      | { Field16 = Some (Some i) } -> Some i
      | { Field17 = Some (Some i) } -> Some i
      | { Field18 = Some (Some i) } -> Some i
      | { Field19 = Some (Some i) } -> Some i
      | _ -> None

Since I have 48GB available, the OutOfMemoryException must be caused by some 32-bit or other limit. After each change, VS's memory also spiked, but then calmed down after a minute or so to a steady 1GB.

The numbers clearly show a correlation between build time, size and memory usage, each growing exponentially. This suggests that the same algorithm is at fault here that @tpetricek analyzed in that SO question. If I take a look at the decompiled C# code for only 3 fields (otherwise it gets unreadable), it quickly becomes clear what's happening:

    public static FSharpOption<int> getFirstField(ManyFields manyFields)
    {
        FSharpOption<FSharpOption<int>> option;
        int num;
        FSharpOption<FSharpOption<int>> option2;
        if (manyFields.Field1@ == null)
    {
            if (manyFields.Field2@ != null)
        {
                option = manyFields.Field2@;
                if (option.get_Value() != null)
                {
                    num = option.get_Value().get_Value();
                    goto Label_0050;
                }
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option2 = manyFields.Field3@;
                if (option2.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option2.get_Value().get_Value();
            }
        else
        {
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option = manyFields.Field3@;
                if (option.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option.get_Value().get_Value();
            }
            goto Label_007A;
        }
        option = manyFields.Field1@;
        if (option.get_Value() != null)
        {
            return FSharpOption<int>.Some(option.get_Value().get_Value());
        }
        if (manyFields.Field2@ == null)
    {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option2 = manyFields.Field3@;
            if (option2.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option2.get_Value().get_Value();
            goto Label_007A;
        }
        option2 = manyFields.Field2@;
        if (option2.get_Value() == null)
        {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
            if (option3.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option3.get_Value().get_Value();
            goto Label_007A;
        }
        num = option2.get_Value().get_Value();
        Label_0050:
        return FSharpOption<int>.Some(num);
        Label_007A:
        return FSharpOption<int>.Some(num);
        Label_0081:
        return null;
    }

In other words, it looks like the compiler cannot ascertain that if a first match succeeds partially, that other parts need no tests anymore, hence the recursive repetition of all cases under each main if-branch.

In Vitaliy's question there was something to say for the way the compiler treats this, since it was about interface matching, and any class could have multiple interfaces (which doesn't mean it cannot be optimized, but it is harder). But in this case, that shouldn't be necessary at all.

I remembered this situation and I have some places where I have put in some exclamation marks "Do not simplify this code from nested patterns, it creates bloated IL!!!". Though it never occurred to me that the OOM is related.

Furthermore, we sometimes have mysterious crashes of Visual Studio simply disappearing, or FCS crashing behind the scenes. It is very well possible that some of these issues are related to this.

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

7sharp9 commented 6 years ago

Yeah vs is 32bit so you will only really have 2GBish to play with

On Fri, 6 Apr 2018 at 14:33, Abel Braaksma notifications@github.com wrote:

What happens here, if you look at the compiled code, is that it gets compiled into nested if-statements for situations that are too difficult for the compiler to rewrite more efficiently (currently).

First I ran some numbers with your code as example (but Vitaliy's code from the SO question probably has the same characteristic), scenario: just this code in a single .NET Framework project, but I assume .NET Standard and .NET Core would give the same results:

module Bloated =

type ManyFields =
  {
    Field1: int option option   // 10kB

    Field2: int option option
    Field3:

int option option Field4: int option option // 12kB

    Field5: int option option
    Field6: int option option
    Field7:

int option option Field8: int option option // 24kB Field9: int option option // 36kB Field10: int option option // 60kB Field11: int option option // 106kB Field12: int option option // 200kB Field13: int option option // 387kB, 2s Field14: int option option // 761kB, 3.5s Field15: int option option // 1509kB, 7s Field16: int option option // 3005kB, 16s Field17: int option option // 6MB, 35s -- ~800MB Field18: int option option // 12MB, 1m -- ~1200MB Field19: int option option // probably 24MB, -- after ~1500MB BOOM! CRASH!

  }

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | { Field11 = Some (Some i) } -> Some i
  | { Field12 = Some (Some i) } -> Some i
  | { Field13 = Some (Some i) } -> Some i
  | { Field14 = Some (Some i) } -> Some i
  | { Field15 = Some (Some i) } -> Some i
  | { Field16 = Some (Some i) } -> Some i
  | { Field17 = Some (Some i) } -> Some i
  | { Field18 = Some (Some i) } -> Some i
  | { Field19 = Some (Some i) } ->

Some i | _ -> None

Since I have 48GB available, the OutOfMemoryException must be caused by some 32-bit or other limit. After each change, VS's memory also spiked, but then calmed down after a minute or so to a steady 1GB.

The numbers clearly show a correlation between build time, size and memory usage, each growing exponentially. This suggests that the same algorithm is at fault here that @tpetricek https://github.com/tpetricek analyzed in that SO question. If I take a look at the decompiled C# code for only 3 fields (otherwise it gets unreadable), it quickly becomes clear what's happening:

public static FSharpOption<int> getFirstField(ManyFields manyFields)
{
    FSharpOption<FSharpOption<int>> option;
    int num;
    FSharpOption<FSharpOption<int>> option2;
    if (manyFields.Field1@ == null)
{
        if (manyFields.Field2@ != null)
    {
            option = manyFields.Field2@;
            if (option.get_Value() != null)
            {
                num = option.get_Value().get_Value();
                goto Label_0050;
            }
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option2 = manyFields.Field3@;
            if (option2.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option2.get_Value().get_Value();
        }
    else
    {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option = manyFields.Field3@;
            if (option.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option.get_Value().get_Value();
        }
        goto Label_007A;
    }
    option = manyFields.Field1@;
    if (option.get_Value() != null)
    {
        return FSharpOption<int>.Some(option.get_Value().get_Value());
    }
    if (manyFields.Field2@ == null)
{
        if (manyFields.Field3@ == null)
    {
            goto Label_0081;
        }
        option2 = manyFields.Field3@;
        if (option2.get_Value() == null)
        {
            goto Label_0081;
        }
        num = option2.get_Value().get_Value();
        goto Label_007A;
    }
    option2 = manyFields.Field2@;
    if (option2.get_Value() == null)
    {
        if (manyFields.Field3@ == null)
    {
            goto Label_0081;
        }
        FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
        if (option3.get_Value() == null)
        {
            goto Label_0081;
        }
        num = option3.get_Value().get_Value();
        goto Label_007A;
    }
    num = option2.get_Value().get_Value();
    Label_0050:
    return FSharpOption<int>.Some(num);
    Label_007A:
    return FSharpOption<int>.Some(num);
    Label_0081:
    return null;
}

In other words, it looks like the compiler cannot ascertain that if a first match succeeds partially, that other parts need no tests anymore, hence the recursive repetition of all cases under each main if-branch.

In Vitaliy's question there was something to say for the way the compiler treats this, since it was about interface matching, and any class could have multiple interfaces (which doesn't mean it cannot be optimized, but it is harder). But in this case, that shouldn't be necessary at all.

I remembered this situation and I have some places where I have put in some exclamation marks "Do not simplify this code from nested patterns, it creates bloated IL!!!". Though it never occurred to me that the OOM is related.

Furthermore, we sometimes have mysterious crashes of Visual Studio simply disappearing, or FCS crashing behind the scenes. It is very well possible that some of these issues are related to this.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/Microsoft/visualfsharp/issues/4691#issuecomment-379254576, or mute the thread https://github.com/notifications/unsubscribe-auth/AAj7yvm8AI4lb1xu3BGwcowWuMUW9jXNks5tl26vgaJpZM4TJi8j .

cmeeren commented 6 years ago

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

It might also suggest that it simply hasn't been prioritized for whichever reason. (I wouldn't know, of course, but it's a possibility.)

cartermp commented 6 years ago

I can't speak towards this issue since it may be related to the initial design of the compiler, but there are all sorts of little weird things that can happen in compilers which lead to OOM or other bad behavior.

For example, getting clever with generics is a good way to hose the C# compiler:

class Class<A, B, C, D, E, F>
{
    class Inner : Class<Inner, Inner, Inner, Inner, Inner, Inner>
    {
        Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner inner;
    }
}

(If this works just fine for you, just add some more generics)

The C# compiler also solves the 3-SAT problem for overload resolution, so if you're clever with lambdas you may be in for a surprise as well. Haskell and other ML languages also have awful worst-case scenarios for type inference that you can run into from time to time. Swift can also stop compiling after a while on very trivial code and say the expression is too complex.

abelbraaksma commented 6 years ago

@cartermp, thanks for mentioning the 3-SAT problem, reading up on that gives some hints on how this problem can be problematic.

Though I don't think it is unsolvable, because DU matching does not necessarily need to be compiled this way. For instance, it could be compiled as a list of if...elif...elif...else... clauses. I can imagine that for instance the current approach is good for any match expression up until 4-5 branches, but after that it becomes unwieldy and a different approach should be used.

As an optimization to the if...elif...elif...else approach, a grouping can be done if the first part of the match is a DU itself, a boolean or another known limited set, which could first be compiled in a switch (i.e., table lookup). This, btw, happens already in the general case, but not in slightly more complex cases.

dsyme commented 6 years ago

@cmeeren Thanks for the report. There are a couple of ways that pattern matching can cause slow compilation, but this looks like one we should be able to address

cmeeren commented 6 years ago

@dsyme very glad to hear it. Especially if @abelbraaksma is correct in that

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

abelbraaksma commented 6 years ago

Especially if @abelbraaksma is correct in that

I like to think I am ;). The code above can easily be tested, just check the compiled binaries (and watch the clock and memory usage by the compiler & VS).

dsyme commented 6 years ago

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

The majority of pattern matching code compiles well. I suspect the problem here is something combination of things, I suspect that implicit extra record fields may be being expanded and bound in the record patterns, even if they are wildcards, though that shouldn't ultimately cause the extra code.

abelbraaksma commented 6 years ago

The majority of pattern matching code compiles well.

Issue #5212 suggests that exponential growth can happen really easy and diesn't require record fields. Looking at the IL I don't think this is caused by implicit record fields per se.

It basically looks as if all permutations are being created by the compiler, while that isn't necessarily required (technically, it could compile to a list of elif instead). For small patterns this may hardly be a problem, though still suboptimal, but with (slightly) larger patterns this can lead to quickly increasing binary sizes, and with it longer JIT times.

cartermp commented 5 years ago

Another thing to keep in mind here is that this can basically bring any editor to a crawl. Here's VS on my laptop (granted, via Parallels)

Screen Shot 2019-05-14 at 20 35 42

This is just from pasting in the text in the OP. VS just goes unresponsive on and off seemingly indefinitely.

TIHan commented 5 years ago

In a matter of seconds, I can get VS to reach OOM levels. Ran a perf trace on it, these are the allocations. Yea, this is a bug in pattern matching.

image

psfinaki commented 4 months ago

The example code didn't cause SO on my machine but it still took ages to compile and made the machine unusable for some time.

On the other hand, given all the notes in this discussion (e.g. that it needs this much cases or that it needs to be not options but option options) - I wonder how acute this is. Probably still worth looking at it at least to see if there are some glaring misses in the pattern matching impl.

vzarytovskii commented 4 months ago

I think I've fixed the SO some time ago, but yes, it is slow, which is kinda expected (not desired though).