fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features
345 stars 21 forks source link

Make F# string functions fast and AOT/linker friendly #919

Open charlesroddie opened 4 years ago

charlesroddie commented 4 years ago

The need for AOT and linker support

.Net code is deployed to devices where performance - including startup time - and download size are important. JIT is ruled out by this and may be explicitly prohibited (UWP and iOS) or result in unacceptably slow application startup (Xamarin.Android).

Current .Net plans (.Net Form Factors) involve a supported AOT option across .Net, with greater usage of linkers:

We will introduce analyzers that report errors or warnings for code patterns that are not compatible with the specific form factors.

We will introduce a linker compatibility analyzer to detect code patterns that cannot be reliably analyzed by the linker.

Source generators are going to be the preferred [reflection/runtime code generation] mitigation for AOT compatibility, same as for linker compatibility above.

F# string problems affecting AOT/linker support

F# Support lists incompatibilities with CoreRT and .Net Native. These pick out the aspects of F# that are likely to be problematic for any performant AOT and linkers. (Note: F# is generally compatible with current mono AOT but this is not performant, and compatibility may not include linking.)

The largest problem here is F# string methods:

The offending part of FSharp is lacking in type safety, with everything done via casting, uses reflection without restraint, and encapsulates poorly (e.g. .ToString() methods in FSharp record and DU types just calling sprintf "%A" on the entire record).

Solution part 1: localize by generating ToString() overrides on F# types

We have effectively, on F# record and DU types (just check SharpLab to confirm this):

override t.ToString() = sprintf "%A" t

The sprintf "function" doesn't have legitimate access to the data needed to generate the string, so has to use reflection to get the structure.

Instead this method should be compiled:

type DU = | Case0 of int | Case1 of Record | Case2 of obj

// A compiled version of the following should be generated:
override t.ToString() =
    match t with
    | Case0 i ->
        // I.e. "Case0" + i.ToString() + ")"
        // The inner string results from CompiledToString<int>(i)
        "Case0(%s{i.ToString()})" // i.e. "Case0" + i.ToString() + ")"
    | Case1 r -> "Case1(%s{r.ToSting()})"
    | Case2 o ->
        // if we absolutely need to preserve backwards compat
        "Case1(%A{o})" // i.e. "Case1(" + FSharp.FormatObj o + ")"
        // otherwise
    "Case2(%s{o.ToString()})" // i.e. "Case1(" + o.ToString() + ")"

Note that once this is done, CompiledToString does not need to know how to print records and DUs.

Solution part 2: compile

A method (represented above as pseudocode CompiledToString<'t>) should be created to generate compiled code for any concrete type 't, to be used in place of the current dynamic sprintf code where possible.

Where the method sees only obj it could preferably use .ToString(), or else use a very light form of reflection, making sure that codegen is not used.

Solution part 3: integrate

We need to decide where to use the method CompiledToString<'t>:

It may be simplest to start by doing this for string interpolation as the first step, adding extra methods and preserving existing sprintf code, and afterwards migrate this work to sprintf.

TBD

Generics/inlines

abelbraaksma commented 4 years ago

I like this a lot. Ideally, there wouldn't be any %A leftovers in the generated ToString code. In fact, I think generated code wouldn't need sprintf at all, nor interpolation. Both methods are very slow, and I think that the compiler has enough information to create better optimized code here.

Undoubtedly, there'll be corner cases where we may have to resort to old style reflection to prevent backwards compatibility issues, but hopefully not many.

kerams commented 3 years ago

So if I understand this correctly (see below), parts 1 and 2+3 could be worked on separately, with completely independent RFCs?

Part 1 - Compiler emitting specialized ToString for records and unions, maybe using StringBuilder (or plain concatenation if it proves faster for a small number of fields), with hardcoded field/case names and calls to ToString on individual field values.

Part 2 + 3 - Compiler emitting specialized formatted printing instructions every time it encounters *printf and interpolated strings. So the compiler would no longer emit any Printf.*Format unless a string is explicitly annotated as such?

An official verdict would be nice here.

charlesroddie commented 3 years ago

@dsyme can this be approved in principle and should it be two RFCs as @kerams suggests? Part 1 does have value independent of 2/3 and it should make the process of RFC speccing and implementation more efficient if split up.

dsyme commented 3 years ago

Part 1 - Compiler emitting specialized ToString for records and unions,

I'm ok with approving this though compat may be very tricky. I don't think it needs an RFC.

Part2 - F# string methods use reflection and codegen, so are slow and AOT and linker unfriendly (i.e. are liable to crash on runtime).

These are bugs in the .NET native systems - TBH I'm not a fan of buggy and incomplete implementations of things calling themselves .NET that are not .NET, and feeel it's a bit of a waste of time trying to chase them down. That said, it makes sense to make the codegen independent of printf for performance reasons alone. :)

kerams commented 3 years ago

though compat may be very tricky

Can you please elaborate on this? I somewhat naively thought the only tricky part would be preserving the same indentation.

dsyme commented 3 years ago

Can you please elaborate on this? I somewhat naively thought the only tricky part would be preserving the same indentation.

The problem is that in theory the recursive structural calls for printing would have to do what %A does - which is not a simple thing at all.

We could consider a breaking change and simply call ToString() recursively. However that might bite us quite badly - e.g. %A handles deep and recursive object graphs

Happypig375 commented 3 years ago

We could use an IndentedTextWriter which handles indentation quite nicely.

charlesroddie commented 3 years ago

We could consider a breaking change and simply call ToString() recursively. However that might bite us quite badly - e.g. %A handles deep and recursive object graphs

Hopefully some shortcuts that result in cleaner code will be allowed, but we probably would need some degree of going into object graphs. I believe none of the string printing functions ever had a spec so are full of inconsistencies and some of them might need cleaning up in the process.

For fidelity:

So sometimes ToString() will be good, and sometimes we need a "function" recString that is less clever than sprintf that recursively goes into collections and tuples and massages options. This could be fully compiled. But it's not essential as recString, while it would have type tests, will be reflection/codegen-free.

kerams commented 3 years ago

@Happypig375, that part's not the problem. Every time a record has another record in a field, the indentation goes up.

type R = { S: int; Z: R option }
let r = { S = 3; Z = Some { S = 2; Z = Some { S = 5; Z = None } } }

r.ToString()
(*
{ S = 3
  Z = Some { S = 2
             Z = Some { S = 5
                        Z = None } } }
*)

So you can't just call ToString when printing the recursive field's value, because you'd get

{ S = 3
  Z = Some { S = 2
  Z = Some { S = 5
  Z = None } } }

I think the solution would be to provide another ToString overload where you'd be able to pass an indented text writer or string builder with indentation. That one instance would have to be threaded all the way down with increasing indentation. Now the big problem arrives when you use a record from an old binary where ToString is just sprintf %A. Not sure what to do then, possibly back to reflection.

Happypig375 commented 3 years ago

We can split line breaks and indent.

kerams commented 3 years ago

That's going to work until you encounter a field containing a multiline string :).

Happypig375 commented 3 years ago

Then you have to think about whether sudden unindentations like that are visually appealing.

dsyme commented 3 years ago

Hopefully some shortcuts that result in cleaner code will be allowed, but we probably would need some degree of going into object graphs. I believe none of the string printing functions ever had a spec so are full of inconsistencies and some of them might need cleaning up in the process.

The behaviour of '%A' itself is now better documented, including some of the inconsistencies. https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/plaintext-formatting#a-formatting . However it's just not a spec that can be implemented in compositional fragments by using ToString() alone, since it takes a more holistic view (object graph, depth, indentation etc.)

However the problem isn't so much the spec/not spec, it's that code may depend on existing behaviour regardless. We hit this recently when we tried to change %A for RequireQualfiedAccess types to print TheType.UnionCaseA instead of UnionCaseA. This affected the ToString and broke some consumers who were relying on the simpler name for reflection or something.

That said I'm guessing that the compat issue would be manageable by making sure all primitive non-nested records/unions print identically. However if not, another option might be to have an attribute that guides the generation of a printf-free ToString. That makes it opt-in and avoids the compat issue.

charlesroddie commented 3 years ago

@dsyme I'm ok with approving this though compat may be very tricky. I don't think it needs an RFC.

Can you mark the issue approved in principle then please?

That said I'm guessing that the compat issue would be manageable by making sure all primitive non-nested records/unions print identically.

Yes

@dsyme , @Happypig375 and @kerams have raised indentation as a difficult problem. I propose dispensing with line breaks altogether (for ToString() on record types). That eliminates the indentation problem. Alternatively, regularize the formatting so that indentation happens to entire output of an inner string.

// This logic is simple and only needs to know about ToString() on inner elements. (Use knowledge that the inner thing is a record to avoid extra brackets.)
{ S = 3; Z = Some { S = 2; Z = Some { S = 5; Z = None } } }

// Similarly this will also work without knowing more than ToString() on inners.
{   S = 3 // single-line
    Z =
        Some // multi-line inner record string, so put it all on separate lines and indent together
            {
                S = 2
                Z =
                    Some
                        {
                            S = 5
                            Z = None
                        }
            }
}

I prefer the first version with a single line.

dsyme commented 3 years ago

I don't really understand the technical proposal here. I believe replacing the default ToString implementations just won't be possible without breaking compat - that is, in some sense the existing object-to-layout engine in FSharp.Core will need to be used, though maybe not via %A. Perhaps there's a technique to do it where each F# type we care about implements an interface that provides the appropriate part of the processing.

May I ask - why not just have the user who cares it implement their own ToString on each type?

knocte commented 3 years ago

why not just have the user who cares it implement their own ToString on each type?

Don, are you serious?

Just so that you know how painful this is, take a look at our workaround's diff:

https://github.com/nblockchain/geewallet/commit/cd5ce122ae7957ad4b8347293795c333caad1dfb

(The important bits are in FSharpUtil.fs, the rest are a consequence of that, but as you see, we've had to implement a task for CI that detects when people use sprintf/printf/failwithf and pokes the developer to use our ugly SPrintF1,SPrintF2,...,SPrintFn workarounds instead.)

knocte commented 3 years ago

(Not to mention that any F# dependencies we have, we gotta create our own forks that don't use sprintf/printf/failwithf...) It doesn't scale. We actually de-prioritised UWP because of this nightmare.

dsyme commented 3 years ago

Just so that you know how painful this is, take a look at our workaround's diff:

Well, it's painful because you were running on an incomplete .NET implementation. I'm not surprised you de-prioritised it.

More positively, I can see the advantages of allowing people a path to tighten up their code to be free of implicit reflection, much as with implicit boxing. What's a practical path to that without bifurcation? Would an assembly-level attribute that turns off .ToString() generation for record/union/struct and gives warnings on all problematic constructs (sprintf, %A in interpolated strings) be sufficient? Also is sprintf OK apart from %A?

knocte commented 3 years ago

I think the world should have been telling the UWP team "this is not a valid implementation of .NET, please don't call it .NET".

Not sure why you deleted this part of your message, but the thing is, me, as a ISV, how do I deal with this issue? I cannot publish an F# app on the Windows Store. I don't care about UWP, I just need something that allows me to publish to the WindowsStore. And fixing this github issue would allow me to.

dsyme commented 3 years ago

how do I deal with this issue

I don't really know, sorry. My vague understanding is that other routes have since opened up for publishing to Windows Store using either JS front-ends (use Fable?) or sandboxed .NET Framework+WPF or something, but I could be wrong. Were your workarounds to lint for problematic constructs sufficient? If the use of %A is the only known problem then it sounds like they would be?

knocte commented 3 years ago

Were your workarounds to lint for problematic constructs sufficient?

We're not 100% sure because we de-prioritized the effort. After committing the workaround commit you saw, we still had issues but didn't have time to investigate if they come from the fact that our dependencies still use printf/sprintf/failwithf, or from other AOT-related problems; because finding a workaround for the dependencies problems is too time consuming. However, fixing this github issue (to make FSharp.Core not use reflection for the simplest cases) would be much simpler.

knocte commented 3 years ago

either JS front-ends (use Fable?)

Having to rewrite my frontend which already works in so many platforms (Android, iOS, macOS, Linux)* with a single codebase? Quite ironic that using .NET one cannot really support the Windows platform... thanks F# :'-(

* thanks to XamarinForms (soon Maui)

cartermp commented 3 years ago

I cannot publish an F# app on the Windows Store.

This is the never-ending saga of the awfulness that is the Windows Store and Win10 apps 😢. It's unsurprising to see that 5+ years later, it's still as terrible as ever.

I hear that at some point they're going to allow actual, bona fide windows apps running against a real .NET implementation into the store. Or so I was told year after year, and yet that never came.

My honest advice is to abandon Windows app development for the store in general. It is not a stable platform.

charlesroddie commented 3 years ago

If the criterion of making real apps is that the user's machine has to JIT compile the code, I am happy to deliver fake apps that load quickly. Even the F# team is not completely indifferent to the performance benefits of AOT.

Of late this thread has got confused and even if we ignore everything except the comments of @dsyme it is not easy to follow. Let me try to piece this together.

@dsyme I'm ok with approving this though compat may be very tricky. I don't think it needs an RFC. @dsyme I don't really understand the technical proposal here.

Can we know how to progress? Either mark approved in principle (so work can be started with some flexibility in the degree of mimicing of existing behavior) or require an RFC if it needs fully speccing first.

@dsyme We could consider a breaking change and simply call ToString() recursively. However that might bite us quite badly - e.g. %A handles deep and recursive object graphs @dsyme However the problem isn't so much the spec/not spec, it's that code may depend on existing behaviour regardless. We hit this recently when we tried to change %A for RequireQualfiedAccess types to print TheType.UnionCaseA instead of UnionCaseA. This affected the ToString and broke some consumers who were relying on the simpler name for reflection or something. That said I'm guessing that the compat issue would be manageable by making sure all primitive non-nested records/unions print identically.

Yes we are agreed on this. Printing of simple union cases is much more likely to lead to embarrased F# devs depending on the old behavior than, say, the formatting of records. So we should get a version working which requires enough fidelity to be a good representation and not to break users, but which doesn't require completely identitcal output in all situations.

@dsyme I believe replacing the default ToString implementations just won't be possible without breaking compat - that is, in some sense the existing object-to-layout engine in FSharp.Core will need to be used, though maybe not via %A. Perhaps there's a technique to do it where each F# type we care about implements an interface that provides the appropriate part of the processing.

It might be possible to achieve something close to an exact mimic of existing behavior by adding extra methods, but it reduces the chance that this ever gets done, which then reduces the ability of F# developers to write performant apps. We should allow any approach that keeps reasonable output, and is unlikely to break users, as we discussed above.

dsyme commented 3 years ago

I do go back and forth on this stuff. When optimistic I say "yes, sure", when realistic I say "no, compat".

The history of this goes back so far.

You can see why I sometimes a little of jaded of these "alternative .NETs" - these variations on .NET have caused 1000s of lost hours and huge opportunity costs for myself and the projects I've been associated with over the years. As an aside one of the amazing things about Mono is that it always did all of .NET (it took them several iterations for iOS though).

Anyway, this seems like a viable approach to me:

More positively, I can see the advantages of allowing people a path to tighten up their code to be free of implicit reflection, much as with implicit boxing. What's a practical path to that without bifurcation? Would an assembly-level attribute that turns off .ToString() generation for record/union/struct and gives warnings on all problematic constructs (sprintf, %A in interpolated strings) be sufficient? Also is sprintf OK apart from %A?

That is, a single assembly-level attribute [<SimplifiedCodeGen>] or [<ReflectionFreeCode>] or something that causes the compiler to turn off or simplify problematic code-generation and warn and warn and warn as much as you like. As far as I'm concerned that approach is perfect

  1. The community can own the spec of this, and I largely don't need to think about it.
  2. It's very easy to check that any changes to compiler behaviour are made conditional on the attribute
  3. It's simple to use
  4. Those using this can advocate for other people to use this switch in libraries you depend on

Sound ok?

mrange commented 3 years ago

A bit off-topic but as string interpolation was mentioned as better than sprintf.

Personally I don't want to switch to string interpolation over sprintf because string interpolation is using CurrentCulture and AFAIK sprintf uses InvariantCulture which to me is the right choice. The subjectively improved code style of string interpolation is not worth it to me.

I might be in a minority of 1 but I will continue using sprintf.

So I prefer improvements to be done to codegen of sprintf rather than deprecating it in favor of string interpolation.

dsyme commented 3 years ago

@mrange String interpolation uses %O for unadorned { ... }, which is current culture, but you can use the same formatting specifiers as sprintf which also giving increased type safety, e.g. %d{...}, which are invariant culture. So I think it's not a major difference, and would still encourage the use of string interpolation.

At runtime the implementation is the same with regard to reflection, e.g for %A patterns, though sprintf also does some additional reflection to create the curried function which accepts the arguments.

As an aside I could see the argument for an assembly attribute or analyzer which warns on using current-culture or invariant-culture constructs respectively.

charlesroddie commented 3 years ago

The history of this goes back so far.

It's an interesting history, and I understand that these were annoying, but in this case the requirements are very much aligned with code quality - it really did pick out bad parts of compiled F# code - and alterations would improve multiple areas (performance, linking, AOT compat). So there will be very little downside.

That said, having [<ReflectionFreeCode>] would be a fine option to me. Would this resolve/would it have resolved your issues @knocte ? I suppose when this work gets done we can decide at that point whether to put it under a flag or not. Even if it ends up not going under a flag, the idea of [<ReflectionFreeCode>] has a lot of potential, as do any such flags that allow F# to progress without backwards compat constraints.

dsyme commented 3 years ago

@charlesroddie I'm glad you like it as a potential way forward. It seems tractable and I think the stakeholders at Microsoft/.NET w.r.t. backwards compat (myself and those responsible for delivering and servicing enterprise-quality releases) would accept this path. The attribute could also sit in preview for considerable time until it proved its utility.

kant2002 commented 2 years ago

@dsyme I'm still trying to poke at this issue. I have question which complicate my life, but probably better way to move forward. There property RuntimeFeature.IsDynamicCodeSupported (as of .NET 6). This property is always false on the AOT platform (including NativeAOT and Mono Full AOT). This should be Jit constant, so generated code for sprintf '%A' can distinguish between form-factors of .NET and provide slightly different implementation. Maybe that allow even implement changes not in compiler, but on FSharp.Core

vzarytovskii commented 2 years ago

@dsyme I'm still trying to poke at this issue. I have question which complicate my life, but probably better way to move forward. There property RuntimeFeature.IsDynamicCodeSupported (as of .NET 6). This property is always false on the AOT platform (including NativeAOT and Mono Full AOT). This should be Jit constant, so generated code for sprintf '%A' can distinguish between form-factors of .NET and provide slightly different implementation. Maybe that allow even implement changes not in compiler, but on FSharp.Core

What would be the behaviour if we run on an older runtime, which doesn't have this flag?

kant2002 commented 2 years ago

What would be the behaviour if we run on an older runtime, which doesn't have this flag?

I did not think about that when I wrote that. If exactly same code run on all platforms, then this is probable not an option. Would like that somebody educate me on compatibility for this solution. Also the more I think about how to solve the problem without deep dive into compiler, the more I fall closer to Don's proposal. Probably I'm over-react to initial idea of implementing change in FSharp.Core.

xperiandri commented 2 years ago

There property RuntimeFeature.IsDynamicCodeSupported (as of .NET 6). This property is always false on the AOT platform (including NativeAOT and Mono Full AOT).

I'm not sure, as far as I know from Mono compiler flags it has some ability to interpret some code

Happypig375 commented 2 years ago

What would be the behaviour if we run on an older runtime, which doesn't have this flag?

Keeping the status quo and just don't modify behaviour on older platforms. This would be an easy way out.

vzarytovskii commented 2 years ago

What would be the behaviour if we run on an older runtime, which doesn't have this flag?

I did not think about that when I wrote that. If exactly same code run on all platforms, then this is probable not an option. Would like that somebody educate me on compatibility for this solution. Also the more I think about how to solve the problem without deep dive into compiler, the more I fall closer to Don's proposal. Probably I'm over-react to initial idea of implementing change in FSharp.Core.

We shouldn't rely on runtime features which are only in the specific (quite recent) version of runtime. We still need to support the full framework and netstandard versions.

That said, I guess the attribute-driven approach should be the way to go. It's more explicit (we don't choose different codegen based on runtime), backwards compatible, and can be easily hidden under the compiler flag.

We will probably need an RFC for it, so we have the scope of this feature documented and agreed upon.

Happypig375 commented 2 years ago

We shouldn't rely on runtime features which are only in the specific (quite recent) version of runtime. We still need to support the full framework and netstandard versions.

Support is still there. Just not new changes.

vzarytovskii commented 2 years ago

What would be the behaviour if we run on an older runtime, which doesn't have this flag?

Keeping the status quo and just don't modify behaviour on older platforms. This would be an easy way out.

We probably don't want different codegen based on runtime alone.

Happypig375 commented 2 years ago

We probably don't want different codegen based on runtime alone.

Improved platforms and APIs, for improved codegen. Is that not desirable?

vzarytovskii commented 2 years ago

We probably don't want different codegen based on runtime alone.

Improved platforms and APIs, for improved codegen. Is that not desirable?

In this case, we don't give a choice to use the simplified version if you run on "unsupported" runtime (I.e. we are locking the feature to runtime version). You will still be able to do it in the attribute-driven approach.

Happypig375 commented 2 years ago

What about only requiring the attribute on platforms without RuntimeFeature.IsDynamicCodeSupported?

vzarytovskii commented 2 years ago

What about only requiring the attribute on platforms without RuntimeFeature.IsDynamicCodeSupported?

Attribute is compile-time, whereas RuntimeFeatures is, well, runtime. So we will be requiring attribute all the time, just in case we run on an unsupported runtime?

Happypig375 commented 2 years ago

RuntimeFeature.IsDynamicCodeSupported the property is not available compile-time?

Happypig375 commented 2 years ago

Not having the property is just equivalent to this returning true, and an attribute would interpret it as false.

vzarytovskii commented 2 years ago

RuntimeFeature.IsDynamicCodeSupported the property is not available compile-time?

It is a runtime flag, yeah.

Personally, I think we should go for an attribute, it's most explicit and straightforward, and will behave the same everywhere no matter what runtime.

dsyme commented 2 years ago

I would like to make progress on this. The current proposal is:

An assembly-level attribute [<SimplifiedCodeGen>] or [<ReflectionFreeCode>] that turns off .ToString() generation for record/union/struct and gives warnings on all problematic constructs (sprintf, %A in interpolated strings) and any other problematic code-generation and warn and warn and warn

I'd like to change this to a flag --reflectionfree. This makes it much easier to apply to all code during type checking as well.

kant2002 commented 2 years ago

@dsyme any other work happens in this area? I have very early attempt on hacking (nothing special) this which is stale right now, so I would like to know, if I somehow would be affected in other ways.

dsyme commented 2 years ago

I have a prototype, will share it in a bit.

The following construct in FSharp.Core uses '%A' formatting: Quotation ToString/GetLayout. However Quotations in general use a lot of reflection so I think this is expected.

dsyme commented 2 years ago

Here is my work in progress. Please contribute and try it out. I want this to be largely community led if possible :)

https://github.com/dotnet/fsharp/pull/12960

dsyme commented 2 years ago

@charlesroddie at al - I'd really like to understand if it is only %A that is a problem, or if any use of sprintf is a problem.

kant2002 commented 2 years ago

From my testing it’s only %A problem. Most other printf problems are annoying but not a showstopper.