fsharp / fslang-suggestions

The place to make suggestions, discuss and vote on F# language and core library features

Support Source Generators #864

Open praeclarum opened 4 years ago

praeclarum commented 4 years ago

Support Source Generators

Add support similar to C# Source Generators

The idea is to execute the compiler in two passes:

  1. Pass 1 Parse and type check the project code (the type check may be optional as it will contain errors)
  2. Send that information to Source Generators that output new code files or syntax trees
  3. Pass 2 Combine all code, type check, emit
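
As a rough illustration of the two passes listed above, here is a toy sketch in Python. It is not the F# compiler: the mini-language (lines of the form `type T`, `let x`, `use x`), the `check` function, and `json_generator` are all hypothetical stand-ins. It only shows why pass 1 may report errors that disappear once generated code is combined in pass 2.

```python
from typing import Callable, Dict, List, Tuple

Sources = Dict[str, List[str]]            # file name -> lines of toy source
Generator = Callable[[Sources], Sources]


def check(sources: Sources) -> List[str]:
    """Toy 'type check': report every `use x` that has no matching `let x`."""
    defined = {line.split()[1]
               for lines in sources.values()
               for line in lines if line.startswith("let ")}
    return [f"{name}: '{line.split()[1]}' is not defined"
            for name, lines in sources.items()
            for line in lines
            if line.startswith("use ") and line.split()[1] not in defined]


def json_generator(sources: Sources) -> Sources:
    """Hypothetical generator: emit a `toJson_<T>` binding for each `type T`."""
    types = [line.split()[1]
             for lines in sources.values()
             for line in lines if line.startswith("type ")]
    return {"Generated.fs": [f"let toJson_{t}" for t in types]}


def run_two_pass(sources: Sources,
                 generators: List[Generator]) -> Tuple[List[str], List[str]]:
    pass1_errors = check(sources)          # pass 1: errors are expected here
    generated: Sources = {}
    for generate in generators:            # generators only see the user's code
        generated.update(generate(sources))
    combined = {**sources, **generated}    # pass 2: user code plus generated code
    return pass1_errors, check(combined)


user_code = {"Program.fs": ["type Person", "use toJson_Person"]}
pass1, pass2 = run_two_pass(user_code, [json_generator])
print(pass1)   # the missing serializer is reported in pass 1
print(pass2)   # and resolved once the generated file is included
```

The toy check ignores file order, which a real F# implementation could not do.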

The existing ways of approaching this problem in F# are:

  1. TypeProviders which take specialized knowledge to author.
  2. Custom build steps to emit code

Pros and Cons

The advantages of making this adjustment to F# are an easy form of meta programming. It's basically all the benefits of type providers without the complexity.

The disadvantages are the repetition of a feature and the compiler performance penalty of executing the type checker twice when this feature is used.

Extra information

Estimated cost (XS, S, M, L, XL, XXL): M (depending on what data is passed to the generators)

Happypig375 commented 4 years ago

You can also write an F# script to generate an F# file today.

kerams commented 4 years ago

It's basically all the benefits of type providers without the complexity.

A huge part of type providers is design-time support. Based on a quick look, source generators don't seem to be anything like that; more akin to/just an evolution of T4. I'd personally rather see https://github.com/fsharp/fslang-design/blob/master/RFCs/FS-1023-type-providers-generate-types-from-types.md than this.

robkuz commented 4 years ago

@kerams everybody seems to agree that type providers are enormously brittle (and that still seems to be an understatement), so having a standard, integrated, lightweight source generation facility would be more than welcome.

Krzysztof-Cieslak commented 4 years ago

</>

Swoorup commented 4 years ago

I think it's worth targeting a mid-level IR/AST representation rather than IL or just generating source code directly.

realvictorprm commented 4 years ago

Completely agree. Myriad is a better way of handling all this.

charlesroddie commented 4 years ago

@Krzysztof-Cieslak I think what this suggestion wants is some form of generating code or types that is better than the current state of type providers. I agree that generating strings is not the best approach.

Myriad can be in the mix of things to discuss here, as is an upgrade to type providers, as is something similar to C# source generators. If there is any link to a high-level write-up of Myriad it would be useful so it can be considered. I can't find anything online apart from this blog about its development process.

cartermp commented 4 years ago

@kerams the Source Generators feature is currently in a very early preview. There's extensive design-time support planned, which you can read the beginnings of here: https://github.com/dotnet/roslyn/blob/master/docs/features/source-generators.md#ide-integration.

There are some interesting challenges to solve that are not unlike what Type Providers struggle with today. For example, you want generated source to be up to date, so the safest thing to do is regenerate on every keystroke. This is effectively what Type Providers do today since they're asked to provide "fresh" types whenever the language service needs to re-typecheck things. The upside is correctness, but the downside is a huge hit to design-time performance. We somewhat work around this today in Type Providers with a series of caches that were added in the VS 2019 16.0/16.1 timeframe. They also serve as a band-aid around some more fundamental architectural flaws that force the TPSDK to hunt for and load big binaries in memory (when the compiler already has all the data it's looking for), leading to large Large Object Heap (LOH) allocations that ultimately kill IDE perf.

Currently the C# Source Generators simplify a lot of things by generating files in memory. But that offers some downsides; namely, not much C# tooling has a good understanding of them today. So there's a lot of design work that will probably go back and forth between compiler and IDE design until something acceptable emerges. Any F# implementation would likely utilize a similar mechanism once it stabilizes.

cartermp commented 4 years ago

As for the suggestion:

I think this is something we'll want to wait a bit for. Firstly, C# Source Generators are just in their first preview and could undergo complete design overhauls between previews based on feedback and scenarios that become more apparent. The current experience exists mostly so that early adopters can try things out, see what is missing or needs to change, and let the team know how it needs to change. There's also a lot of work to do in good IDE integration. Finally, although the string concatenation approach is extremely flexible, it's not necessarily going to stick or be the only way to do things. I personally prefer it to having to learn some complicated API with its own set of bugs and design flaws, but I could see how others would feel the opposite considering that there's pretty much no guarantees around correctness when you're just concatenating strings.

However, I suspect we'll eventually want to implement something that can "hook in" to the libraries that will ultimately end up using Source Generators. The blog post hints at some Microsoft frameworks and libraries adopting them. Realistically that won't happen for quite a while, largely because none of the "adopt source generators" work was costed for the .NET 5 timeframe. Perhaps an early prototype will emerge if Source Generators stabilize early enough. But in the long-term, I expect a lot of the .NET ecosystem to offer a "Source Generator path" for performance and better AOT-compilability. For F# to take part in these benefits, the F# compiler would also need a compatible feature. When the time comes, I think we'll likely look at a variety of options:

Table stakes would be ensuring that what is emitted can be consumed by .NET components so that F# developers can partake in the performance and AOT-ability gains. Post-.NET 5 it is highly likely that the .NET runtime team will focus heavily on AOT since it addresses a lot of pains people have with using .NET in production. This naturally means that some of the heavy reliance on reflection in the .NET ecosystem may need a replacement. A Source Generator-like mechanism could be a key part of that for F#. Seriously considering these things is at least a year away though.

realvictorprm commented 4 years ago

@charlesroddie if there's interest I'm happy to write more about how to use Myriad / how to create plugins for Myriad etc.

OnurGumus commented 4 years ago

I think the proper way is to do it like Nemerle macros.

cartermp commented 4 years ago

@OnurGumus I don't think we'll end up supporting syntactic macros: https://github.com/fsharp/fslang-suggestions/issues/210

praeclarum commented 4 years ago

Hi everyone, thanks for considering this. I want to be clear where I stand: I don't think that Source Generators are a wündertool of meta programming. I do think it's a very practical solution to a very real problem. I appreciate everyone wanting to do original research on hygienic macros, but this suggestion is specifically not that. :-)

My thoughts on the above criticisms:

Closing thoughts:

The C# ecosystem is about to be flooded with Source Generators. Already F# lags behind C# in tooling support. For example, Xamarin supports code generation for CoreML for C# but does not offer it for F#. The same is true for Storyboard support and XIB support and XML code behind. With every year, F# tooling falls behind C#. It is my opinion that it would benefit the F# community greatly to make writing tooling for F# easier.

cartermp commented 4 years ago

@praeclarum Just a note about tooling, I think that view is quite Xamarin-focused and not representative of where most developers are.

Xamarin is heavily C#-focused today, in large part due to how project integration tooling works, since it uses the so-called "legacy" project system and flavoring. This old technology is feature-rich but inflexible. Most of the .NET Core/Standard-based stuff uses a different, far more flexible system that has led to tooling that is about as equally available for F# as it is for C#. One example is the Azure-based tooling that is equally available for F# projects. Additionally, some of the excellent API design of things like ASP.NET Core has allowed for more F#-friendly entry points to emerge (ASP.NET Core supports F# async without requiring conversion to task, Giraffe and Saturn build directly atop its abstractions, etc.)

Perhaps future Xamarin components can be designed in a more pluggable way, like ASP.NET Core, and not require things like the enormous amount of work that was required to light up Fabulous (which also cannot plug into lots of the Visual Studio-based tooling). I anticipate it being easier to support F# in the future with Xamarin with the team moving their project integration tooling to the same system that .NET SDK-style projects use.

Tooling for consuming source generators written in C# is also something to consider, and this would fall square in the "F# team that does tooling" realm to implement. I expect this to be important as more are available.

praeclarum commented 4 years ago

I will 100% concede that I work on an uncommon platform compared to the rest of the community, but I hope you'll welcome diverse perspectives. Plus, Microsoft states that Source Generators are the recommended solution to the problems linkers present in .NET - a problem Xamarin devs have over a decade of experience with that is now becoming a very real problem in .NET Core.

Instead, build-time source generators will be the recommended mitigation for arbitrary reflection use. -Jan Kotas

I could have listed examples other than Xamarin. Protocol buffers could have been another example. I have been playing with a new version of sqlite-net that uses source generators (though I might end up with an IL masher to make it work with F#). I am also currently working on a library to assist mapping functional structures to object-oriented components (Fom) that could benefit from this technology.

Anyway, thanks again for the consideration!

Tarmil commented 4 years ago

One small thing should be noted: file ordering puts F# at a disadvantage vs C# regarding source generators (or Myriad). If I have a generator that, say, generates serialization code for annotated types, in F# I can't declare a type and use its serialization within the same file, because the generated code would need to be in between. Whereas in C#, that's not a problem: there's just a cyclic reference between your file and the generated one. Type providers don't have this inconvenience either, because they generate code at the right place.
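
The ordering constraint can be illustrated with a toy sketch in Python. The mini-language (`type T`, `let x`, `use x`), the file names, and `check_in_order` are hypothetical; the only thing borrowed from F# is the rule that a name must be defined before it is used, across files in project order.

```python
from typing import List, Tuple

File = Tuple[str, List[str]]   # (file name, lines of toy source)


def check_in_order(files: List[File]) -> List[str]:
    """Toy F#-style check: a name must be defined before it is used."""
    defined = set()
    errors = []
    for name, lines in files:
        for line in lines:
            kind, ident = line.split()
            if kind in ("type", "let"):
                defined.add(ident)
            elif kind == "use" and ident not in defined:
                errors.append(f"{name}: '{ident}' is not defined")
    return errors


generated = ("Generated.fs", ["let serialize_Person"])

# Type and its use in the same file: the generated file cannot sit in between.
one_file = [("Person.fs", ["type Person", "use serialize_Person"]), generated]

# Workaround: split the declaration from its use and order the files by hand.
split_files = [("Person.fs", ["type Person"]),
               generated,
               ("Program.fs", ["use serialize_Person"])]

same_file_errors = check_in_order(one_file)
split_errors = check_in_order(split_files)
print(same_file_errors)
print(split_errors)
```

Moving the generated file before `Person.fs` would not help in practice either, since the generated serializer itself refers to the `Person` type; the toy check does not model that dependency.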

  • ASTs are superior to text No. We program using text editors for a reason. Forcing us to write against these will require a huge investment of time to learn the F# AST. Every programmer knows how to generate a text file. We have large powerful libraries for working with text. While ASTs can save you from a few syntactic errors (the easy part), they don't save you from anything else. It is a whole lot of pain to put on a programmer just for pedantry's sake.

I don't fully agree with your points here, but I still think that generating text is a good idea for a simple reason: it's much easier to cater to both preferences by making a helper library that provides an AST and generates text, than the other way around.
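
A minimal sketch of that layering, in Python for illustration only: a small AST with a renderer that produces F# text. The node types (`Const`, `Apply`, `Let`) and the `render` function are hypothetical and cover a tiny fragment of the language; JSON string escaping is used as an approximation of F# string literal escaping, which is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Const:
    value: Union[int, str]


@dataclass
class Apply:
    func: str
    args: List[Const]


@dataclass
class Let:
    name: str
    body: Apply


def render_const(c: Const) -> str:
    if isinstance(c.value, str):
        # Assumption: JSON escaping is close enough to F# string escaping here.
        return json.dumps(c.value)
    if c.value < 0:
        # Parenthesise negatives so `f -1` is not read as a subtraction.
        return f"({c.value})"
    return str(c.value)


def render(node: Let) -> str:
    """Turn the AST into a single line of F# source text."""
    args = " ".join(render_const(a) for a in node.body.args)
    return f"let {node.name} = {node.body.func} {args}"


binding = Let("foo", Apply("f_name1", [Const(123), Const("hello")]))
source_text = render(binding)
print(source_text)
```

Going the other way, from arbitrary text back to a well-formed AST, needs a parser, which is why a text-based interface is the easier one to build helpers on.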

cartermp commented 4 years ago

@Tarmil Yeah, file ordering does limit what you could do with F# in that way. I think that scenario in particular would be confusing to never enable, but without doing something special like treating the file and the generated file as if their constructs were recursively declared, it would be the way things are.

Another thing to consider is what allowing one generator to depend on the output of another would look like. This implies another form of ordering, which I'm not particularly fond of given how top-down ordering is already difficult for beginners to grok.

charlesroddie commented 4 years ago

Interesting point about file ordering @Tarmil .

Type providers have better safety than source generators as you can provide them with the input directly. They don't analyze your entire source (unless you are crazy enough to point them at your .fs files) and you only use the results in the places you specify. They suit F# as a safer, more explicit language.

Enhancements to type providers mentioned here:

  1. Making them easier to write. (AST helpers? A type provider to generate an AST from a string @praeclarum ?)
  2. Performance in IDEs, including compiling only when needed.
  3. Generate types from types (@kerams). This would deal with some of the cases here, in particular serialization (to replace reflection) including protocol buffers.

How much would remain if this work were done?

charlesroddie commented 4 years ago

Storyboard support and XIB support and XML

Here it's a matter of interop because the economics don't support F#-specific solutions for everything.

Can type providers be used in C#? Imagine we relegate erasing type providers to a historical footnote. Then you can use them by referencing F# projects. Could they be used directly? For example, you write some annotation [TypeProvider(TypeProvider,TypeOrStringToAnalyze)] in a C# project. Then a C# source generator sees it and automatically gets the type provider to generate the type, which gets compiled to IL and referenced in the source generation step.

Can C# source generators be used in F#? Say you have ProtoBuf generator which takes the source for a type as input. Then from F# you have a type, compile it to IL, decompile it to C#, feed it to the source generator, get enlarged source as output, compile it to IL, and reference it. Feasible or too many steps?

I agree with @praeclarum that we need to think hard about how people using these language features can create .NET solutions rather than language-specific solutions.

7sharp9 commented 4 years ago

I'm happy Myriad was mentioned here. Feel free to add any ideas, improvements, etc. to the issues: https://github.com/MoiraeSoftware/myriad

Swoorup commented 4 years ago

One reason I dislike text-based generation is that it adds the complexity of multi-pass compilation and adds to the performance bloat in the compiler. Another reason is that, with F# being a whitespace-sensitive language, it will probably be much harder to get source generation right. I feel Myriad provides a good base to build features on, and the suggestion to pass types to TPs is the way forward.

7sharp9 commented 4 years ago

When I was building Myriad I did think of removing the quotation aspect of Type Providers and instead having just AST input rather than quotations. I think quotations not quite mapping 1 to 1 over the F# language can be a big limitation with regards to generating source, especially as quotations transform the input into a quoted form and cannot represent types either. Myriad could be called as part of the compile chain as there is an input into the compiler accepting an AST. Currently it can be integrated via MSBuild or by calling it directly with the CLI tool.

Thorium commented 4 years ago

When I was building Myriad I did think of removing the quotation aspect of Type Providers and instead having just AST input rather than quotations. I think quotations not quite mapping 1 to 1 over the F# language can be a big limitation with regards to generating source, especially as quotations transform the input into a quoted form and cannot represent types either. Myriad could be called as part of the compile chain as there is an input into the compiler accepting an AST. Currently it can be integrated via MSBuild or by calling it directly with the CLI tool.

I hope we could deal with existing ASTs instead of creating more and more of them. https://github.com/fsharp/FSharp.Compiler.Service/issues/938

7sharp9 commented 4 years ago

There are the typed and untyped ASTs. The typed AST is not user-constructible, so that only really leaves the untyped one, which also has an entry path into the compiler and Fantomas for turning it back into F# source.

The typed AST is only really currently useful for transpiling an F# AST to another language, as it has no API for modification or construction.

You can convert a quotation back to an AST, but the process is not perfect, as data is lost in the initial quotation literal process. Programmatic quotation construction does not cover the whole F# language either, so it's not ideal.

yatli commented 4 years ago

Now that we're talking about AST... :) I did a lot of C# expression tree (and the lambda syntax sugar) metaprogramming, and really wish the F# counterpart is on par with that. A full-fledged AST, convertible with quotations, can compile and run, would be even more useful than source generators in my opinion:

yatli commented 4 years ago

btw, a lot of projects (MS Bond, protobuf, GraphEngine etc.) already have this code generation workflow by using custom MSBuild tasks. So I don't think the workflow is something new, but how the mechanism generates executable bits (source? AST? etc.) is to be carefully designed.

I wrote both versions of codegen for GraphEngine (it generates millions of lines of code for modeling strongly-typed knowledge graphs) -- the first version was done in C# with string concatenation, and the coding/debugging experience was horrible.

In the second version I came up with something pretty unique -- it's a meta-template system that generates code generators. I made rules that the meta templates must compile fine themselves, with the "holes" properly annotated. The meta generator then transforms the meta templates into generators, which take user input and generate source code.

If F# is indeed going to implement source generators, I hope it is not a CodeDom-style API (compile: string -> Assembly) but more like the "meta generator" :)

7sharp9 commented 4 years ago

@yatli Have a look at Myriad, I recently updated the readme to be a little more descriptive.

yatli commented 4 years ago

@7sharp9 thanks! I surfed through the README.md and also your blog. It takes types as input, and uses plugins to generate an AST and then translate it back to source code, right?

Thorium commented 4 years ago

The typed AST is only really currently useful for transpiling an F# AST to another language, as it has no API for modification or construction

Could some kind of typed AST API construction be the right way to continue? (I'm sure this needs to be approved in principle first, not just a new PR.) The untyped AST needs quite a lot of work to be used (and is potentially even a bit dangerous: the parser should really understand everything, as F# is not a side-effect-free language).

7sharp9 commented 4 years ago

@yatli Technically it takes an AST as input, then creates an AST fragment from it, then translates that to source code, which is included in your project. It could take other things as input too; it's just that the current API is an input file/AST.

7sharp9 commented 4 years ago

@Thorium Myriad could type check the outputted AST if we wanted. Myriad uses fsast to assist with AST construction. It still needs some helpers to improve UX; there are a few I could add from my recent streams.

yatli commented 4 years ago

I have a small challenge for you guys to think about:

Suppose we have a library, in which there's a bunch of functions:

We want to build a codegen that, given user input, calls one of these functions. The user input specifies the name of the function, the types, and the arguments. For example, a JSON user input would look like this:

{
  "func": "f_name1",
  "args": [ 123, "hello" ],
  "result_binding": "foo"
}

It should generate something like this:

open mylib
let foo = f_name1 123 "hello"

Simple, right?

Now the question is, what is the minimal-effort way that the codegen implementer can raise meaningful error messages if the code doesn't compile?

For example:

{
  func: 'f_name1',
  args: [ 'wrong', 'hello' ],
          ^~~~~~~~~~ expect 'int', got 'string'
  result_binding: 'foo'
}

One can wait until error messages come back from fsc and parse them, but it's going to be painful.
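
One possible alternative, sketched here in Python under stated assumptions, is for the codegen to validate the input against a table of known signatures before emitting anything, so that errors are phrased in terms of the JSON rather than the generated F#. The `SIGNATURES` table is hypothetical (the thread does not list the library's real functions), and keeping such a table in sync with the library is extra effort for the codegen author, so this illustrates the trade-off more than it answers the "minimal effort" question.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical signature table; the thread does not list mylib's real functions.
SIGNATURES: Dict[str, Tuple[type, ...]] = {"f_name1": (int, str)}
TYPE_NAMES = {int: "int", str: "string"}


def type_name(value) -> str:
    # bool is a subclass of int in Python, so look up the exact type.
    return TYPE_NAMES.get(type(value), type(value).__name__)


def validate(spec: dict) -> List[str]:
    """Return errors phrased in terms of the JSON input, not generated code."""
    expected = SIGNATURES.get(spec["func"])
    if expected is None:
        return [f"func: unknown function '{spec['func']}'"]
    args = spec["args"]
    if len(args) != len(expected):
        return [f"args: expected {len(expected)} arguments, got {len(args)}"]
    return [f"args[{i}]: expect '{TYPE_NAMES[want]}', got '{type_name(arg)}'"
            for i, (want, arg) in enumerate(zip(expected, args))
            if type(arg) is not want]


def generate(spec: dict) -> str:
    """Emit F# source text, refusing input that fails validation."""
    errors = validate(spec)
    if errors:
        raise ValueError("; ".join(errors))
    args = " ".join(json.dumps(a) for a in spec["args"])
    return f"open mylib\nlet {spec['result_binding']} = {spec['func']} {args}"


good = json.loads(
    '{"func": "f_name1", "args": [123, "hello"], "result_binding": "foo"}')
bad = json.loads(
    '{"func": "f_name1", "args": ["wrong", "hello"], "result_binding": "foo"}')

good_source = generate(good)
bad_errors = validate(bad)
print(good_source)
print(bad_errors)
```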

7sharp9 commented 4 years ago

In the case of Myriad you do nothing, the error would be shown in the IDE after a build. Myriad runs at precompile only if the input file has changed.

yatli commented 4 years ago

@7sharp9 the error message points to the generated code? Or can it reflect back into the json file?

The user will be interested in "which part of my input has caused the error".

7sharp9 commented 4 years ago

As the output file is just part of your build it will be shown in the source where the error is.

OnurGumus commented 4 years ago

@OnurGumus I don't think we'll end up supporting syntactic macros: #210

@cartermp who is "we"? I am not aware of any proper discussion of syntactic macros; the linked issue contains none either, only people who wanted them. Sure, every one of us has different opinions. I have used syntactic macros extensively with Nemerle, and I was always extremely happy with them. Sure, this is my personal experience, but dismissing such a potentially nice concept without proper grounds is just sad.

yatli commented 4 years ago

@7sharp9 now try this:

Input:

chicken chicken chicken,
chicken chicken.
chicken chicken: chicken chicken chicken-chicken?
chicken!

Error message: generated.fs, line 6: 'chicken' is not defined

Happypig375 commented 4 years ago

chicken

cartermp commented 4 years ago

@OnurGumus The linked issue on syntactic macros is explicitly tagged as "probably not". This issue is not about macros. Source Generators are not macros largely because they don't allow for rewriting of user code. If you have a compelling argument for macros, I suggest making it on the linked issue, thanks.

@yatli

btw, a lot of projects (MS Bond, protobuf, GraphEngine etc.) already have this code generation workflow by using custom MSBuild tasks

Yes, this is correct. It's typically a post-build step. This is also how Razor gets compiled in ASP.NET, which sort of makes it a "double build". Same with XAML -> BAML -> stuffing it into an assembly. All of these frameworks are sort of forced to operate at the wrong level of abstraction unless they explicitly go the route of reflection, which has its own well-known issues.

In your example about specifying function calls via JSON, with the C# implementation this would end up in the generated source code which would then fail to compile (e.g., trying to pass in a string to something requiring a bool). I imagine we'd do something similar if this feature were to get built.

7sharp9 commented 4 years ago

@yatli The chicken is not relevant, as the input file would not be a valid AST and the error would be in your IDE with red squiggles before Myriad was even invoked.

yatli commented 4 years ago

@7sharp9 JSON is not a valid F# AST either, but I do think there's a valid chickened F# AST, for which the user doesn't know which chicken generates what chicken, but one of them is wrong. The user will then have to read the generated code and try to figure out how the codegen is built to understand the relationship.

A straightforward approach would be to generate comments that capture spans of the original input so that the link is established, but that requires the codegen author to manually map the constructs and parse them back.
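
A toy sketch of that idea, in Python: each generated line carries a comment naming the input span it came from, and the generator also keeps a side table so a compiler error on a generated line can be reported against the input without parsing the comments back. The tokenizer, the `let vN = ...` output shape, and the file name `input.txt` are all hypothetical, and the side table is a variation on the comment-only approach described above.

```python
from typing import Dict, List, NamedTuple, Tuple


class Token(NamedTuple):
    text: str
    line: int     # 1-based line in the user's input
    col: int      # 1-based column in the user's input


def tokenize(source: str) -> List[Token]:
    """Split the input on whitespace, remembering where each word started."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        col = 0
        for word in line.split():
            col = line.index(word, col)
            tokens.append(Token(word, line_no, col + 1))
            col += len(word)
    return tokens


def generate(tokens: List[Token],
             input_name: str) -> Tuple[str, Dict[int, Tuple[int, int]]]:
    """Emit one toy F# line per token, tagged with the span it came from."""
    lines = []
    origin: Dict[int, Tuple[int, int]] = {}
    for out_line, tok in enumerate(tokens, start=1):
        lines.append(f"let v{out_line} = {tok.text} "
                     f"// {input_name}:{tok.line}:{tok.col}")
        origin[out_line] = (tok.line, tok.col)
    return "\n".join(lines), origin


code, origin = generate(tokenize("chicken chicken\nchicken!"), "input.txt")
print(code)

# A compiler error on generated line 3 can be reported against the input:
where = origin[3]
print(where)
```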

So, also @cartermp, my point is exactly against waiting until the errors show up in the generated code, for that's too late for some checks that should've been done earlier.

OnurGumus commented 4 years ago

@cartermp it is true that source generators and macros are different. But if we had macros we wouldn't need to talk about source generators today. Because the functionality would be there. I am in the camp seeing macros as a superset of source generators and it is the right solution to this problem as well. The only challenge with it is the implementation requires proper compiler hooks.

It also makes the language very compact. In Nemerle the only built-in control keyword is match. Everything else, like if, for, and while, is based on functions and match and is artificial. It took one day for a developer to add the Elvis operator to the language as an external macro. https://github.com/rsdn/nemerle/wiki/Macros-tutorial

7sharp9 commented 4 years ago

@yatli If you have a specific idea of what you would want in/out in Myriad then feel free to add an issue and we can consider it for future iterations.

cartermp commented 4 years ago

@yatli Right, but we wouldn't do that. If you provide input to a system that produces something uncompilable, you'll just see that in the output and I think that's perfectly reasonable. One of the benefits of source generators over macros is that the source generator itself is debuggable by design (not easy in the first version for C# today though), so you could pretty simply diagnose what's wrong and fix it from there. I don't think the F# compiler should be in the business of (perhaps poorly) offering a diagnostics system for arbitrary input formats.

@OnurGumus Please comment on the correct issue, thanks. This suggestion is not about macros.

mrange commented 3 years ago

While I think source generators would be a good addition to F# for reasons listed by @praeclarum one feature that is missing in F# is partial classes and partial methods which AFAIK is how C# source generators most likely will allow users to inject custom behavior into the generated code.

In order to be as useful as C# Source Generators are likely to be I think F# would need some similar way to inject custom behaviour.

Another potential issue with F# over C# is where this generated source would be injected in the compile dependency order?

Perhaps F# needs its own design of Source Generators to make it useful in F#? Some would argue, I think, that the answer is type providers.

PS. In order to create more enemies I will admit I am a T4 guy and while F# is my preferred .NET language I always lacked good interop with T4, namely partial and how to inject the generated file in the correct place in the compile dependency order.

PS. And as a T4 guy I consider C# Source Generators completely redundant, unnecessary and the wrong approach 😃

Happypig375 commented 3 years ago

@mrange Maybe what F# needs are intrinsic type extensions that can be defined outside of the file containing the original type.

7sharp9 commented 3 years ago

@Happypig375 Maybe intrinsics could be less restrictive and allow extension within the same assembly.

7sharp9 commented 3 years ago

I'm not sure if the PR for adding intrinsics to TPs could be modified to allow that...

cartermp commented 3 years ago

Now that the first version of C# Source Generators has shipped, some of the .NET ecosystem can start adopting them, which is great.

My general position now is that we'll wait until the dust settles a bit. Using C# Source Generators to enable a "fast path" for some existing .NET APIs will undoubtedly run into problems that the C# team will have to address, and it feels unlikely that things will be stable until at least .NET 6.

Once things are stable we can probably assess if some form of interop or another mechanism is more appropriate. My preference is to do something interop-related, but I'm not sure exactly how that would get accomplished just yet. At the end of the day all it has to do is emit code as a string (in this case it could be F# code) and we could import that generated code during compilation.

I would expect the mechanism to either use or be related to any mechanism we build for analyzers, similar to how it's done for C#. Having a standard mechanism for these kinds of things feels appropriate: https://github.com/fsharp/fslang-design/issues/508

voronoipotato commented 3 years ago

What's most important to me is that whatever we use for code gen should support F# types. It's my current biggest ouch with type providers. I may be beating a dead horse with this because I've been complaining about it for years with no personal contribution (sorry), but I think if we could snapshot type providers and generate F# types we could get the best of both TPs and code gen. Add those intrinsic extensions and you're really cooking with gas.