dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Transform SyntaxTree to Expression<T>? #8064

Open amandeep18feb opened 8 years ago

amandeep18feb commented 8 years ago

Since there is no support for collectible assemblies, it would be nice to have a transform that converts code in string form to, say, an Expression<T>. When you call Compile on an Expression<T>, the code is compiled into a DynamicMethod, and those are eligible for garbage collection.
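A minimal sketch of the mechanism described here, assuming a runtime that JIT-compiles expression trees (the variable names are illustrative only). The tree is assembled through the System.Linq.Expressions factory methods and turned into a delegate with Compile, with no assembly written out:

```csharp
using System;
using System.Linq.Expressions;

// Build the tree for (a, b) => a * b + 1 without any C# source text.
var a = Expression.Parameter(typeof(int), "a");
var b = Expression.Parameter(typeof(int), "b");
var body = Expression.Add(Expression.Multiply(a, b), Expression.Constant(1));
var lambda = Expression.Lambda<Func<int, int, int>>(body, a, b);

// Compile() returns an ordinary delegate. Once every reference to the
// delegate and the tree is dropped, both are ordinary garbage.
Func<int, int, int> compiled = lambda.Compile();
Console.WriteLine(compiled(6, 7)); // 6 * 7 + 1 = 43
```

What is missing, and what this issue asks for, is a way to get from a string such as "a * b + 1" to the `lambda` object above.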

ManishJayaswal commented 8 years ago

@amandeep18feb But this would leave the expression tree hanging there forever?

glopesdev commented 8 years ago

@ManishJayaswal what do you mean by leaving the expression tree "hanging there forever"? Expression trees as well as DynamicMethod instances are all eligible for garbage collection and should be handled by the GC as usual.

After all the talk about enabling C# scripting, I just don't understand how this use case has managed to fly under the radar for so long...

Expression trees are hands-down the most powerful and balanced runtime code generation tool across the entire .NET stack today. They:

At first look they would be the PERFECT candidate for lightweight C# scripting, except for one huge catch: apart from the built-in C# parser support to generate Expression<TLambda> instances, there is no equally lightweight support for runtime parsing of Expression trees from text.

I mean, it's really weird, because you guys are already doing it somewhere inside Roslyn's compiler, so why not just expose this with a clean API to the outside?

There are numerous efforts being pursued that are similar to this, but it's a gargantuan task besides being a massive reinvention of the wheel. Anyway, currently the two most popular approaches that I know of are:

tmat commented 8 years ago

This is more of a general compiler feature than scripting-specific feature. Moving to Compilers. @jaredpar

tmat commented 8 years ago

@glopesdev If we were to emit scripts into collectible code we'd need a better collectible code gen from the CLR. This feature request is tracked here: https://visualstudio.uservoice.com/forums/121579-visual-studio-2015/suggestions/6120992-support-for-collectible-assemblies

glopesdev commented 8 years ago

@tmat I understand and agree that better collectible code generation is very desirable to really lift .NET into full scripting galore.

However, my point was that code generated by Expression trees is already collectible today because of DynamicMethod (see here). No need to modify anything about the CLR runtime, because this already works.

This for me is what really makes the Expression framework shine, in addition of course to its being extremely lightweight and simple (which in itself is already a HUGE plus).

What I meant about emitting expression trees from the language parser is something that was already envisaged by Microsoft long ago (see this blog post). For some reason it never picked up, even though it was a beautifully perfect mechanism in every respect... my two cents: no one ever bothered to do a proper Expression tree parser.

But maybe it's just me...

tmat commented 8 years ago

@glopesdev I agree emitting expressions into Expression Trees would be beneficial for some scenarios, where the set of C# language features is limited to what ETs can represent today. However, there are many features that can't be represented purely in ETs, e.g. any feature that produces a compiler-generated type (anonymous types, lambda closures, etc.). Hence ETs are not enough to support scripting.

glopesdev commented 8 years ago

@tmat agreed, support for collectible anonymous types and lambda closures would change the landscape of what's possible today with .NET scripting.

I hope this will be supported via a thin layer around the CLR, like DynamicMethod.

glopesdev commented 8 years ago

@tmat actually, with the new tuple syntax there is not even much need for garbage collectible anonymous types anymore.

Also, I don't really understand why lambda closures require compiler-generated types given that I can define a LambdaExpression using an expression tree.

This covers the two scenarios you pointed out for why we needed something more than ETs. Am I missing something?

Is it really worth making another hugely bloated framework when you already have something that would go 99% of the way and is already done?

Sorry, but I still don't get it. I get depressed every time I think about this issue because the advantages are so obvious.

glopesdev commented 6 years ago

@gafter @tmat @CyrusNajmabadi @jaredpar any thoughts on the above discussion? I would really appreciate some clarification on what I'm missing here.

CyrusNajmabadi commented 6 years ago

Sorry, i'm not sure i understand the ask.

glopesdev commented 6 years ago

@CyrusNajmabadi sorry for not making it clear. In the above discussion, @tmat suggested that because expression trees (ETs) do not support anonymous types and lambdas, they are not enough to support scripting.

I replied that actually ETs already support lambdas, and that you could piggyback on top of ValueTuple to design record-like types.
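As a rough illustration of both claims, assuming nothing beyond the standard System.Linq.Expressions API: a nested lambda can capture a parameter of its enclosing lambda, and a ValueTuple can stand in for a small record-like type. All variable names are made up for the example.

```csharp
using System;
using System.Linq.Expressions;

// x => (y => x + y): the inner lambda closes over x from the outer lambda.
var x = Expression.Parameter(typeof(int), "x");
var y = Expression.Parameter(typeof(int), "y");
var inner = Expression.Lambda<Func<int, int>>(Expression.Add(x, y), y);
Func<int, Func<int, int>> makeAdder =
    Expression.Lambda<Func<int, Func<int, int>>>(inner, x).Compile();
Func<int, int> addFive = makeAdder(5);
Console.WriteLine(addFive(10)); // 15

// (name, age) => new ValueTuple<string, int>(name, age)
var name = Expression.Parameter(typeof(string), "name");
var age = Expression.Parameter(typeof(int), "age");
var tupleCtor = typeof(ValueTuple<string, int>)
    .GetConstructor(new[] { typeof(string), typeof(int) })
    ?? throw new MissingMethodException("ValueTuple<string, int> constructor not found.");
Func<string, int, (string Name, int Age)> makeRecord =
    Expression.Lambda<Func<string, int, (string Name, int Age)>>(
        Expression.New(tupleCtor, name, age), name, age).Compile();
var person = makeRecord("Ada", 36);
Console.WriteLine(person.Age); // 36
```

The element names (Name, Age) exist only at compile time on the C# side; inside the tree the fields are still Item1 and Item2, which is one place where a tuple is not a full substitute for an anonymous type.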

I wanted to understand why ETs are not being leveraged by the compiler team as a C# scripting solution. I was trying to argue that all the necessary runtime infrastructure is already in-place (garbage-collection, etc) and that it works across all platforms because of LINQ compatibility. The only thing we are missing is a lightweight parser of ETs.

I would like to understand why this is not a more desired development target, given that it seems like low-hanging fruit with potential for huge returns.

CyrusNajmabadi commented 6 years ago

> I wanted to understand why ETs are not being leveraged by the compiler team as a C# scripting solution

I'm not certain i understand what this actually means. What does it mean to "leverage" an "Expression Tree" as a "scripting solution"?

> The only thing we are missing is a lightweight parser of ETs.

What do you mean "parser of ExpressionTrees"? What is your input? What do you want to get as your output?

> I would like to understand why this is not a more desired development target, given that it seems like low-hanging fruit with potential for huge returns.

I'm not certain what you're asking for. But the easy answer is presumably: because no one else thinks it's important enough to invest the time and resources in given all the other work out there to do. If you think this is valuable, by all means write up a proposal and create a PR for the feature :)

glopesdev commented 6 years ago

> I'm not certain i understand what this actually means. What does it mean to "leverage" an "Expression Tree" as a "scripting solution"?

Since .NET 2.0, the DynamicMethod class has allowed emitting garbage-collectible delegates at runtime. In .NET 3.5, this concept was further elaborated by expression trees (ETs) in the System.Linq.Expressions namespace. This namespace leveraged DynamicMethod to flexibly generate and manipulate garbage-collectible LINQ queries at runtime. This is a core property of what made LINQ so powerful.

Incidentally, System.Linq.Expressions also made DynamicMethod and Reflection.Emit dramatically more accessible and usable to general programmers. It turned out that composing an expression tree in code is actually a huge step up from having to directly emit individual IL instructions one by one. In the same way, Roslyn could have made ETs even more accessible for scripting and runtime code generation.

How? Expression trees are at their core a kind of syntax tree. This syntax tree exists only at runtime, and can only be assembled from code. You can perfectly well instantiate and put together these trees by writing code. Interestingly, you can also convert any instantiated expression tree to a String, but the converse is not available. There is no way you can convert the text representation of an ET to an actual ET at runtime.
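To make the asymmetry concrete, here is a small sketch using only the standard API. ToString gives a debug-style rendering of the tree (not guaranteed to be valid C#), and there is no built-in method that goes the other way:

```csharp
using System;
using System.Linq.Expressions;

Expression<Func<int, int>> doubler = i => 2 * i;

// Tree -> text is available out of the box.
string text = doubler.ToString();
Console.WriteLine(text); // a rendering along the lines of: i => (2 * i)

// Text -> tree is not: System.Linq.Expressions has no parsing entry point,
// so the string above cannot be turned back into `doubler` at runtime
// without a third-party parser or a full compiler.
```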

However, this capability has clearly been found to be useful before, given that it has already been implemented since .NET 3.5 by the C# compiler, when resolving variables of type Expression<T>. In this case, the right-hand side of the assignment is parsed and reassembled into the set of IL instructions that will instantiate a semantically correct Expression Tree at runtime.
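A sketch of what that compile-time translation amounts to, under the assumption of a default (unchecked) arithmetic context. The first declaration is converted by the C# compiler; the second is roughly the set of factory calls the compiler emits on your behalf:

```csharp
using System;
using System.Linq.Expressions;

// Written as a lambda; the compiler turns it into tree-building code.
Expression<Func<int, int>> fromCompiler = i => 2 * i;

// Approximately the same tree, assembled by hand.
var p = Expression.Parameter(typeof(int), "i");
Expression<Func<int, int>> byHand =
    Expression.Lambda<Func<int, int>>(
        Expression.Multiply(Expression.Constant(2), p), p);

Console.WriteLine(fromCompiler.Body.NodeType); // Multiply
Console.WriteLine(byHand.Body.NodeType);       // Multiply
Console.WriteLine(fromCompiler.Compile()(21) == byHand.Compile()(21)); // True
```

The translation happens entirely at compile time, which is why it is not directly reachable from a string supplied at runtime.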

This is what I mean by "leveraging an Expression Tree as a scripting solution": The answer to this issue already exists at some level, somewhere inside the Roslyn compiler. By exposing at runtime the already implemented ability to convert a string into an expression tree, it would become possible to convert arbitrary strings provided by the user (the script) into runtime generated code.

Is this more clear? I fully admit I may be missing something, which is why I originally asked for further clarification.

> I'm not certain what you're asking for. But the easy answer is presumably: because no one else thinks it's important enough to invest the time and resources in given all the other work out there to do. If you think this is valuable, by all means write up a proposal and create a PR for the feature :)

Yes, I could, but I honestly think this would be a huge waste of everyone's time. First, I would need to familiarize myself from scratch with the entirety of Roslyn's code base, which is not a small feat in and of itself.

Second, as I mentioned above, this is already done by the framework somewhere, so the hope was that it would be relatively simple to expose the code that resolves Expression<T> syntax trees into the top level API, or even expose it as a small and lightweight standalone API that could be easily included in application runtimes. I fully understand this is probably a very naive assumption, but one always hopes that modularity of code wins sometimes.

I am really sorry that I have probably been unable to explain once again the importance of this feature. I am sorry that no one else thinks this is valuable (although the popularity of the Dynamic LINQ nuget package seems to argue otherwise), and finally I have to admit I am disappointed that this is hard to pull off by using Roslyn.

I was hoping from day one that Roslyn would be an improvement over Reflection.Emit and DynamicMethod: the long sought-after bridge between CodeDOM and full runtime scripting. I am again saddened to see that this hope could not be further from the truth.

svick commented 6 years ago

@glopesdev

> This is what I mean by "leveraging an Expression Tree as a scripting solution": The answer to this issue already exists at some level, somewhere inside the Roslyn compiler. By exposing at runtime the already implemented ability to convert a string into an expression tree, it would become possible to convert arbitrary strings provided by the user (the script) into runtime generated code.

That functionality is already exposed in the form of Roslyn Scripting, e.g.:

await CSharpScript.EvaluateAsync<Expression<Func<int, int>>>("i => 2 * i", options)

Or is that not good enough, because the code is generated to an assembly that is not collectible?

glopesdev commented 6 years ago

> Or is that not good enough, because the code is generated to an assembly that is not collectible?

Yes, runtime scripts are expected to be changed and modified throughout the life of the application. Not to mention that full-blown assembly compilation is much slower than Reflection.Emit. Also, in the ideal world this would not require the entire Roslyn stack, but just the lightweight expression tree parser.

Basically, as soon as you translate the script AST into an expression tree, you are done; you don't need the whole assembly generation process. The advantage is that expression trees work literally everywhere, from mobile to .NET Core, etc. They have to be a first-class citizen because of LINQ. Roslyn, on the other hand, has a lot of dependencies that need to be injected into the application somehow.

It is good to see Roslyn trying to bring Eval back, though.

CyrusNajmabadi commented 6 years ago

> but just the lightweight expression tree parser.

I'm curious what you mean by this. You'd need more than a parser right? To make an expression tree you need semantics so that you can figure out things like the types of variables, and which methods you're calling, etc. etc.

This would need more than a parser right?
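One small way to see the point about semantics, using only the standard factory methods: the expression tree API does not apply C#'s implicit conversions, so whatever produces the tree has to have worked out the types first. The names below are illustrative.

```csharp
using System;
using System.Linq.Expressions;

var one = Expression.Constant(1);     // typed as int
var half = Expression.Constant(0.5);  // typed as double

// C# accepts `1 + 0.5`, but the factory rejects mismatched operand types.
bool rejected = false;
try
{
    Expression.Add(one, half);
}
catch (InvalidOperationException ex)
{
    rejected = true;
    Console.WriteLine(ex.Message);
}

// A binder has to decide that `int + double` means `(double)int + double`
// and insert the conversion node explicitly.
var sum = Expression.Add(Expression.Convert(one, typeof(double)), half);
double result = Expression.Lambda<Func<double>>(sum).Compile()();
Console.WriteLine(result); // the value 1.5
```

Overload resolution, member lookup, and generic inference raise the same issue on a larger scale.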

glopesdev commented 6 years ago

@CyrusNajmabadi you are correct, I was not being precise. You do need semantic assignment for building an expression tree. I guess I was being biased by my own pet parser implementation using monadic parser combinators where semantic assignment is done simultaneously with parsing.

I wonder how much of the Roslyn engine would be required for this. I was able to keep it very lightweight by using as much of the .NET runtime as possible to carry out the assignment, but I don't trust myself enough to have chosen all the right tradeoffs.

In any case, I just wanted to clarify my main point that there is a lot of value in targeting the specific language subset that expression trees are able to represent for fast scripting solutions.

CyrusNajmabadi commented 6 years ago

> Also, in the ideal world this would not require the entire Roslyn stack, but just the lightweight expression tree parser.

> You do need semantic assignment for building an expression tree.

> I wonder how much of the Roslyn engine would be required for this.

Sounds like you'd need the entire compiler layer of Roslyn. :)

CyrusNajmabadi commented 6 years ago

> using monadic parser combinators where semantic assignment is done simultaneously with parsing.

I don't see how semantics can be done inline with parsing. For one thing, the semantics depend on things like other files. Meaning you need to have all files parsed first before you could understand the semantics of a single file you were parsing.

CyrusNajmabadi commented 6 years ago

> In any case, I just wanted to clarify my main point that there is a lot of value in targeting the specific language subset that expression trees are able to represent for fast scripting solutions.

Sure. But this goes back to:

> I'm not certain what you're asking for. But the easy answer is presumably: because no one else thinks it's important enough to invest the time and resources in given all the other work out there to do. If you think this is valuable, by all means write up a proposal and create a PR for the feature :)

You see it as being a lot of value. It's possible that others don't see it as much (at least when compared to all the rest of the work out there that can be bitten off). It's also not clear it's as easy or as cheap as you think it is. If you did want this feature, it might be useful for you to do a simple prototype that demonstrates how these Roslyn innards could be exposed in an appropriate way to solve the problem you have.

glopesdev commented 6 years ago

> I don't see how semantics can be done inline with parsing. For one thing, the semantics depend on things like other files.

This is possible if you simply don't let expressions depend on other files. Again, I am talking about scripting expressions here, not about compiling huge projects with (possibly circular) class definitions. When parsing Expression<T> objects the C# compiler does not allow the semantics to leave the immediately available scope.

Of course, local scope itself can become very complex, but monadic parsers can deal with this using dynamic lookahead. Token resolution includes semantic binding as the last step. C# expressions are trees, not graphs, so it is clear this assignment is not only possible, but much simpler.
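A toy sketch of binding while parsing, for a deliberately tiny grammar. This is a plain recursive-descent parser rather than monadic combinators, and it is not taken from any existing project; `ParseScript` and the grammar are hypothetical. Each identifier is resolved to a ParameterExpression at the moment it is read, so the output of parsing is already a typed expression tree:

```csharp
using System;
using System.Globalization;
using System.Linq.Expressions;

// Grammar (hypothetical subset, doubles only):
//   sum     := product (('+' | '-') product)*
//   product := atom (('*' | '/') atom)*
//   atom    := number | identifier | '(' sum ')'
static Expression<Func<double, double>> ParseScript(string text, string parameterName)
{
    var parameter = Expression.Parameter(typeof(double), parameterName);
    var pos = 0;

    void SkipSpaces() { while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++; }

    bool TryTake(char c)
    {
        SkipSpaces();
        if (pos < text.Length && text[pos] == c) { pos++; return true; }
        return false;
    }

    Expression Atom()
    {
        if (TryTake('('))
        {
            var inner = Sum();
            if (!TryTake(')')) throw new FormatException($"Expected ')' at position {pos}.");
            return inner;
        }
        var start = pos;
        if (pos < text.Length && char.IsLetter(text[pos]))
        {
            while (pos < text.Length && char.IsLetterOrDigit(text[pos])) pos++;
            var name = text.Substring(start, pos - start);
            // Binding step: the identifier is resolved as soon as it is parsed.
            if (name == parameter.Name) return parameter;
            throw new FormatException($"Unknown identifier '{name}'.");
        }
        while (pos < text.Length && (char.IsDigit(text[pos]) || text[pos] == '.')) pos++;
        if (pos == start) throw new FormatException($"Unexpected input at position {pos}.");
        var value = double.Parse(text.Substring(start, pos - start), CultureInfo.InvariantCulture);
        return Expression.Constant(value);
    }

    Expression Product()
    {
        var left = Atom();
        while (true)
        {
            if (TryTake('*')) left = Expression.Multiply(left, Atom());
            else if (TryTake('/')) left = Expression.Divide(left, Atom());
            else return left;
        }
    }

    Expression Sum()
    {
        var left = Product();
        while (true)
        {
            if (TryTake('+')) left = Expression.Add(left, Product());
            else if (TryTake('-')) left = Expression.Subtract(left, Product());
            else return left;
        }
    }

    var body = Sum();
    SkipSpaces();
    if (pos != text.Length) throw new FormatException($"Unexpected input at position {pos}.");
    return Expression.Lambda<Func<double, double>>(body, parameter);
}

Expression<Func<double, double>> script = ParseScript("2 * (x + 1) - x / 4", "x");
Func<double, double> compiled = script.Compile();
Console.WriteLine(compiled(8)); // 2 * 9 - 2 = 16
```

This only works because every value is a double and the only name in scope is the parameter. The disagreement in this thread is about how far that approach stretches once overloads, generics, and member access enter the picture.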

> the easy answer is presumably: because no one else thinks it's important enough

> You see it as being a lot of value. It's possible that others don't see it as much.

I have enough clarification regarding this point, thank you for that. Clearly this is too much for both of us to chew on at this point.

It was already clear to me that others do not care, hence my disappointment. However, the fact that I am disappointed does not mean I disrespect your efforts or decisions in the slightest. We will just have to kindly agree to disagree on this particular point and move on.

Thank you so much for the feedback, you should consider closing this issue as out of scope.

CyrusNajmabadi commented 6 years ago

> Of course, local scope itself can become very complex, but monadic parsers can deal with this using dynamic lookahead. Token resolution includes semantic binding as the last step. C# expressions are trees, not graphs, so it is clear this assignment is not only possible, but much simpler.

It's not something that exists at all in Roslyn. So it would likely not be simple :)

This ties back into why this likely hasn't been done. You're talking about introducing totally new ways for the Roslyn subsystem to do things, which would likely not be easy to do. I would again recommend some sort of prototype demonstrating what you're looking for, which can then also serve to determine more accurately what is missing and what costs those would entail.

glopesdev commented 6 years ago

My prototype of a monadic parser combinator for C# expression trees is here: https://github.com/glopesdev/expression-script

This follows the paper "Monadic Parser Combinators" by Graham Hutton and Erik Meijer, which can be found here: http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf

It includes parsing, type resolution (including overload resolution) and semantic binding up to lambda closures. You can find some examples in the playground branch.

I don't like this prototype because it is not finished. I was using it to learn about monadic parsing and dynamic lookahead parsers with context, which I think would work well for this task. I fully recognize I am no more than an amateur in parser and compiler design, which is why I don't think I can provide any kind of more formalized proposal or PR for this.

However, I have experimented with my own efforts (and those of others) enough to know that this is possible and very worthwhile. Of course, I don't have the complete formal proof ready to submit that would convince you in any way, so unfortunately this is as far as I can go right now. I hope someone in the future will be willing and able to go the extra mile.

CyrusNajmabadi commented 6 years ago

I'm not looking for a proof/complete proposal :) just a proof of concept using roslyn to demonstrate what would be easy, and what would need major work in order to finish.

GalenLee commented 6 years ago

glopesdev, I found this thread by searching specifically for “convert Syntax Tree to Expression Tree”. I would love to be able to create executable code at run time and have it be collected. I was excited to see Roslyn’s support for scripting, until I found out the code is not collected. I could see this being very helpful in rules/calculation engines (rules/formulas could be stored in a DB) and parsing files (custom parsing logic).

NMSAzulX commented 10 months ago

I'm lucky to have found this post. Although it has been a long time, the idea is not outdated. This idea is as bold as what I do in my bathroom. I think I can describe a balanced scenario. I have been using dynamic compilation for several years. The complaint that bothers me the most is that people find compilation.Emit() very expensive and not lightweight enough. Some people generate thousands of dynamic assemblies. They cannot tolerate the memory and latency cost of compilation.Emit(), although I still advise them to use it.

For simple operations, there could be a way to pass in a C# script and generate a method through Expression Trees. When the requirement is too complex, we would still use Compilation to compile the code into an assembly.

expression:

Perhaps this Expression Tree can only generate a dynamic method with simple logic, and its semantics are limited. For example, it only supports Object / Math(F) / PrimitiveType / string / DateTime / Vector128...?

case1:

// Proposed API sketch: Expression, WithParam and CreateDelegate do not exist in Roslyn today.
var func = _csharpCompilation
    .Expression
    .WithParam("int arg1, char arg2")
    .WithParam<CustomType>("arg3")
    .Expression<int>("return arg1+arg2+arg3.Age;")
    .CreateDelegate();

case2:

var func = _csharpCompilation
    .Expression
    .WithParam("int arg1, char arg2, double arg3")
    .Expression<double>("return Math.Acos((double)arg1)+(double)arg2+arg3;")
    .CreateDelegate();
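For comparison, this is roughly the tree that a call like case2 would have to produce, written out by hand with the existing System.Linq.Expressions API. It is only a sketch of the target output and says nothing about how the proposed parsing step would be implemented:

```csharp
using System;
using System.Linq.Expressions;

var arg1 = Expression.Parameter(typeof(int), "arg1");
var arg2 = Expression.Parameter(typeof(char), "arg2");
var arg3 = Expression.Parameter(typeof(double), "arg3");

// Resolve the Math.Acos(double) overload that the script text refers to.
var acos = typeof(Math).GetMethod(nameof(Math.Acos), new[] { typeof(double) })
    ?? throw new MissingMethodException("Math.Acos(double) not found.");

// Math.Acos((double)arg1) + (double)arg2 + arg3
var body = Expression.Add(
    Expression.Add(
        Expression.Call(acos, Expression.Convert(arg1, typeof(double))),
        Expression.Convert(arg2, typeof(double))),
    arg3);

Func<int, char, double, double> func =
    Expression.Lambda<Func<int, char, double, double>>(body, arg1, arg2, arg3).Compile();

Console.WriteLine(func(1, 'A', 0.5)); // Acos(1) = 0, 'A' = 65, so the value is 65.5
```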

complex script:

var compilation = CSharpCompilation.Create(AssemblyName, SyntaxTrees, references, options);
compilation.Emit(
    dllStream,
    pdbStream: pdbStream);

tomaszmalik commented 8 months ago

I have used DynamicExpresso, but it didn't support a specific dynamic case that I needed. I really liked the idea of using Roslyn's SyntaxTree and converting it into a LINQ Expression. So, a while ago, I created my own parser for another project. Now, I have extracted this solution into a public repository (MIT). Maybe someone will find it useful. :) I will complete the documentation in a free moment.

Repository: https://github.com/TagBites/TagBites.Expressions

Test scenarios: https://github.com/TagBites/TagBites.Expressions/blob/master/tests/TagBites.Expressions.Tests/ExpressionParserTests.cs