dotnet / csharplang

The official repo for the design of the C# programming language
11.45k stars 1.02k forks source link

[Proposal]: Interceptors #7009

Open RikkiGibson opened 1 year ago

RikkiGibson commented 1 year ago

Interceptors

Feature specification has been moved to https://github.com/dotnet/roslyn/blob/main/docs/features/interceptors.md. As this is not currently implemented as a language feature, we will not be tracking specific status pieces on this issue.

Design meetings

CyrusNajmabadi commented 1 year ago

Can you spec out the paths, esp vis a vis unix vs windows paths? Specifically, this seems problematic:

File paths used in [InterceptsLocation] must exactly match the paths on the syntax nodes by ordinal comparison

Unless the compiler is already normalizing these paths across platform.

CyrusNajmabadi commented 1 year ago

public static void InterceptorMethod(this C c, int param)

I worry we're doing this at the wrong time. This only seems to be for methods, when we might want it for properties, etc. should this wait for 'extensions' to be done first?

CyrusNajmabadi commented 1 year ago

Constructors are a bit of a sore point for the Regex scenario. It would be nice to be able to intercept new Regex("..."), but it would be strange for an object-creation expression to return an instance of a subtype, for example, even if the expression's static type is unchanged.

I don't see this as any morally wrong versus calling one method and another being called. :)

CyrusNajmabadi commented 1 year ago

takes a qualified method name. For example, for the method bool System.Text.RegularExpressions.Regex.IsMatch(string), we would use the System.Text.RegularExpressions.Regex.IsMatch.

Would prefer that this break up the type-name and member-name into separate pieces.

HaloFour commented 1 year ago

There's a lot to digest here but I'm trying to reconcile how this proposal compares to the AOP scenarios that I often face. At first glance this seems really limited, especially if only one interception can happen at a time. It also seems that the interception cannot "wrap" the target, at least not unless the target is itself accessible from the interceptor. There's no access to the underlying joinpoint, which would make aspects like logging, tracing, transactions, bulkheading, etc. possible. This seems to be very specialized around the specialization of framework code.

I have other concerns about how this is very much spooky action at a distance, as in looking at a PR you can no longer tell if two identical method invocations back to back actually call the same method. There's also the fragility that if the generator doesn't run the interception becomes broken.

yaakov-h commented 1 year ago

To me this feels extremely limited and niche, and at the same time introduces a lot of moving parts that need to be kept in sync.

There also doesn't seem to be a way to opt in or out of it for different invocations, the only construct I can think of would be comments like the /* lang=regex */ syntax highlighting comments, which could be hard to read and understand with sequential invocations like builder objects.

The obvious question I have though is how would this work with default parameters? Would the interceptor need to know the default value and mirror it, or would that be passed through even though the caller did not specify it?

alrz commented 1 year ago

If I read this correctly I think this is basically replace/original on the call-site. There's https://github.com/dotnet/csharplang/issues/3992 to help with that but this doesn't seem to make it easier?

It seems possible to implement a version of replace/original using interceptors

You can already do this (sort of):

[AddLogging]
public partial void Method();
private void Method_Impl() { .. }

I think it comes down to a not-so-hacky way to do either of those, interceptors is definitely an option but still kind of hacky IMO.

The first case is like an overload sensitive to constant arguments. That's pure/constexpr which can also interact with generators:

 // could evaluate to 3 at compile-time, provided all arguments are constant and `Add` is pure.
var result = Add(1, 2);
// could evaluate the argument at compile-time provided it's a constant, using source generators.
var result = Evaluate(expressoin: "1+2"); 
var result = IsMatch(str, pattern: "\d+"); 

// generate a specialized IsMatch for each constant pattern
partial bool IsMatch(string, const string pattern)

The second case is kind of like a horizontal new:

extension for C {
  public new void Method() { .. } // preferred over the original C.Method
}

I've filed https://github.com/dotnet/roslyn/issues/20631 for this a while back - right now it can be silently ignored.

orthoxerox commented 1 year ago

The biggest drawback is probably spooky action at a distance. I understand that by marking my method interceptable I acquiesce to having my logic replaced, but now the user of my library Foo can reference another library Bar that silently generates interceptors for my method that do whatever they want.

TheUnlocked commented 1 year ago

We still think it's important that this works by adding source code to the compilation, and not by exposing source generator APIs that modify the object model of the compilation in any way. It is a goal that you can build a project with source generators, write all the generator outputs to disk, delete the generators, and then build again and have the same behavior.

I don't necessarily agree that the first sentence is the best way to achieve the goal in the second sentence. If there was an API to make direct edits, I would imagine that the compiler could still generate some kind of serialized patch containing those edits which it could save to disk and read later. While that would be pretty fragile in the event of source changes, it wouldn't be any more fragile than referencing a character offset, and it might avoid needing to re-parse/bind everything in the (probably more typical) case when you aren't saving/reading the modifications to/from disk.

HaloFour commented 1 year ago

The more I think about this proposal the less I think it addresses more general-purpose AOP scenarios. I use AOP heavily in my Java services. We routinely use aspects for cross cutting concerns. Those concerns include tracing, logging, metrics, bulk heading, circuit breaking, concurrent call rates, retries (with exponential backoff), memoization, transactions and declarative security. I've also written more specific aspects to facilitate instrumentation of third-party libraries. This interceptor solution may cover some of those cases, but even then it seems it would be a non-ideal way to accomplish it. It doesn't handle other commonly requested scenarios at all, such as IPNC, since the interception happens at the call-site. There's nothing wrong with call-site AOP, but it's only one flavor and can only handle certain use cases. Worse, this proposal explicitly specifies that multiple interceptors aren't legal, but that's an incredibly common case in my code.

I honestly don't see what use cases this proposal is intended to satisfy, except the extraordinarily narrow case of allowing ASP.NET to replace user-written code with their own.

It's also been suggested that this is an appropriate route to implement optimization, where runtime-provided source generators will rewrite some percentage of the user-written code with more runtime-optimized (but certainly less readable) versions. That feels 100% backwards to me, and still very narrow. Not to mention, such "optimizations" can only ever benefit people using newer versions of C# with support for these intercepting generators. Every other language is left out.

Coming from languages where post-processing and byte-code rewriting are a normal thing, this is extremely bizarre to suggest as a language enhancement.

teo-tsirpanis commented 1 year ago

How would [InterceptsLocation] accepting strings as file names interact with dotnet/roslyn#57239 when/if it is implemented? Wouldn't it remove the guarantee that file names are unique?

RikkiGibson commented 1 year ago

Thank you for the feedback so far.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1446903296

Can you spec out the paths, esp vis a vis unix vs windows paths?

Note that the compiler doesn't require paths to have any particular form. They just have to be representable in a .NET string.

You can really think of this as: imagine the method we are calling has a CallerFilePath parameter. What string would we pass there? The string we put in the InterceptsLocationAttribute needs to exactly match that string in order to intercept that call. If the implicit CallerFilePath string argument differs based on which environment we are compiling in, then the argument to InterceptsLocation needs to change in the exact same way.

This is very problematic for any manually-written InterceptsLocationAttribute--it's automatically non-portable, unless very specific pathmap arguments are used--for example, have no idea if the right thing will just start happening if you pathmap \ to /. Maybe, maybe not. It starts to feel like a huge hack to make it work at all.

I worry we're doing this at the wrong time. This only seems to be for methods, when we might want it for properties, etc. should this wait for 'extensions' to be done first?

Great question. It does feel like if we wanted to do this, it should be possible to do it with pretty much any non-type member, unless we had a specific reason for leaving them out. This tends to raise the question about how to denote the "location" of a usage of any kind of member, in a way that is not ambiguous with the usage of any other member. For example, use of implicit conversion operators might be tough to intercept based on location.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1446941089 I wasn't sure what you meant by underlying joinpoint.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1448998072 I wasn't sure what IPNC means.

That feels 100% backwards to me, and still very narrow. Not to mention, such "optimizations" can only ever benefit people using newer versions of C# with support for these intercepting generators. Every other language is left out.

I definitely want us to consider if the use case of the platform specializing calls to platform methods within a project is better served by an post-compilation IL rewriting step. That will require revisiting any reasons we've been reluctant to introduce such a step in the past.

It's clear that disallowing composition of multiple interceptors is really limiting here. As we look a little more broadly at this space I would definitely like for us to consider what it would mean to allow composition. The sequencing definitely seems easiest to define when you also control the original definition of the thing whose calls will get replaced.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1447340973

There also doesn't seem to be a way to opt in or out of it for different invocations

I think this would require the "syntactic marker at interception site".

The obvious question I have though is how would this work with default parameters? Would the interceptor need to know the default value and mirror it, or would that be passed through even though the caller did not specify it?

The interceptor would not need to know or mirror the default value, but it would need to have a parameter corresponding to that optional parameter--just as we disallow implicitly converting the method group of void M(string x, string y = "a") { } to Action<string>, for example. In fact, it might be bad for interceptors to use default values--since we don't want to bind the original call site any differently, we're likely to want to spec this to not even insert any default values which are specified by the interceptor.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1454902160

Wouldn't it remove the guarantee that file names are unique?

Although it's hard to have duplicate file names in a real world project, there's no guarantee of unique paths in the compiler itself. (This was a significant complicating factor for file-local types.) Most likely, if a single tuple of (filePath, line, column) does not refer to exactly one source location in the compilation, it needs to be an error to try and intercept the location. This could be problematic for intercepting calls in generated code.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1448046520

The biggest drawback is probably spooky action at a distance. I understand that by marking my method interceptable I acquiesce to having my logic replaced, but now the user of my library Foo can reference another library Bar that silently generates interceptors for my method that do whatever they want.

Yeah, we would potentially want something like "Key tokens" listed under "Alternatives" to reduce the risk of this.

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1447830935

The second case is kind of like a horizontal new

I think this is a great point. I would like to try looking at the problem from a different angle, where we consider if some controlled facility could be introduced to make certain extensions better matches than instance methods in some cases. Could this give us some of the benefits of replace/original while still being able to work across projects--that is, where the replacement isn't in the same project as the original.

I don't yet know in which cases this would be or what drawbacks this would have. There could be some fairly spooky problems where the replacement method wouldn't actually get used in places where we want it to.

What I am seeing is that some scenarios like ASP.NET minimal API or query providers want to dispatch based on location essentially because some information about lambda expressions is lost after compilation or requires reflection or complex traversal to access at runtime. But other places want to dispatch based on:

Basically, what if a specialized implementation of a method could be added to a project, which dispatches however the implementer wants, simply by using the flow-control constructs of the language, rather than sticking declarative references to source locations into attributes on the signature.

Consider the following extremely strawman sample, which hopefully gets the basic idea across.

IRouteBuilder builder = ...;
// optionally: "conversion" to an explicit-extension type? where the framework exposes some path for doing this.
SpecializedRouteBuilder builder1 = builder.WithOpportunisticSpecialization();
builder1.MapGet("/products", (request, response) => ...);
builder1.MapGet("/items", (request, response) => ...);
builder1.MapGet("/config", (request, response) => ...);

// WebApp.generated.cs
// make it an optimization problem?
explicit extension SpecializedRouteBuilder for IRouteBuilder
{
    // add an extension which deliberately wins over the one actually present in the type?
    public shadowing IRouteBuilder MapGet(string route, Delegate handler,
        [CallerFilePath] string filePath = null,
        [CallerLineNumber] int lineNumber = -1,
        [CallerColumnNumber] int columnNumber = -1)
    {
        switch (filePath, lineNumber, columnNumber)
        {
            // assume Native AOT can inline and constant-propagate to ensure
            // this dispatch doesn't actually need to happen at runtime in most cases.
            case ("file.cs", 1, 1): Method1(route, handler); return this;
            case ("file.cs", 2, 2): Method2(route, handler); return this;
            case ("file.cs", 3, 3): Method3(route, handler); return this;
            case ("file.cs", 4, 4): Method4(route, handler); return this;
            case ("file.cs", 5, 5): Method5(route, handler); return this;
            case ("file.cs", 6, 6): Method6(route, handler); return this;
            default: base.MapGet(route, handler);
        }
    }
}

// Regex.generated.cs
explicit extension MyRegexFactory for Regex
{
    // add an extension which deliberately wins over the one actually present in the type?
    public static shadowing Regex IsMatch(string pattern)
    {
        switch (pattern)
        {
            case ("a+"): return APlusMatcher();
            case ("ab+"): return ABPlusMatcher();
            default: return base.IsMatch(pattern);
        }
    }
}
vladd commented 1 year ago

@RikkiGibson

I wasn't sure what IPNC means.

I think it’s a common desire to decorate a property in some way and get the boilerplate for implementing INotifyPropertyChange automagically applied. This scenario is not supported by current source generators.

alrz commented 1 year ago

There could be some fairly spooky problems where the replacement method wouldn't actually get used in places where we want it to.

If I understood the extension proposal correctly, an implicit extension always applies? a la https://github.com/dotnet/csharplang/issues/4029

What I am seeing is that some scenarios like ASP.NET minimal API or query providers want to dispatch based on location

I don't think merely the "location" is the defining factor here (just like the IsMatch case), I would expect any c.InterceptableMethod(1) dispatches to the same generated code if the argument is the same.

Of course, this isn't workable for typeof, lambdas or generics (basically anything that doesn't have a const representation).

For generics, it could be a case for specialization:

extension for Serializer {
    // `new` makes sure that this is specialization/shadowing so the signature must match.
    public static new string Serialize</*const*/ MyType>() { .. } 
}

Which can be enabled for constants as well:

extension for Regex {
    public static new bool IsMatch(string str, const string pattern = "abc") { .. }
}

These would result in a separate method and eliminate the need for a switch.

None of that would work for lambdas but I think it should remain an implementation detail, that is, to use some auxiliary APIs to help with generating the switch and getting the "concrete" value (typeof, lambda body, etc) for the generator to use directly.

HaloFour commented 1 year ago

@RikkiGibson

I definitely want us to consider if the use case of the platform specializing calls to platform methods within a project is better served by an post-compilation IL rewriting step. That will require revisiting any reasons we've been reluctant to introduce such a step in the past.

It was mentioned on Discord that this is because IL-rewriting is challenging and the tools aren't up to snuff, and I don't doubt it. Regardless of these use cases I think there's a good opportunity for the .NET teams to come together and provide such tooling, that can emit appropriate IL and debugging symbols. For the most part, writing transformers and aspects In the Java world is a pretty simple experience. It's extremely rare to emit raw byte code, you typically write normal Java classes to describe the advice and use either annotations or a fluent DSL to describe where that advice will be applied. Tooling in IDEs like IntelliJ will show you where the advice will be applied. Transformation can then be performed as a post-compilation step, prior to a native/AOT build. It's pretty slick and easy to use once you grok the concepts of AOP and interception. I would imagine an official solution for .NET, with tooling support, could very easily rival it, too.

RikkiGibson commented 1 year ago

https://github.com/dotnet/csharplang/issues/7009#issuecomment-1461603069

If I understood the extension proposal correctly, an implicit extension always applies?

I think if the receiver of the member access is the extension type, then members on the extension type are a better match than members on the underlying type. But if the receiver is the underlying type of an implicit extension, then the implicit extension members are only considered after we try and fail to find an applicable member on the underlying type. e.g.

var mt = new MyType();
mt.M(); // MyType.M();
mt.M1(); // Ex.M();

Ex ex = mt;
ex.M(); // Ex.M();

public class MyType
{
    public void M() { }
}

implicit extension Ex for MyType
{
    public void M() { }
    public void M1() { }
}
alrz commented 1 year ago

The biggest drawback is probably spooky action at a distance. I understand that by marking my method interceptable I acquiesce to having my logic replaced, but now the user of my library Foo can reference another library Bar that silently generates interceptors for my method that do whatever they want.

I think to account for that it could be required to add specializations/interceptors for a particular member only in a single extension declaration in addition to a marker on the original member.

class C {
  public replaceable void Method() { .. } // permits replacement by a generator
}
extension for C {
  public replace void Method() { .. } // but if this already exists, no one else can replace it
}

That's assuming an external replacement could be unambiguous to begin with.

jnm2 commented 1 year ago

Please please please call this ComeFromAttribute. The opportunity is too perfect! I'm only half kidding.

RikkiGibson commented 1 year ago

I think to account for that it could be required to add specializations/interceptors for a particular member only in a single extension declaration in addition to a marker on the original member.

This is quite a bit like what is described in #7061, except that we were considering only letting extensions get shadowed/replaced by extensions. Instead of a replaceable or similar marker, the signal to say "let this get replaced" is just to declare it as an extension.

aetos382 commented 1 year ago

Is it considered that the interceptor receives a reference to the original method through the Action or Func?

class Original
{
    public void Caller()
    {
        Console.WriteLine(this.Interceptable1(1)); // prints '6'
    }

    [Interceptable]
    public int Interceptable1(int x)
    {
        return x + 1;
    }
}

class Interceptor
{
    [InterceptsLocation(...)]
    public int Intercepter1(int x, Func<int, int> original)
    {
        return original(x + 1) * 2;
    }
}
gfraiteur commented 1 year ago

This feature seems to address a very specific problem with a very specific solution. While this may be effective in the cases at hand, it may also be problematic in the long run, as the C# compiler is expected to last many more decades.

Instead of creating a highly specialized solution at the compiler level, it may be worth considering a more general and flexible approach based on aspect-oriented programming (AOP). AOP has been used successfully in various domains for over 20 years and offers many benefits for modularizing cross-cutting concerns.

At PostSharp Technologies, we have been developing Metalama, an AOP framework based on Roslyn. Metalama still does not support call-site aspects (i.e., interceptors), but we have plans to implement them in the future. We believe that Metalama could meet the requirements of this feature proposal and provide a more elegant and robust solution.

We would be happy to collaborate with you on this matter and share our expertise and insights.

ufcpp commented 1 year ago

Does this work safely even if the code-cleanup automatically replaces spaces with tabs or vice versa?

CyrusNajmabadi commented 1 year ago

@ufcpp yes :-)

Bonuspunkt commented 1 year ago

what did i miss?

using System;
using System.Runtime.CompilerServices;

var c = new C();
c.InterceptableMethod(1); // prints `interceptor 1`
c.InterceptableMethod(1); // prints `other interceptor 1`
c.InterceptableMethod(1); // prints `interceptable 1`

class C
{
    public static Action<int> Interceptable { get;set; } = (param) => Console.WriteLine($"interceptable {param}");

    public Action<int> InterceptableMethod { get; set; } = Interceptable;
}

static class D
{
    [ModuleInitializer]
    public static void Init() {

        Func<Action<int>> gen = () => {
            var originalMethod = C.Interceptable;

            var actions = new Action<int>[] {
                param => Console.WriteLine($"interceptor {param}"),
                param => Console.WriteLine($"other interceptor {param}"),
                originalMethod
            };
            var index = 0;

            return (param) => {
                actions[index](param);
                if (index < actions.Length - 1) index++;
            };
        };

        C.Interceptable = gen();
    }
}
LazyAnnoyingStupidIdiot commented 1 year ago

This feature seems to address a very specific problem with a very specific solution. While this may be effective in the cases at hand, it may also be problematic in the long run, as the C# compiler is expected to last many more decades.

Instead of creating a highly specialized solution at the compiler level, it may be worth considering a more general and flexible approach based on aspect-oriented programming (AOP). AOP has been used successfully in various domains for over 20 years and offers many benefits for modularizing cross-cutting concerns.

At PostSharp Technologies, we have been developing Metalama, an AOP framework based on Roslyn. Metalama still does not support call-site aspects (i.e., interceptors), but we have plans to implement them in the future. We believe that Metalama could meet the requirements of this feature proposal and provide a more elegant and robust solution.

We would be happy to collaborate with you on this matter and share our expertise and insights.

And is PostSharp Technology going to put the AOP features into C# as a core feature set? Or should your link have went to the pricing page?

https://www.postsharp.net/metalama/pricing

Until that happens, it is not a solution worth mentioning here on the Core DotNet issue trackers. What's next? We should be paying some third party vendors for basic language features like generics / expression tree / etc?

Or is it dotnet team's position to have gaps in the language feature set so third party vendors can profit off it?

Be sure to let us all know either way.

This is a specific solution to a specific problem many of us here face, and having this implemented will solve many of our specific problems.

HaloFour commented 1 year ago

@LazyAnnoyingStupidIdiot

And is PostSharp Technology going to put the AOP features into C# as a core feature set? Or should your link have went to the pricing page?

I believe the point here is that a proper AOP solution will cover not only this use case but many, many others that interceptors cannot handle at all. Obviously any solution that ends up built into the language/compiler will not require developers to license/purchase third-party components.

mgravell commented 1 year ago

One potentially simplifying alternative here might be to disallow type parameters on interceptor methods, and force them to specialize based on the constructed signature of the interceptable method.

I'm not sure that's possible; to modify the example immediately preceding this statement:

class C<T>
{
    private T someField;
    public static void Use(...)
    {
        var x1 = something./*L1*/Interceptable("a", someField);
        var x2 = something./*L2*/Interceptable("a", someField);
    }
}

An interceptor cannot magic away the fact that there's an unresolved <T> in the constructed signature, unless the InterceptsLocation additionally allows types to be specified, and would also require JIT interception and per-type JIT (even for ref-type T). There's also the issue that if C<T> is public in library B, the generator here needs to run against B since it is B's calls that are being intercepted, but the final types aren't known until some consuming library/application A defines what C<ConcreteFoo> it actually wants.

mgravell commented 1 year ago

One additional thought - "unpronounceable types"; this would be an enticing API for libraries like Dapper to embrace AOT, but a typical Dapper usage today might look like:

connection.Execute("some sql", new { Foo = 12, Bar = "abc" });

it is not possible to convey this in an interceptor because the type with the parameters is unpronounceable; even if we took object or T as the arg, we can't cast it to get the values out, for similar reasons. There are some ugly workarounds, but... nothing good.

Obviously one alternative would be for us to push an analyzer that replaces the above with value-tuple syntax, i.e.

connection.Execute("some sql", (Foo: 12, Bar: "abc"));

which would allow us to write an interceptor:

[InterceptsLocation(...)]
static int GeneratedExecute(this IDbConnection connection, string sql, (int Foo, string Bar) args)
{       // don't over-analyze this code - just for illustration
        using var cmd = conn.CreateCommand();
        cmd.CommandType = CommandType.Text;
        cmd.CommandText = "some sql"; // not quite always exactly the same as original input

        var p = cmd.CreateParameter();
        p.ParameterName = "Foo";
        p.DbType = DbType.Int32;
        p.Value = args.Foo;
        p.Direction = ParameterDirection.Input;
        cmd.Parameters.Add(p);

        p = cmd.CreateParameter();
        p.ParameterName = "Bar";
        p.DbType = DbType.String;
        p.Value = args.Bar ?? (object)DBNull.Value;
        p.Direction = ParameterDirection.Input;
        cmd.Parameters.Add(p);

        return cmd.ExecuteNonQuery();
}

However, this then gets into a super confusing situation, because if someone uses value-tuple syntax in a scenario that can't use generators for some reason, then nothing works - because value-tuples cannot convey the necessary name information inwards (only outwards).

I wonder, then, if there's some missing feature related to interceptors that makes unpronounceable types: pronounceable. I'm not suggesting such types should be pronounceable between modules/assemblies, but... here me out:

public static int Execute(this IDbConnection conn, string sql, { int Foo, string Bar } args)
{
    // everything else as above
}

what if "named value-tuple type declaration syntax", but using { } instead of ( ) worked for anonymous types? I appreciate this is a bit tangential, but IMO it interweaves the two.

Oh, one other subtle difference to consider between named value-tuples and anon-types (for why we can't just use value-tuples):

var good = new { One = 1 }; // presumably maps to imaginary structural type syntax `{ int One }`
var bad = (One: 1); // CS8124 Tuple must contain at least two elements.
mgravell commented 1 year ago

Just for completeness, here's one of the evil hacks, just to be honest and say "yes, we can get around this":

    connection.Execute("some sql", new { Foo = 12, Bar = "abc" });

    [InterceptsLocation(...)]
    static int GeneratedExecute(this IDbConnection conn, string sql, object args)
    {
        return TypeHack(conn, sql, args,
            static () => new { Foo = default(int), Bar = default(string) },
            static obj => (obj.Foo, obj.Bar));
        static int TypeHack<T>(IDbConnection conn, string sql, object rawArgs,
            Func<T> unused, Func<T, (int Foo, string Bar)> argsFactory)
        {
            var args = argsFactory((T)rawArgs);
            using var cmd = conn.CreateCommand();
            cmd.CommandType = CommandType.Text;
            cmd.CommandText = "some sql";

            var p = cmd.CreateParameter();
            p.ParameterName = "Foo";
            p.DbType = DbType.Int32;
            p.Value = args.Foo;
            p.Direction = ParameterDirection.Input;
            cmd.Parameters.Add(p);

            p = cmd.CreateParameter();
            p.ParameterName = "Bar";
            p.DbType = DbType.String;
            p.Value = args.Bar ?? (object)DBNull.Value;
            p.Direction = ParameterDirection.Input;
            cmd.Parameters.Add(p);

            return cmd.ExecuteNonQuery();
        }
    }
mgravell commented 1 year ago

I've been playing with this locally, and the more I do: the more I'm concerned by the security implications of arbitrary interception. Especially when aspnetcore is looking at generators/interceptors, meaning any global flag would probably be on in a good number of cases.

It seems to me that there are two categories of interception:

Right now, the "ad-hoc" is the implemented case, but it seems like the window for abuse (by injecting malicious generators into existing libraries as a supply-chain attack) can be kept minimal by making only advertised [Interceptable] locations interceptable by default. In reality, I expect most common interceptor cases to be fairly predictable at the library level over which methods need to be intercepted, with the same team writing the library and the interceptor (so: able to add attributes).

If there is a good use-case for ad-hoc interception, IMO that should be a separate flag with a suitable name citing "SomethingDangerousAdhocInterceptorsSomething" (naming is hard). Today a malicious generator could emot a module initializer or static constructor in a private class to do some things, but it can't change the flow of existing code, log per-call arguments, etc.

DamianEdwards commented 1 year ago

Any particular reason one should consider interceptors as any more of a potential security concern vs. other approaches that are already possible today, e.g. IL-weaving, profiler interfaces, DiagnosticListener, IHostingStartup, etc.?

mgravell commented 1 year ago

Maybe I'm over-worrying, but with the exception of IL weaving (which is hard to do by accident), the others don't allow you to subvert existing code to the same degree. But: if I'm wrong and the attack vector isn't interesting in that it doesn't add anything new (because once your build chain is even slightly dodgy, you're completely hosed), then: fine

333fred commented 1 year ago

@mgravell I would say this doesn't necessarily add anything interesting. The compiler already allows arbitrary code execution during the compilation process by analyzers and source generators. While we don't expose APIs for them to insert themselves into the build pipeline, the reality is that this is all .NET, and they could easily be evil. Similarly, NuGet packages can bring in arbitrary MSBuild props and target files, which can also do plenty of evil things.

CyrusNajmabadi commented 1 year ago

ultimately, I guess; how do you code-review obj.DoesMethod(12, "abc"); if that line can be swapped silently at compile-time?

How do you review a call that might already bind to an extension method that an SG might insert into your compilation that is in a closer namespace than the original namespace you thought you were binding to?

gfraiteur commented 1 year ago

Right now, the "ad-hoc" is the implemented case, but it seems like the window for abuse (by injecting malicious generators into existing libraries as a supply-chain attack) can be kept minimal by making only advertised [Interceptable] locations interceptable by default. In reality, I expect most common interceptor cases to be fairly predictable at the library level over which methods need to be intercepted, with the same team writing the library and the interceptor (so: able to add attributes).

In our experience with AOP/PostSharp/Metalama, what is important for developers is predictability, i.e. least surprise principle. When you add a behavior to a declaration, and if it is not visible in the source code as, for instance, a custom attribute, it will come as a surprise.

To make interceptors predictable, we make them visible using IDE editor features. In Metalama, we primarily use CodeLens on the declaration. In PostSharp, we use a combination of adornments (underlining) and Quick Info on the declaration. In case of call-site interceptors (as opposed to declaration-site aspects), the only possible IDE option are call-site adornments e.g. underlining the method name in the calling method body.

We added two more visibility features in Metalama:

The degree to which IDE support for visibility and predictability is important depends on what interceptors/aspects are used for i.e. how much side effect they introduce:

mgravell commented 1 year ago

OK, so I have a data-point to add to the feedback; I took Dapper, added some interceptor support (very incomplete, proof-of-concept), and ran it through a basic test on Adventureworks (publishing via the IDE in RoslynDev mode on the branch)

Feedback on feature:

But: numbers, because I want to emphasize "wow, this is great", and give you ammunition for the "should we keep pushing on this":

baseline - release mode, regular Dapper with no voodoo, win-x64:

➜ .\UsageLinker.exe
First time: 106ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 286ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:29:25  win-x64  interceptors  ?1 ~4 -1  ﮫ 706ms 
➜ .\UsageLinker.exe
First time: 98ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 279ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:29:28  win-x64  interceptors  ?1 ~4 -1  ﮫ 645ms 
➜ .\UsageLinker.exe
First time: 95ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 272ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:29:29  win-x64  interceptors  ?1 ~4 -1  ﮫ 636ms 
➜ .\UsageLinker.exe
First time: 94ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 276ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:29:31  win-x64  interceptors  ?1 ~4 -1  ﮫ 636ms 
➜ .\UsageLinker.exe
First time: 93ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 273ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:29:33  win-x64  interceptors  ?1 ~4 -1  ﮫ 632ms 

test scenario 1 - exactly the same as baseline, but enable the DapperAOT build tools (zero other changes)

➜ .\UsageLinker.exe
First time: 37ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 281ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:27:17  win-x64  interceptors  ?1 ~4 -1  ﮫ 629ms 
➜ .\UsageLinker.exe
First time: 37ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 273ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:27:19  win-x64  interceptors  ?1 ~4 -1  ﮫ 575ms 
➜ .\UsageLinker.exe
First time: 37ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 272ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:27:22  win-x64  interceptors  ?1 ~4 -1  ﮫ 571ms 
➜ .\UsageLinker.exe
First time: 37ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 265ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:27:24  win-x64  interceptors  ?1 ~4 -1  ﮫ 559ms 
➜ .\UsageLinker.exe
First time: 38ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 283ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:27:26  win-x64  interceptors  ?1 ~4 -1  ﮫ 585ms 

test scenario 2 - like test scenario 1, but publishing in full AOT/trimmed mode:

➜ .\UsageLinker.exe
First time: 0ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 229ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:32:17  publish  interceptors  ?1 ~3 -1  ﮫ 371ms 
➜ .\UsageLinker.exe
First time: 0ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 230ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:32:25  publish  interceptors  ?1 ~3 -1  ﮫ 300ms 
➜ .\UsageLinker.exe
First time: 0ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 223ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:32:28  publish  interceptors  ?1 ~3 -1  ﮫ 294ms 
➜ .\UsageLinker.exe
First time: 0ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 231ms
752: Road-150 Red, 52 (BK-R93R-52)
   14:32:29  publish  interceptors  ?1 ~3 -1  ﮫ 303ms 
➜ .\UsageLinker.exe
First time: 0ms
752: Road-150 Red, 52 (BK-R93R-52)
Next 2000 times: 222ms
752: Road-150 Red, 52 (BK-R93R-52)

So:

So: I think that's a pretty good success story. Feel free to grab me if you want more details, or to play with it (this is all on public GitHub, although I emphasize: not many scenarios work at the moment; I've just targeted one that does work)

mgravell commented 1 year ago

I would like to propose a little bit of... flexibility in the signature matching; right now, the signatures need to be a very direct match. I would, however, like to be able to intercept away boxing and/or casting when those things are redundant. Consider, for example (and I appreciate that the example is very artificial):

using System.Runtime.CompilerServices;

Foo foo = new();
var x = foo.SomeMethod("abc");
var y = (int)foo.SomeMethod(DateTime.UtcNow);
class Foo
{
    public object SomeMethod(object arg) => throw new NotImplementedException();
}

file static class Generated
{
    // successfully intercepted
    [InterceptsLocation("C:\\Users\\marcg\\source\\repos\\ConsoleApp7\\ConsoleApp7\\Program.cs", 4, 13)]
    internal static object SomeMethod0(this Foo foo, object arg) => arg;

    // this fails at build with CS27007
    // 'Cannot intercept method 'Foo.SomeMethod(object)' with interceptor 'Generated.SomeMethod1(Foo, DateTime)' because the signatures do not match'
    [InterceptsLocation("C:\\Users\\marcg\\source\\repos\\ConsoleApp7\\ConsoleApp7\\Program.cs", 5, 18)]
    internal static int SomeMethod1(this Foo foo, DateTime arg) => (int)arg.Ticks;
}

It would be nice if that second intercept (SomeMethod1) could work, with the analyzer having the flexibility to understand the types of the expressions being passed in, and remove the indirection. This might be something for a future C# version, though, and to be honest I can't hand-on-heart say that it is essential; the following works fine:

    [InterceptsLocation("C:\\Users\\marcg\\source\\repos\\ConsoleApp7\\ConsoleApp7\\Program.cs", 5, 18)]
    internal static object SomeMethod1(this Foo foo, object arg) => (int)((DateTime)arg).Ticks;

but now I'm at the mercy of the JIT as to whether the box/unbox will be avoided or not.


The same might also be said for unspecified optional parameters; basically, rather than "bind the method and parameters, now just swap the method that is being called", more of a "hey, you're being intercepted; try the swap first, and bind to that", i.e. the code compiles (including optional parameters and types, conversions, etc) as though the code had said, at source:

var y = (int)Generated.SomeMethod1(foo, DateTime.UtcNow);

This would allow interceptors to remove redundant unspecified optional values from the code entirely (in addition to not requiring casts/boxing) - just maybe we wouldn't want IDE0004 (redundant cast on the return type) firing in the case of interceptors.

Discussion point in the hypothetical "flexible signature" world: if the user does not include a cast, but the generator returns a different type than the original method, then what is the type of y; i.e. if we have

var y = foo.SomeMethod(DateTime.UtcNow);

and the intercepted method returns int instead of object, is y defined as int or as object ? IMO the interceptor should win, and the type should be int but interceptors should usually play nicely (unless there's a sufficiently good reason) and return identical or "reasonably similar" types. For example, a method that returns IList<T> - it isn't unreasonable for an intercept to return as List<T> or some other concrete type, which may increase foreach performance (or avoid boxing, in the case of ImmutableArray<T>), while still being fine as an IEnumerable<T> if that is what the code does with it.


Although saying that... currently the implementation allows you to erase a generic, i.e. the interceptor can be non-generic even though the calling method is .Foo<T> - and I wouldn't want to lose that, so...

En3Tho commented 1 year ago

@mgravell I wonder what are your thoughts on debugability and discoverability of the actual code that is called now that you're tried this feature yourself

mgravell commented 1 year ago

debugability is great; I was able to step into my generated code, everything worked fine - very high score and no criticisms

discoverability is a gap, unless I'm doing something wrong; "go to definition" goes to the original non-intercepted location only, and there is no visual hint that the method has been intercepted, nor any way (other than stepping-in in the debugger) to find the code that is intercepting it (yes, I know how to find the code in the solution explorer - I mean from the call-site without having to know about the generator's existence)

mgravell commented 1 year ago

more feature feedback - I've thought of a range of other perfect uses for this feature (I want to keep laying on thick with the "yes, please can we definitely have this" conversation):

I see amazing ways of this tool allowing us to enable AOT (also: optimizations) on many (not all) libraries with zero (or minimal) changes to the consumer's code - just: add a build tool (or even just update the nuget package and get the analyzer for free)

I like this a lot.

CyrusNajmabadi commented 1 year ago

"go to definition" goes to the original non-intercepted location only,

Correct, it is unclear what, if any, IDE support we'll be able to have here for a while.

thinker227 commented 1 year ago

Since this has already been prototyped I'm not contributing a lot here, but to me it feels like this proposal consists of two separate proposals/features. One proposal is interceptors, the other is related to source generators. While neither of these are language features and are instead strictly compiler features, to me it feels like these are two separate things which have to depend on each other (as mentioned), which feels sort of clunky. In order to use interceptors you effectively have to also use source generators, but interceptors aren't inherently related to SGs. The usefulness of interceptors is dependent on a currently unimplemented SG API (ForMethodInvocationWithName), and I don't feel like that's what a major feature like this should be. It doesn't feel... confident in itself, feels more like a flimsy string connecting the two.

Interceptors would imo feel much more coherent as a feature if it wasn't split two-way, either having interceptors be fully stationed in the source-generation API, or fully in the attribute API. The former leans on SGs having the ability to do code replacement which is iffy at best, and the latter would dip into macro territory (as mentioned as an alternative).

In my personal opinion, even if it would take much longer and have a much larger impact on the language and ecosystem as a whole, I think running with a macro-like system would be the better alternative. Interceptors in this context feels like a messy band-aid when proper macro-support is what we're truly wishing for.

Granted this is not a bad proposal, I still think it has a lot of potential despite being kind of unwieldy in its current state.

figloalds commented 1 year ago

I dislike this immensely, it's hacky, ugly and extremely fragile Someone comes into the code adds a newline or a bunch of spaces (for aesthetics) BAM it's broken.

Try to do this in a more [Attribute] oriented way, make library code have a "Intercept at compile time" feature, Maybe take inspiration from known "hack/intercept" frameworks like HarmonyLib

En3Tho commented 1 year ago

What makes you think a bunch of spaces or newlines will break this? It's not like it uses raw text to analyze things.

Also, adding attribute means the current code has to be changed. And if it has to be changed anyway then there is no point in having this feature, really.

AlBannaTechno commented 1 year ago

@figloalds I believe this feature is intended for utilization by source generators.

CyrusNajmabadi commented 1 year ago

Someone comes into the code adds a newline or a bunch of spaces (for aesthetics) BAM it's broken.

It would not be, as the interceptor location will be updated accordingly.

dotlogix commented 1 year ago

This feature feels so wrong, I can't really explain it. As other ppl mentioned code that does sth different than what it says definetly isn't the source of all evil.

Allowing interceptions of an arbitrary code location is a security concern and leads to strange errors. What if the interceptor code throws Exceptions for example. What does the Stacktrace tell you? Called method X at line 200 which does not even exist in your source code?

I appreciate source generation and AOT, but this should definetly be an opt-in feature imho. I would never allow such things in my code base sorry.

CyrusNajmabadi commented 1 year ago

Allowing interceptions of an arbitrary code location is a security concern

What security concerns?

What if the interceptor code throws Exceptions for example. What does the Stacktrace tell you? Called method X at line 200 which does not even exist in your source code?

The method exists in your source code. This is just a part of generators, which generates code that you can see, interact with, and use just like the rest of the code you manually wrote. The only difference is that it can generate a ton of tedious boilerplate for you quickly, instead of you having to do it manually.

but this should definetly be an opt-in feature imho.

All generators are opt-in

En3Tho commented 1 year ago

I don't get this arbitrary point to be honest. You have the attribute on the method - it gets intercepted. No attribute - no interception.

I can't see how interceptors introduce more security concerns. You have to consume the library that is using them. And that library can already contain malicious code or source generator or an analyzer or whatever.

To me interceptors are a specific way to run a source generator. Nothing more and nothing less.

CyrusNajmabadi commented 1 year ago

In a similar vein, any extension method you call that, could end up calling a 'closer' extension method that a generator produces in your sane namespace. No need for interceptors to already get that behavior.