dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.02k stars 4.03k forks source link

Add TryGetSemanticModel to Compilation #26347

Closed GeorgeAlexandria closed 6 years ago

GeorgeAlexandria commented 6 years ago

Hi,

What about adding (maybe as pull request from community) a bool TryGetSemanticModel(SyntaxTree syntaxTree, bool ignoreAccessibility = false, out SemanticModel semanticModel) at Compilation API, that doesn't throw exception if input syntax tree doesn't exist in the current compilation but returns false?

At this time all of invocation Compilation.GetSemanticModel(...) we need to wrap at try|catch to handle exception, when syntax tree doesn't exist in the compilation.

Checking a result of some TryGet... method will be cheaper (we doesn't use a expensive throw operation) and more clear (a pattern try{...}catch(...){...} will be replaced by if(...){...} in a code).

I will glad to see your opinion for that.

sharwell commented 6 years ago

I'd be concerned about confusion with Document.TryGetSemanticModel, which has very different semantics and is generally discouraged:

https://github.com/dotnet/roslyn/blob/82dac747490ba681577b9d1a47524b9f4e0af137/src/Workspaces/Core/Portable/Workspace/Solution/Document.cs#L236-L245

CyrusNajmabadi commented 6 years ago

API, that doesn't throw exception if input syntax tree doesn't exist in the current compilation but returns false?

how are you getting into a situation where your input syntax tree is not part of your compilation?

GeorgeAlexandria commented 6 years ago

@CyrusNajmabadi, as one of example you have a lot of compilation (a cache) that were generated from a sources and you want to retrieve Symbol from a some processing syntax node (in this case syntax tree can or not be contained in a cache of compilations)

As a second example it can be done when you try to create a compilation from a source code, and you want that the compilation can be emitted (doesn't have syntax nodes and trees that produce compilation errors, if it has, then you remove it and it dependens to avoid errors). And you try to get ISymbol for syntax tree of source file, that maybe was removed from compilation to avoid a compilation errors. (I can continue the list of examples)

I know about Compilation.ContainsSyntaxTree(SyntaxTree syntaxTree) and I can use it before GetSemanticModel but I want to avoid a double checking of existing the syntax tree in the Compilation.ContainsSyntaxTree and Compilation.GetSemanticModel

CyrusNajmabadi commented 6 years ago

nd you want to retrieve Symbol from a some processing syntax node (in this case syntax tree can or not be contained in a cache of compilations)

If you want that, i would recommend passing along the node and the compilation it came from. That way you never have hte issue where you're talkign aobut node from a different tree than that compilation.

(I can continue the list of examples)

That would be good. Right now it seems strange. Effectively, you are allowing compilations and trees to flow through yoru systems in a disconnected manner, but then you want to go get semantic information about one of your trees, even though you're going to check against a potentially unrelated compilation. It seems like that indicates a logic problem in your domain. If you want to be able to ask semantic questions, you should carry along the compilatoin with the tree.

Or, better yet, just pass along hte SemanticModel which holds onto both of them. Note: getting a semantic model is not an expensive operation (it's the queries on the semantic model that can be expensive). Essentially a SemanticModel is just a wrapper that holds onto the pair of Compilation+SyntaxTree, and provides efficiency caches for when you actually go ask semantic questions.

GeorgeAlexandria commented 6 years ago

@CyrusNajmabadi

i would recommend passing along the node and the compilation

just pass along hte SemanticModel

I can't do that because we create compilations from sources in the one process and a request to analyze a file came from the other process. So we anyway need to parse requested file from the other process and try to get semantic info for them.

(I can continue the list of examples)

That would be good

Another one, your .csproj contains a some file on the vb (.vb, .vbhtml, or .aspx with language vb) (doesn't matter that it's csproj and vb or vbproj and cs). You can't create a one compilation for compile items of .csproj and vb files (Roslyn compilation by design can't contain CSharp and VisualBasic syntax trees together). So you create a few compilation. And when you get the request to analyze a "vb" file you also need to retrieve a semantic info for them.

CyrusNajmabadi commented 6 years ago

In all these cases, i'ts unclear to me why you are not tracking which compilation a syntax tree belongs to. Taking a random tree and trying to use it against a random compilation is simply not a viable pattern for using roslyn. Note: if this is intrinsic to what you are doing, i would just write the simple extension yourself. There's not much better we could do on our end, and i don't think it would be a good idea to encourage this pattern through our own api.

sharwell commented 6 years ago

Taking a random tree and trying to use it against a random compilation is simply not a viable pattern for using roslyn.

I agree with this. Based on the information provided so far, my recommended course of action would be closing as "Won't Fix". If there's something still blocking, we could still discuss ways to address that problem.

GeorgeAlexandria commented 6 years ago

i'ts unclear to me why you are not tracking which compilation a syntax tree belongs to

We have a lot of caches and mapes between trees, semantics, files, projects, and all this constructions must be coherent, so if we (user) change, for example, a source, we must apply all changes to existing constructions (generate trees, compilation, semantics then clear the all old date and etc). And we want to avoid to adding the some additional structures that must be coherent if is possible.

my recommended course of action would be closing as "Won't Fix"

I have no any other idea or reason that can convince you and if you still think that the TryGet... is redundant, I will not try to convince you more.

sharwell commented 6 years ago

@GeorgeAlexandria Keep in mind that I'm specifically referring to the API proposal here. As for the larger discussion of how to address the usability problems you are encountering, I see no reason to close that discussion. Is there any reason the extension method approach mentioned by @CyrusNajmabadi would not work for you while we try to figure out the underlying issues with keeping trees and compilations linked?

public static bool TryGetSemanticModel(this Compilation compilation, SyntaxTree syntaxTree, bool ignoreAccessibility, out SemanticModel semanticModel)
{
  if (!compilation.ContainsSyntaxTree(syntaxTree))
  {
    semanticModel = null;
    return false;
  }

  semanticModel = compilation.GetSemanticModel(syntaxTree, ignoreAccessibility);
  return true;
}
GeorgeAlexandria commented 6 years ago

@sharwell as I pointed out

I know about Compilation.ContainsSyntaxTree(SyntaxTree syntaxTree) and I can use it before GetSemanticModel but I want to avoid a double checking of existing the syntax tree

we use the same extension in the project that you are pointed (and I try to replace the old pattern try{...}catch() to the extension when I find it). But I additionally want to avoid a duplicated operations and calls: if (!compilation.ContainsSyntaxTree(syntaxTree)) and compilation.GetSemanticModel(syntaxTree, ignoreAccessibility) they do the same cast and checking inside:

I don't know, maybe I really a meticulous developer that want to decrease a count of duplicate operations?

CyrusNajmabadi commented 6 years ago

But I additionally want to avoid a duplicated operations and calls:

but i want to improve it)

The duplicated op doesn't seem like a problem to me. :) Calling ContainsSyntaxTree is O(1). Doing that seems fine, and ultimately appropriate since you've picked a model where you have allowed your compilations and trees to drift out of 'coherence' with each other.

Having a simple and very cheap check of "am i coherent" seems like an acceptable price to pay given that that's how your model is intended to behave (and given that that model is very out of norms with how Roslyn is intended ot be used, and has been primarily designed for).

GeorgeAlexandria commented 6 years ago

The duplicated op doesn't seem like a problem to me. :)

:) , but

Calling ContainsSyntaxTree is O(1)

it's a little mistake, because ContainsSyntaxTree calls inside the ImutableDictionary.ContainsKey and this calling is O(log_2(n)). ImutableDictionary is build over AVL and searching can be done at O(log_2(n)), n – count of nodes in a tree (we found a bucket by hash and in the bucket find item using a linear search). So instead of 2*O(c), c – constant as you think, we really have 2*O(log_2(n)).

And, of course, I want to decrease a time from 2*O(log_2(n)) to O(c) but to achive this i think that we need to rewrite a lot of Roslyn code, so it really isn't worth that. However, we can decrease the time from 2*O(log_2(n)) to O(log_2(n)) just adding a new little method to Compilation API :)

CyrusNajmabadi commented 6 years ago

However, we can decrease the time from 2*O(log_2(n)) to O(log_2(n)) just adding a new little method to Compilation API :)

that does not seem worthwhile at all. I would be shocked if this overhead was ever even measurable in any real world scenarios. :)

CyrusNajmabadi commented 6 years ago

And, of course, I want to decrease a time from 2*O(log_2(n)) to O(c)

Can you explain this a bit? Why would this be valuable? doing this check is going to be practically immeasurable in any real world situation. Justs the cost of a single binding operation in a semantic model would have vastly more overhead.

and since roslyn itself can't even give you O(1), if you really want O(1), then i would recommend just building the cache on yoru side using a normal dictionary. You can then just do if (!_ourDict.Contains(tree)) { return; } with the perf you want.

This goes back to the above point: Roslyn expects that people use the right trees against the right compilations. That's a core part of the design of this system, and explicitly why things throw if you do things wrong (since it almost always reveals a logic bug where the two were allowed to diverge unintentionally). If your domain is a adamant about supporting arbitary trees against arbitrary compilations, then it seems like the best place to do this check is within your own domain as that's the only place where supporting this scenarios seems appropriate. :)

GeorgeAlexandria commented 6 years ago

Why would this be valuable? doing this check is going to be practically immeasurable in any real world situation.

Our product has a deep and hard analyzing that can take a some time and a memory, in addition our customers often use a not powerful machines, so analyzing times increases. And we try to reduce analyzing time anywhere that it can be done.

Justs the cost of a single binding operation in a semantic model would have vastly more overhead.

Of course, I agree with that, so i pointed out that it really isn't worth that.

However, we can decrease the time from 2*O(log_2(n)) to O(log_2(n)) just adding a new little method to Compilation API :)

that does not seem worthwhile at all. I would be shocked if this overhead was ever even measurable in any real world scenarios. :)

Our customers have a big solutions and a lot of count of sources, so I don't know how many improvements we will get from it, because TryGet... doesn't exists, and we nothing to benchmark (we didn't yet create a our Roslyn build which we will add a TryGet...) :) (But this method really easy to add)

Anyway, I see that I didn't convince you, because TryGetSemanticModel go against the Roslyn original design and our discussion has come to a standstill. @CyrusNajmabadi, @sharwell, thanks for your attention and for your opinion about this discussion! 😃

gafter commented 6 years ago

I don't think we want this in the Roslyn public APIs. However, you can add this two-liner to your project to get the effect you desire:

        public static bool TryGetSemanticModel(this Compilation compilation, SyntaxTree syntaxTree, bool ignoreAccessibility = false, out SemanticModel semanticModel)
        {
            var result = compilation.ContainsSyntaxTree(syntaxTree);
            semanticModel = result ? compilation.GetSemanticModel(syntaxTree) : null;
        }
CyrusNajmabadi commented 6 years ago

Our product has a deep and hard analyzing that can take a some time and a memory, in addition our customers often use a not powerful machines, so analyzing times increases. And we try to reduce analyzing time anywhere that it can be done.

It woudl be worthwhile if you could profile your scenario and provide numbers showing how much time is spent on this. If this did show up on the traces i would be surprised, but it would make this based more off of data rather than speculation.

Thanks!