Closed heejaechang closed 2 years ago
related to this https://github.com/dotnet/roslyn/issues/25667
tagging @dotnet/roslyn-ide
my initial rough thinking was something like
SyntaxFactory/Generator.LocalDeclaration(tracker.GetToken(originalToken), .... track.GetToken(originalToken, options.TrailingTriviaOnly) ..
GetToken return same token as originalToken but remember what trivia (mostly non whitespace trivia) was there and whether it is used or not)
so if there is another one that does SyntaxFactory/Generator.Assignment(tracker.GetToken(sameOriginalToken), ...)
tracker will create new token with no trivia because all its trivia is already used before. sure, it can have options to control the behavior though like AllowDuplication and etc.
there can be tokens removed so it should have tracker.TrackTrivia(token) that will track trivia belong to tokens that got removed
and there will be tokens that transfer trivia, so it should have SyntaxFactory/Generator.LocalDeclar...(tracker.TransferTrivia(fromToken, toToken, options), ... )
also has some convenient methods such as node = tracker.TransferTriviaAround(fromNode, toNode, options)
which basically handle common case where leading/trailing trivia around the node are moved to new node) and this track which trivia is actually used and etc so no duplicated trivia.
also, support something like this to make sure there is no deleted one. node = tracker.AttatchLeftover(node, options)
and at the end, has something like tracker.Assert();
which will assert if there is trivia that are left over.
thoughts?
Sorry, some bits are a little unclear to me:
SyntaxFactory/Generator.LocalDeclaration(tracker.GetToken(originalToken), .... track.GetToken(originalToken, options.TrailingTriviaOnly) .. GetToken return same token as originalToken but remember what trivia (mostly non whitespace trivia) was there and whether it is used or not)
How is GetToken different from just 'originalToken'. Since originalToken already has its trivia...
SyntaxFactory/Generator.Assignment(tracker.GetToken(sameOriginalToken), ...)
I'm not sure i understand this an assignment has an expression on the left. But you're passing a token... can you clarify what this means and what the semantics are here?
and there will be tokens that transfer trivia, so it should have SyntaxFactory/Generator.LocalDeclar...(tracker.TransferTrivia(fromToken, toToken, options), ... )
How does this work? How do you get the "toToken"? Isn't hte "toToken" what you get from the actual call to Generator.LocalDeclaration(...)? So how do you have it ahead of time to pass into .LocalDeclaration?
where token comes does not matter, point is you use tracker for any token that you want to track trivia usage.
Sorry, can you use with a crisper example. Where you actually specify which tokens correspond to which? As an example, you could use "Convert for to foreach".
additionally, could you use an exapmle where it's not just a node->not transform. i.e. a refactoring where we change a node and insert a statement (and thus actually move some trivia around across nodes).
This is also a bit strange to me as .Generator is very non-token based. The purpose is to abstract over VB/C#. Whereas passing trivia along is very token based. So it's unclear to me how you would use Generator here for this purpose when the actual generator methods almost never work with tokens.
Can you give a strong example, and demonstrate how it would work with this proposed API? That would go a long way toward understand how this woudl work and if it would be sufficient. Thanks!
point is there is something that knows which what trivia existed before and what trivia is used after. and what trivia is existed before but not used after.
I dont want to get into details. how we do that is not decided.
we don't need to use generator. or we can make generator to accept token. both are fine.
or we can make generator to accept token. both are fine.
How would the generator accept tokens? The tokens are different between C# and VB... but Generator is the common layer between them. Can you show what this would look like and how it would work?
why generator can't accept token? SyntaxToken is on common layer.
already generate identifier.
also, it doesn't need to be generator. it can be syntax factory.
not sure why you bring in generator/syntax factory which is just side thing when main point is creating something that track trivia usage.
I dont want to get into details. how we do that is not decided.
I'm asking for an example with a real-world use case we have. It would help bring clarity to your idea, and would help us discover if the idea is sufficient or if there will be problems.
It's also necessayr, because the original post used very vague terms. like "originalToken" "sameOriginalToken" "toToken" "fromtToken", etc. I do not know what these correspond to in an actual scenario. If you provide an actual example then we can say:
"here's a case where we want this trivia to move over, but there isn't a clear place for it to go. Here's how that would work with this API. You would call this helper, which would help blah blah. At the end, here's where the trivia would go what we knew how to handle, and here's where the trivia would go that we didn't handle" etc. etc.
Basically, it's really hard to determine if this idea is sufficient without actually having an example to walk through.
"here's a case where we want this trivia to move over, but there isn't a clear place for it to go. Here's how that would work with this API. You would call this helper, which would help blah blah. At the end, here's where the trivia would go what we knew how to handle, and here's where the trivia would go that we didn't handle
this kind of thing is something we can add as you go along. that doesnt make it good or bad on the idea whether we can create something that track comments or not.
why generator can't accept token? SyntaxToken is on common layer.
Because C# and VB have different syntax. Take something as simple as an "if statement" C# requires parens. Vb has no parens, but has an "end if". etc. etc.
what tokens are you providing? Without an example, i cannot understand the proposal.
you can join once we have enough concrete example on this. if it is too vogue, it is fine I work with people who can understand for what it is now.
I dont have concrete proposal like I said above. I have rough idea.
only point I have now is I believe we need something that track trivia usage so that people don't need to care about it much.
this kind of thing is something we can add as you go along.
How? I literally do not understand how the proposal works, and i'm trying to figure it out.
that doesnt make it good or bad on the idea whether we can something that track comments or not.
I never said if it was good or bad. I am simply trying to work through an example so i can get an understanding of how this works, and what your original examples meant. Right now i do not understand.
not sure why you bring in generator/syntax factory which is just side thing when main point is creating something that track trivia usage.
Heejae, your original thoughts included exactly this:
my initial rough thinking was something like SyntaxFactory/Generator.LocalDeclaration(tracker.GetToken(originalToken), .... track.GetToken(originalToken, options.TrailingTriviaOnly) ..
So i pointed out my concerns with Generator because you specifically mentioned Generator explicitly. You cannot ask someone "thoughts?" about their post, and then say that i should not bring up the things you literally said.
also, if you feels my idea is too vogue and you agree that it would be nice to have such tracker, you can propose your idea if you have one.
like I said, I dont want to go too much detail and make people think what I wrote about is final design or something.
I dont have concrete proposal like I said above. I have rough idea.
I get that. I'm trying to flesh out thsi rough idea. An example would help us do that. We can start with the code we think a feature would be trying to convert over. And we can consider how some sort of "tracker" coudl be used to accomplish this. In the absence of any sort of example, this approach is far too vague to have any idea how it would work, or if it would be suitable.
like I said, I dont want to go too much detail
I'm not asking for that. I'm asking for an example of a scenario where we want to preserve comments so we can walk through teh rough idea about how this tracker idea would work. That will help understand the overall idea you're asking asking for.
you can propose your idea if you have one.
I do not have a proposal. I am keenly interested in your proposal. I am simply asking for an example so we can better understand what you mean with things like "toToken". Having an example would allow you to say, "here, i meant toToken to mean ths specific token. the tracker would let us blah blah blah". I don't need low level details. Jsut an general idea of what the tracker is actually tracking and how someone generally uses this PAI.
i pointed out my concerns with Generator
this is exactly what I dont want to happen. ignore Generator. it was used a way kind of way saying code generation. whether it should work with Generator or not, or it should work with SyntaxFactory or not. or we need to make something else is all up to air. that was not the point I am trying to get people's thought.
Here. allow me to start with a simple example. Say i have the following feature: 'Convert foreach to for'. the user has written the following:
// walk the list of customers
foreach (var otherCust /* not the real customer */ in custs.Where(...).ToList() /* do not lazily compute this */)
{ // essential that we do this first.
SomeStatement();
} // at this point our contract is validated.
How do we envision the feature using this 'tracker'. What are the general sorts of things the feature would do. Which are the tokens it would refer to. What happens at the end?
etc. etc.
this is exactly what I dont want to happen. ignore Generator. it was used a way kind of way saying code generation. whether it should work with Generator or not, or it should work with SyntaxFactory or not. or we need to make something else is all up to air. that was not the point I am trying to get people's thought.
I am not psychic :D
You asked for thoughts on the post. In the post you mentioned Generator. So i gave my thoughts there. Now that i know you are not thinking about the GEnerator case, that's fine. I had no idea that, for you, "[generator] is just side thing". Now htat i know, we can definitely table. But please recognize that unless you say that something is not important, if you bring it up, and if you use it in your examples, my defualt position is going to be that you thought it was important.
but, I already told you, whether it being used for Generator, SyntaxFactory is not decided multiple times? not sure why you keep bringing generator up, how it is going to support token again and again? like I said, it doesn't need to use Generator, or making Generator to support syntax is all fine to me. that discussion can happen way later.
Or, if you'd like, we can do 'for' to 'foreach' as that has many tokens that don't transition over. i.e.:
// walk the list of customers
for (int i = 0 /* restart ignore previous progress */;
i < customers.Count /* include canceled customers */;
i++)
{ // essential that we do this first.
SomeStatement();
} // at this point our contract is validated.
When converting this to a foreach, how would the feature generally interact with your tracking system? Which tokens would the feature be concerned with, and how/what would this tracker generally do?
I aks this because if we are to make a system for helping out here, we should have good confidence how it would generally be used, and how it would ensure the goal of "we can provide some helper so that they have similar behavior."
but, I already told you, whether it being used for Generator, SyntaxFactory is not decided multiple times? not sure why it bring generator
...
Heejae, you brought up generator. It was right here: https://github.com/dotnet/roslyn/issues/25844#issuecomment-377602837. Then you asked for thoughts. That was right here: https://github.com/dotnet/roslyn/issues/25844#issuecomment-377602837. I gave you my thoughts. I had no reason to believe that 'generator' was something i should ignore given that you listed it in the very post you asked for thoughts on.
You've now asked for generator to not be part of hte discussion. I have accepted that and i have said that we can table that part of the discussion. At this point, i have excluded it entirely from my thoughts. There is no need to discuss it anymore. If, in the future, it becomes relevant again, we can consider it.
Please assume that if you bring something up and ask for thoughts, then i will do precisely that. :)
how/what would this tracker generally do
you tell me how it wants to be behave? if you have specific need, you just tell people rather than asking other to tell you when you already have something in mind.
when you first brought up generator, I told you that is not the one I want to discuss. multiple times. you keep bring it up. I dont get it. this issue is already gone too big not related to anything to my original post.
..
anyway, if you don't think such system needed. just let people who think such system needed to discuss. if you agree such system would be nice, give us input. I am not asking people to give me opinion on my design. I am trying to gather people thought on such system so that we can create one whatever that is.
you just tell people rather than asking other to tell you when you already have something in mind.
I literally said: i have no proposal. I am genuinely trying to understand the idea here. I think it may have promise, and i would like to help with the fleshing it out as i think providing a subsystem to help features out here would be great.
As this is a subsystem that is being proposed to help features work with trivia, i thought it would be valuable for us to have some examples to work with to help guide the discussion. I have now provided a couple of examples, and i would be thrilled to hear how you think someone migh tbe able to use some sort of 'tracker' to potentially make these sorts of refactorings easier to write and more consistent (wrt. trivia).
when you first brought up generator, I told you that is not the one I want to discuss. multiple times.
No. You did not. You can see that in the thread. The sequence was:
At no point did you say you did not want to discuss it. At several points you stated things about generators. So i responded to your statements.
okay for your example
// walk the list of customers
foreach (var otherCust /* not the real customer */ in custs.Where(...).ToList() /* do not lazily compute this */)
{ // essential that we do this first.
SomeStatement();
} // at this point our contract is validated.
I want some system, that tracks
all these trivia
// walk the list of customers
/* not the real customer */
/* do not lazily compute this */
// essential that we do this first.
// at this point our contract is validated.
I think it doesnt need to be all those trivia. the dev should be able to decide which trivia they want to track (it can be implicitly do or explicitly asked)
it will be nice if the system help to track no duplicated trivia is used (would be nice if it is automatically handled).
also at the end, it would be nice if the system can tell one whether there is trivia leftover and simple helper to attach those leftover to some place.
oh. god. sure I wrote here and here and here.
https://github.com/dotnet/roslyn/issues/25844#issuecomment-377618919 https://github.com/dotnet/roslyn/issues/25844#issuecomment-377619108 https://github.com/dotnet/roslyn/issues/25844#issuecomment-377619792
however way it goes I am fine. use generator, not use generator, that's not the point. I guess I didn't say magic word that understood as it is not about generator.
anyway, if you don't think such system needed.
I literally said nothing of the sort. Indeed, i think it would be very helpful to have such a system. I said that in existing posts.
Please re-read my comments without feeling that i'm against your proposal. All of my responses have been in the spirit of actually trying to understand this space better and to have a good discussion that can make progress by use of actually working through examples to help guide us.
just let people who think such system needed to discuss. if you agree such system would be nice, give us input.
I have given input. My input is: we should walk through some examples and figure out what we'd like, and how we might provide an API that would all the feature writer to accomplish this task. You were not forthcoming with examples, so i have provided those examples. I would like to work through them to see how some sort of 'tracking' system could be used with them.
I am not asking people to give me opinion on my design. I am trying to gather people thought on such system so that we can create one whatever that is.
Yes. That is what i have been trying to do from teh beginning. As i have stated several times, i think we would have a better change creating one if we have actual examples, so that then we can talk aobut how the feature writer would want to work with it and interact with any such API related to trivia.
oh. god. sure I wrote here and here and here.
Please reread your posts and put yourself in the perspective of someone reading them who doesn't know that inside your head you've come to that conclusion :) Again, i'm not psychic.
it will be nice if the system help to track no duplicated trivia is used (would be nice if it is automatically handled).
also at the end, it would be nice if the system can tell one whether there is trivia leftover and simple helper to attach those leftover to some place.
Ok. if that is your goal, then how about a simpler proposal: When using the SyntaxEditor, we add add an option to it (or an option in GetChangedRoot()) that says "all previous trivia should be present in the final tree. Because SyntaxEditor operates by way of syntactic transforms, it should be the case that we can map all trivia in the original tree to the same trivia (or absense thereof) in the final tree. For any trivia that is absent, we have very simple rules/heuristics about what to do.
Namely: walk the parent chain of that trivia. If that parent is in the new tree (including by tracking through replacements), attach to the parent in the new tree.
This guarantees that all trivia will appear in the final tree. It also ensures no duplication. It also has at least a somewhat reasonable heuristic for where the trivia would go if not explicitly managed by the feature.
--
Note 1: because someone might not be using SyntaxEditor, i also propose a simple helper that works between two different roots. Namely, something like "SyntaxNode TransferMissingTrivia(SyntaxNode fromRoot, SyntaxNode toRoot)```. this would do what i outlined above, but would work for people who were using SyntaxFactory and not going through SyntaxEditor.
Note 2: for any trivia we move around, it's likely the case that we need to format. Otherwise, things are likely going to look pretty bad.
Note 3: if we do this, i would recommend as part of the work that we go to a bunch of features that work with trivia and we take out a lot of the explicit handling (except for the cases where features want to actually intelligently move the trivia to another locatoin). In the cases where we're just messing with trivia just to make sure it's formatted properly and whatnot, we should probably get rid of the code and see how good this system would work. Based on that, we could then decide how to tweak and if we need further helpers or options to handle normal cases that would arise in practice.
okay. so. forget any code I wrote. I wrote those as some kind of scratch not a design. I dont want to get into detail on those. point is what I described around the code.
I want a system that people can use. so that they don't need to explicitly do handle trivia (like WithTrailingTriva. WithoutTrivia, or WithLeadingTrivia. but again, I am fine if people has too at the end)
and the system tracks, what trivia I am interested in, what trivia is already used. what trivia is not used.
and the system make sure, if same token used several times, and only one of them gets the trivia, but the other doesn't (but it can have options to deal with bheavior and etc)
also, at the end, it gives a way to attach all left out comment to somewhere if there are any.
probably a lot of helper method will be useful, also away to track token that are forked from original one as well.
When using the SyntaxEditor, we add add an option to it (or an option in GetChangedRoot()) that says "all previous trivia should be present in the final tree. Because SyntaxEditor operates by way of syntactic transforms, it should be the case that we can map all trivia in the original tree to the same trivia (or absense thereof) in the final tree. For any trivia that is absent, we have very simple rules/heuristics about what to do.
that's good. I think regardless of tracker, that sounds good one to have.
It would have been nice if we had access to green node for this (https://github.com/dotnet/roslyn/issues/25844#issuecomment-377628196). tracking which trivia is used is hard without green node.
but probably can use just ToString or ValueText for comparing.
Note: if necessary, we could likely just get an appropriate API exposed to allow us to ask this.
alright, but I still believe tracker is nice to have so that people doesn't need to handle trivia explicitly. having option in SyntaxEditor is nice as big hammer so comments is never deleted. (or explicitly stated when deleted). but I believe it is nice to have tool for users to deal with trivia in more detail but not explicitly moving it around like WithXXX..
also if tracker is part of syntax editor, it can use ones left in the tracker also a way to see what is left over.
alright, but I still believe tracker is nice to have so that people doesn't need to handle trivia explicitly.
I need more detail on how you think this would actually work. :) It's really hard for me to tell if "tracker is nice" when i'm not sure what it does, or how someone uses it. can you provide some examples of how you would use this ideally using either of the examples that were provided above.
Without clarity on how we think this would actually work or be used, i cannot actually making any determination of if this would be good to have or not. :)
so my rough idea at the beginning was that. the tracker has something like this (doesn't need to be that name or that signature or anything, point is that it tracks trivia associated with the token)
tracker.GetToken(token)
internally, it record the token has leading trivia A and trailing trivia B (I know it is list), and return token as it is if it is first time token is given to the tracker.
when it is called second time for whatever reason tracker.GetToken(token).
it knows that trivia A and B is already used somewhere, so, it returns the token' without trivia A and B.
that way, dev doesn't need to track token's usage and doesn't need to do token.WithoutTrivia explicitly or do var newToken = token.WithoutTrivia and remember to use newToken after that point.
Ok. So it sounds like you're trying to address the scenario of:
The user has a starting token that will end up in many places in the final tree. For example, an identifier token htat was in the source tree, but which ends up in many locations in the fnial tree (i.e. because we're placing code with references to that identifier). In that case, we really only want one of these tokens to have the original trivia, and we want the rest to be stripped. Is that it?
If so, this is definitely an interesting and important scenario. However, IMO, it's also one of the scenarios that is less difficult to get right. Stripping trivia from a token is pretty simple and basic, and it feels like we have decent enough tools already.
What tends to be hard is when we are mutating code, and now formatting can get screwy. For example, how do i move the comments from one node to another. It's non-trivial due to how whitespace and comments interact.
Similarly, when mutating code, it can be challenging to think about all possible trivia and where it needs to go. Specifically, we generally don't even do anything when there are comments that are on nodes/tokesn that are lost from the final tree.
In general, i'd like a service that handles more of these complex corner cases for people. I'm not sure i really see how "tracker.GetToken(token)" is much better than token.WithoutTrivia(). But maybe it would be? I'd need to see some examples in practice to get a feel. Gut feeling is that it's nto there yet. But would def be interested to see more.
its implementation (just idea not needed to be this way). but in GetToken, it can always fork token with annotation if given token doesn't already has annotation and then return that token.
that way, it can track back forked token to original token (with map like [annotation, token]), so it can compare trivia.
or we can add something in compiler layer to compare trivia through green node.
my end goal/hope after this tracker is this.
I do roughly this in all my code fix/refactoring
var node = SyntaxFactory/Generator/SometingNew.XXXX( tracker.TransferTrivia(originalNode.token1, SyntaxFactory/Generator/SometingNew.TokenXXX(...)), tracker.GetToken(originalNode.token1), tracker.GetToken(originalNode.token2),....);
var final = tracker.AttachLeftOver(node); tracker.Assert()
root.ReplaceNode(originalNode, final)
something like that.
the point is when I write this, I don't care about trivia explicitly.
this is not exact thing. AttachLeftOver can move to SyntaxEditor like your idea above. so instead of root.Replace, it can be syntaxEditor.ReplaceNode(original, node, AttachLeftOver) or something like that.
also the tracker could handle more cases like you commented above. like tracker.GetToken(token, WithoutLineBreak or WithoutElasticTriva) or something like that. those are all details.
but the point is people doesn't need to explicitly deal with Trivia with WithTrailingTrivia(node.GetTrailingTrivia()) and etc.
and fairly confident trivia would work most of time and consistently in all fix and refactoring.
...
and in my opinion, doing all these WithTrailing/WithLeading is actually quite a bit of time consuming. whenever I use token from a tree, I had to think about trivia. it would be nice if I don;t need to.
and also it would be completely fine there is some case where trivia break final code but we still let code transformation happens. it is a bug but not something block code transformation.
and those case (a lot of time in VB comment after _ ` line continuation and comment or expression at the end of statement that contaiin end of line moving into middle of statement making statement to be broken to 2 lines and break code or bring comments with it and break code) could be handled centrally in the tracker.
and in my opinion, doing all these WithTrailing/WithLeading is actually quite a bit of time consuming. whenever I use token from a tree, I had to think about trivia. it would be nice if I don;t need to.
I woudl totally love this too :)
My primary hope though is that we could get to a place where you don't think about. Rather than just shuffling things around. For example, if instead of thinking about WithTrailing/WithLeading, i now need to htink about GetToken(token, WithoutLineBreak)
then it's not clear if that makes things better... or if it's still hard to think about.
However, i do not know enough to state if this would be good or not. I think there is promise here. What would be valuable would be to actually try this out and then convert several very different features over to using this. We could then compare how the existing code looks to this new code.
--
Note: this is what we did for SyntaxEditor. Prior to having this, we had to go do all these tree walks, along with painful annotation tracking. With SyntaxEditor we could immediately see how 100 complex lines became 5 simple lines. I'm not expecting the same level of improvement here, but i would at least want to see the new code and go: yup, htat seems nice and clear and easy to do.
In other words, seems like something worth trying. but the proof will be in the pudding :)
@dotnet/roslyn-ide ping?
hmmm another idea is putting this tracker in generator and track all token given to generator. regardless whether it is part of node, or token explicitly given.
that way, when generator is used, it never duplicates comments when same token is accidentally used. or comments are left from original node.
hmmm another idea is putting this tracker in generator and track all token given to generator. regardless whether it is part of node, or token explicitly given.
Note: the generator is very intentionally designed to be entirely stateless. So i would be wary about that. However, being in SyntaxEditor might make sense.
currently, each codefix/refactoring does its own logic to handle trivia when tree transform for code generation.
this makes each code fix/refactoring to have different degree of trivia handling and also different behavior.
this can't be 100% automated or make to have 100% same behavior. but we can provide some helper so that they have similar behavior.
we should have something that help in this area.