Closed nzakas closed 7 years ago
Question, does Espree currently makes a second pass to attach comments? Moving this into ESLint would require second pass. If Espree does it in a single pass, moving it into ESLint core will slow things down quite a bit.
Espree does the attaching as it parses, so there's no second pass. This calculation does add extra time to the parser -- I'll get some hard numbers when I'm able.
What I'm suggesting did not require a full traversal so it wouldn't add a second pass.
I don't believe what I'm suggesting will slow things down a lot because we will likely not be calculating comments for every node, only for the ones specifically requested by rules. In theory, this could be more efficient than what we are doing now due to speeding up Espree parsing.
I don't think we should evaluate this idea based on performance, though. It should be evaluated on the basis of being the right choice for the ecosystem. If it's the right choice, we can deal with implementation issues as they occur.
We shouldn't evaluate idea purely on performance, but it's one of the variables that goes into the list of pros and cons. If performance isn't going to suffer greatly, I don't think it should be blocking. Out of curiosity, which parsers out there do not support attaching comments? Esprima can do it, right? So can Acorn and Babylon. Are we talking about future hypothetical parsers?
I think, simply we can publish the package to attach comments as postprocessing to npm? The developers of parser plugins can use it if necessary.
@ilyavolodin my point is that we should never say, "I think this will be slow, so let's not do it." We should say, "this is the right thing to do, so let's see if we can make it work." Your opening position was that this would be slow without mentioning if you thought the idea had merit or not. We shouldn't operate like that.
I think you'rea bit confused about comment attachment. I'm talking about adding leading/trailing comment arrays to nodes. Esprima does that, but in a way that's different from Espree such that not all of our tests pass. Acorn does not do that at all. Babylon does comment attachment in a different way. And as I mentioned in the original description, we are running into trouble trying to replicate Espree comment attachment in typescript-eslint-parser
.
@mysticatea I don't think we can do that. Espree's logic happens during parsing, and what I'm talking about is only calculating comments dynamically when requested.
@nzakas Sorry, I think you misunderstood me. I haven't worked with comment attachments before, those messages above were honest questions, not sarcastic way to say we don't need this. I was trying to gather information about how this stuff currently work, to understand how proposed modification might affect it.
@ilyavolodin Acorn provides a hook (check out the onComment
option in https://github.com/ternjs/acorn#main-parser) that Espree uses to collect and attach comments as it parses, which allows us to process the comments in a single pass.
I love the idea of moving this into ESLint core! The fix I worked on for a comment attachment bug in Espree ended up having to be manually copied over to Babylon, and took a while to get merged (it wasn't ported over by me, but I was watching the progress on it). If we could provide this it would definitely help out our users.
As an aside, this is something I'd be happy to look at and work on after we make some more progress on the JSCS compatibility issues.
@ilyavolodin I didn't think you were being sarcastic, I was just trying to get focus in on whether or not this seems like a good idea absent implementation concerns.
@kaicataldo cool! Let's see if others like this idea, too. If so, I'll share some of the implementation thoughts I have.
Well... you know I love this idea 😄
If there are no objections, I'd like to mark this as accepted because it's a big blocker for the TypeScript work.
TSC Summary: Currently, Espree attaches comment information to nodes (via leadingComments
and trailingComments
) and then ESLint uses that information. This expectation means that every parser that wants to work with ESLint must implement the same algorithm for comment attachment. This proposal is to move the comment attachment calculation into ESLint core, so that parsers just have to pass in the top-level comments
array and ESLint will figure out where it goes. At the moment, this is a blocker for the TypeScript parser.
This is also potentially a breaking change since it would remove the leadingComments
and trailingComments
properties from nodes. While we explicitly say to use sourceCode.getComments()
instead, I have come across instances in the wild of people relying on leadingComments
and trailingComments
. So we may need to flag this as a breaking change.
TSC Question: Is it okay to move forward with implementation?
Accepted per 8/18 TSC meeting.
Excellent news! I feel somewhat responsible for this, so if I can be of any help let me know 😄
@JamesHenry Are you volunteering to implement the whole thing? I think you are-- thank you very much! :stuck_out_tongue_winking_eye:
@kaicataldo @JamesHenry do either of you want to tackle this?
I will take a stab at it and perhaps submit my work to @kaicataldo for discussion?
I believe it is quite a difficult task. Without an existing spec for how comments should be attributed, I believe we just have to find a way reverse-engineer existing test assertions into some well-defined logic. I attempted something similar to this already in the typescript parser but was forced to give up at the time.
Hopefully, now that the task is a bit better defined we should be able to get it over the line.
@nzakas @JamesHenry Sounds like an interesting challenge! Happy to help in any way I can (whether it's working on it myself or collaborating/discussing with @JamesHenry). I've worked on comment attachment in Espree - so have some working knowledge in that area - but I'm not an expert by any means :)
The expected behavior is in the Espree tests (https://github.com/eslint/espree/tree/master/tests/fixtures/attach-comments). Here's what I'd suggest as a path forward:
SourceCode#getComments()
method such that they pass right now.attachComment
option when ESLint calls parse()
.In my mind, the algorithm looks something like this:
getTokenOrCommentBefore()
in a loop, adding each comment to an array. When a token is retrieved instead of a comment, you're done. getTokenOrCommentAfter()
in a loop, adding each comment to an array. When a token is retrieved instead of a comment, you're done.WeakMap
so multiple calls to getComments()
don't have to do the search again.I think that should get you most of the way there.
Thanks @nzakas, let's see how it goes :)
@JamesHenry Are you planning on working on this? Sounds like you are, but I have time this weekend to take a look at this so wanted to confirm :)
Well if you don't mind then I would love to take you up on that offer. I have about 7 other things I could be doing to keep the typescript-eslint-parser project motoring, and working on the comment attachment would naturally come at the expense of (some of) those.
If you don't get chance to finish or want a second pair of eyes on something please let me know, and thanks a lot for offering!
Can't escodegen.attachComments be used here?
@ariya Had a little time to look into this this weekend. I tried using estraverse.attachComments
- since it looks like that's what escodegen.attachComments
calls - and got different results from the current implementation. Can definitely look into it more though!
Starting in on @nzakas's suggested path forward, I have the method itself working - however, have a question about a specific case. Right now, we attach the comment in the following example as a trailing comment to the ObjectExpression
node:
function a() {
var b = {
// comment
};
return b;
}
This behavior seems wrong to me, though. I think the comment shouldn't be attached, since it's not leading or trailing the node, and is instead contained by it. On the flip side, I would expect the comment to be attached to the Property
node inside the ObjectExpression
node as a leading comment in the following:
function a() {
var b = {
// comment
c: true
};
return b;
}
We don't attach comments that are inside nodes when the node isn't empty - not sure why we it makes sense to do it when it is. This could also be a bug in comment attachment that we just haven't encountered. Thoughts on this?
@ariya We can't use escodegen.attachComments
because the attachment is different. That's part of why we added comment attachment into Espree directly (also, doing it in Espree as much faster).
@kaicataldo that's an interesting case. I agree, it seems wrong to attach that comment as a trailing comment to the object expression, but on the other hand, where would we attach the comment otherwise? So it's a bug, but I think a useful one because without it, we would miss the comment in our traversal altogether (without some magic).
Do you have a suggestion for how we could solve this problem instead?
We actually do the opposite if a parsed file only contains a comment and attach it as a leading comment to the program node (as opposed to a trailing comment in the example in my comment above). Is there any value to adding a new kind of comment key for cases where a node only contains a comment (what about a comments
key? We could then just leave an empty file's program level key as comments
as well)? At the very least, I think we should be consistent between this case and the empty file case and attach both as a leading or trailing comment.
Thanks for discussing!
The empty-file-except-comment case is a special case, so I'm not sure we can look to that as a good example of what to do here.
Maybe it's worth having the concept of "child comments" when there are no child nodes except for comments? That would mean getComments()
would return an object like:
{
leading: [],
children: [],
trailing: []
}
I'll have to figure out how to check the inside of nodes for comments, but I like that idea! Just to confirm, this is only when there is a comment but no child nodes to attach to, correct?
Awesome progress, guys! Thanks for working on this @kaicataldo
@kaicataldo that is correct. What I would check:
BlockStatement.body
is empty (zero-length array)ClassBody.body
is empty (zero-length array)ObjectExpression.properties
is empty (zero-length array)In those situations, then you should search into the syntax to see if there's a comment.
Edit: I think I found an alternate solution. It's very much a WIP (need to refactor, port over the Espree tests, and actually turn comment attachment off when parsing with Espree), but for anyone interested in/willing to give feedback on the algorithm itself: https://github.com/eslint/eslint/tree/commentattachment
Thanks! I have another case I wanted to discuss, after working on this a bit more tonight.
When using the strategy we've discussed above (calling getTokenOrCommentBefore()
and getTokenOrCommentAfter()
in a loop and stopping when comments are no longer found), the following examples produce the following outputs:
var zzz /*aaa*/ = 777;
This example attaches /*aaa*/
to the Identifier zzz
as a trailing comment.
var zzz = /*aaa*/ 777;
This example attaches /*aaa*/
to the Literal 777
as a leading comment. Both of these behaviors make sense to me, however, it doesn't match the current behavior of Espree, as shown in the following example:
var zzz /*aaa*/ = 777;
Espree currently attaches the comment /*aaa*/
as both a trailing comment to the Identifier zzz
and as a leading comment to the Literal 777
, which means that we have to allow for continuing past certain punctuation tokens while calling getTokenOrCommentBefore()
and getTokenOrCommentAfter()
in a loop.
I initially tried to just allow continuing over punctuation and checking the location of tokens, but that doesn't work with the following scenario:
function a() {
var b = {
// comment
};
return b;
}
In this case, the comment would be attached to the Identifier b
, when it should only be a child comment of the object expression.
I got the current (and some additional) tests passing with what is essentially a rudimentary (and currently incomplete) operator whitelist, but was wondering if anyone else had ideas for a better way. Thanks for all your input!
@kaicataldo can you provide a link to the changes or open a WIP PR? That makes it a lot easier for others to know where to look.
Oh yeah, of course - will hopefully be able to pick this up and finish it in the next few days again - https://github.com/eslint/eslint/commit/97af7d512653b231db602403cc1d590f48de87b2
Left some comments - overall looks good, just a few questions.
I was able to look at this a little more tonight - feeling unsure of what the correct behavior is for comment attachment. The case I outlined above is a good example (copied and pasted here):
When using the strategy we've discussed above (calling getTokenOrCommentBefore()
and getTokenOrCommentAfter()
in a loop and stopping when comments are no longer found), the following examples produce the following outputs:
var zzz /*aaa*/ = 777;
This example attaches /*aaa*/
to the Identifier zzz
as a trailing comment.
var zzz = /*aaa*/ 777;
This example attaches /*aaa*/
to the Literal 777
as a leading comment. Both of these behaviors make sense to me, however, it doesn't match the current behavior of Espree, as shown in the following example:
var zzz /*aaa*/ = 777;
Espree currently attaches the comment /*aaa*/
as both a trailing comment to the Identifier zzz
and as a leading comment to the Literal 777
, which means that we have to allow for continuing past certain punctuator tokens while calling getTokenOrCommentBefore()
and getTokenOrCommentAfter()
in a loop.
Do we only want to attach comments up until a non-comment token is found, both before and after the node? Or do we want to skip over certain punctuator tokens (such as the =
, ===
, and ||
in a VariableDeclaration, BinaryExpression and LogicalExpression, respectively)? Also should comments only be attached to the top-level node in a given location? i.e. In the following example, should a trailing comment be attached to the VariableDeclaration node but not be attached to the Literal or VariableDeclarator nodes, despite the comment being the next token after both children nodes:
//foo
var zzz /*aaa*/ = 777
//bar
I know it was mentioned here that the expected behavior is in the Espree tests, but the current behavior still seems buggy to me and I'm having difficulty parsing what the intended behavior is from potential bugs in how we currently attach comments (example: removing the semicolon at the end of this line leads to //foo
being attached as a leading comment to the Literal 777
, but not to its parent VariableDeclarator).
Retrieving comments by iterating through tokens seems easier to reason about and I think yields a more accurate result than how Espree currently does it (collecting comments as it parses and then attaching them as it finishes each node in the tree). This also appears to be how JSCS handles calculating the location of comments, if I understand this comment correctly. I guess what I'm ultimately trying to get at is a discussion of whether the current behavior is what we want to replicate as closely as possible (which I think would be difficult, given what I've outlined above), or if thinking about the location of comments via token list versus attached to nodes at the parser level frees us from some of the inconsistent/odd behavior outlined above and allows us to simplify things. Since this is already a breaking change, it seems like an opportunity to make things better! I really like the the algorithm @nzakas suggested here with the addition of the concept of children comments.
@nzakas Curious what you think of this - it sounds like your intent was to keep the same behavior we currently have. Hopefully this doesn't take us too far away from the original intent of the issue. Also happy to add this as a discussion point for the next TSC meeting, if you'd prefer a discussion in that setting.
Sorry for the wall of text!
@eslint/eslint-team Any thoughts on my comment above?
@kaicataldo my biggest concern about going too far from what Espree does is how that will affect custom rules. I like the new behavior you describe, as I think it makes more sense, but I'd like to see what sort of trouble that would get us into with existing custom rules. Maybe check against the react and import plugins to verify?
BTW, when posting a wall of text, its helpful to call out specific questions in bold and/or provide a tl;dr so it's easier to scan. I know some days I see a wall of text and just don't bother because I don't have the energy to focus and try to pick out the pieces that need a response.
Consequently, if I missed a question or two, feel free to let me know. :)
Yeah, sorry about that - fair points about the wall of text, and appreciate you reading through it all. I'll do some exploration into this and report back!
Having worked on this some more, I'm feeling more and more like we should try to get the same result from attaching comments in ESLint as we did in Espree. Even though my suggestion above might make more sense from an expected behavior standpoint, I don't think the benefits outweigh the risk of breaking plugins in the ecosystem.
I'm going to try and get it as close as I can to the current location of attached comments in Espree - might be a little tricky, because the way in which it's done in Espree is a totally different strategy (collecting comments and attaching them during parse vs iterating over token lists post-parse).
This is a bit out there, but... Could we support multiple comment attachment "strategies" and let users basically pick one?
On Oct 15, 2016 11:08 PM, "Kai Cataldo" notifications@github.com wrote:
Having worked on this some more, I'm feeling more and more like we should try to get the same result from attaching comments in ESLint as we did in Espree. Even though my suggestion above might make more sense, I don't think benefits outweigh the risk of breaking plugins in the ecosystem. I'm going to try and get it as close as I can to the current location of attached comments in Espree - might be a little tricky, because the way in which it's done in Espree is a totally different strategy.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/eslint/eslint/issues/6724#issuecomment-254026277, or mute the thread https://github.com/notifications/unsubscribe-auth/AARWeqErqDOybGrPAXrFa-aVE6p2cE3Aks5q0aNKgaJpZM4JRHUD .
One of the things I think JSCS did better than ESLint was how it treated comments. If we're going to be moving comments into ESLint, and we're at least partially doing that to take the responsibility away from parsers, now seems like a good time to at least deprecate trailingComments and leadingComments?
Getting them out of the AST also means we can accept any ESTree AST as long as it also provides an array of commments.
One way to do this deprecation is to only lazily generate leading and trailing comments by turning those properties into getter functions. We could provide a deprecation warning when those accessors get hit. Getters are typically slow, so I think speed measurements would be important here.
@mikesherov Just to be clear, the proposal in this issue is turn off comment attachment and to move all this logic into sourceCode.getComments()
which, when given a node, returns the comments for that node. The general strategy I've been looking into is outlined in this comment. As you can see from this giant wall of text, there are some challenges trying to recreate the exact same result we currently get with Espree, and I was hoping to simplify it.
It looks like the corresponding method in JSCS would only return comments that are either the current token or follow the current token, which, as you mentioned, is very different from the current strategy of attaching comments in Espree.
Could you speak more to what you think the benefits of doing it the way JSCS does are?
@kaicataldo, I think JSCS's way always acknowledged the lossy nature of the AST, and never attempted to attach comments to nodes.
You're starting to hit those limitations in doing your work, like you pointed out with var a /**/ = zzz;
, but it's not about skipping punctuators, it's about the fact that individual AST nodes often contain more than one piece of syntax, and so the paradigm of leading
, trailing
, and even children
is intellectually muddy. Consider:
/*1*/ class /*2*/ a /*3*/ extends /*4*/ b /*5*/ { /**/ } /**/ ; /**/
child
comments, isn't it a bit inconsistent for that child comment to longer be a child once a class property is created to attach it as "leading" to?Thoughts?
@platinumazure no, users will have no idea how this will affect ESLint. This isn't something we should push to the users.
@mikesherov part of our complexity is that we have always traversed into comments just like we do other nodes, so you can say LineComment
and BlockComment
as visitors just like FunctionDeclaration
. If we get rid of the concept of comments belonging to nodes completely, then that pattern breaks. That's the main impetus for keeping some concept of leading/trailing vs. moving to something more like was JSCS does (which I have no philosophical concerns about).
So the larger question is if we go the way of JSCS comment handling, how do we deal with the visitor pattern we've always supported for comments in rules? Is this a big enough improvement that we remove this capability and accept that this will cause pain for people with custom rules? Or do we find some way to still support this pattern?
@nzakas, thanks for the background info. I'm going to speak to @kaicataldo in person about this tomorrow... we work together :-)!
I'd like to understand some of the rules that use comment visitors so I can understand how tying them back to nodes is useful. My naive take is that we'd still be able to "visit" comments, but we'd then use ranges to figure out what syntax they relate to instead of traversal.
@mikesherov ah nice, totally forgot you guys worked together. :)
There are a bunch of rules in core that use this functionality, if you search for "LineComment", you should find them. I did a cursory look, and it does look like many of those rules would be just as good using Program.comments
and iterating over those instead, so maybe it's not that a big of a deal. Of course, we don't really know what people are doing in the wild.
I'm continuing to slowly but surely work on this. After talking this over with @mikesherov, I'm not sure the concept of children
comments really holds up. I think it's best if we start by trying to mimic the current behavior (while eliminating some bugs I've found in the process)/follow @nzakas's initial recommendation, and once we get that all working, we can evaluate and see if we want to make changes on top of that.
Here's a branch with the comment getting happening in ESLint rather than Espree with all the tests for util/source-code.js
passing.
@kaicataldo thanks for the update. Did you and @mikesherov talk about how JSCS handled comments and if it could make sense for us to just not traverse into comments and rely on getCommentBefore()
in rules instead?
@nzakas, we talked over the fact that the current leading and trailing behavior is currently pretty buggy, and that pursuing the simpler strategy might lead to breaks in userland that rely on the buggy behavior.
That's why @kaicataldo is breaking it into two parts. First, we're doing what the original intent of this issue is: get rid of leading/trailing Comments attach to nodes.
Once that's been clarified a bit, we intend to use bigQuery to sample how userland uses these properties/functions, and will survey popular plugins so we have a bit of data to guide us on next steps that may be more drastic.
Progress! I have a complete PR (mimics the Espree attachment strategy, all tests passing) open here: https://github.com/eslint/eslint/pull/7516
Problem
Right now, leading and trailing comments are calculated in Espree and passed to ESLint. ESLint depends on these comments from things like configuration and determining if someone has opted-out of a rule (such as providing a comment in an empty block). That works well when using Espree, but if you're using a different parser, it forces that parser to reimplement the same comment attachment strategy as Espree, which is a lot of work and not very easy. Case in point, we're working on
typescript-eslint-parser
and are a bit stuck on getting the same leading/trailing comment information from Espree.Solution
If we didn't require parsers to provide leading/trailing comments, and instead calculated them inside of ESLint, we dramatically decrease the complexity of creating ESLint-compatible parsers. We could just require parsers to pass in the
comments
array, and we could figure out where the comments belong on-demand when a rule callscontext.getComments()
. This is probably a bit tricky, as we'd want to ensure we get the same results that Espree gives us right now, but ultimately, I think this would help the parser ecosystem tremendously and, as a side benefit, normalize ESLint behavior across parsers.