[RFC] Constant/method indexing and go to definition

paracycle commented 1 year ago

Motivation

This is a quick-and-dirty, first-cut implementation of constant/method definition indexing based on the prototype code supplied by @kddnewton. An example "Go to definition" request implementation that only works for fully-qualified constants is also included.

Implementation

The indexer is based on scanning the compiled instruction sequences that correspond to all project Ruby files (all files under the folder that contains the Gemfile). This technique was suggested by @kddnewton and can compile and process all files in Core in about 30s on my M1 Pro machine.

In its current form, the indexer runs serially before the Ruby LSP starts, which is obviously not the best way to do this, but is fast enough for this RFC.

Automated Tests

None

Manual Tests

None

kddnewton commented 1 year ago

Note that one limitation of this approach is that it will only run on CRuby at the moment. (Unless you use Syntax Tree to get the instruction sequences through the syntax tree on other VMs.)

Abson commented 1 year ago

How long to make this feature finish? I can't wait to see it. Hope it is better than solargraph in vscode.

kddnewton commented 1 year ago

@morriar this is not dissimilar to doing ctags manually. I'd be fine with standardizing on that format however for the indexing operation. Do you have a tool that generates good ctags for Ruby at the moment?

Morriar commented 1 year ago

I've been experimenting with https://github.com/universal-ctags/ctags lately. Not sure if this is the best one or how it compares to parsing iseqs but it's pretty fast and reliable.

vinistock commented 1 year ago

Would ctags give us enough flexibility to implement hover, signature help, workspace symbols and other features we'd like to explore?

numbcoder commented 1 year ago

@Morriar this is not dissimilar to doing ctags manually. I'd be fine with standardizing on that format however for the indexing operation. Do you have a tool that generates good ctags for Ruby at the moment?

ripper-tags generates ctags using the built-in Ruby parser Ripper, which is more accurate than universal-ctags.

kddnewton commented 1 year ago

If we're going to generate with ripper we may as well generate with Syntax Tree because that's going to get us closer to YARP. I took at look at the CTags generated by universal ctags and it's not at all dissimilar from what Syntax Tree is currently generating. I think we could make the jump to just generating ctags if that makes it easier to work with.

paracycle commented 1 year ago

I am not sure why we are considering alternatives here. The iseq based indexing is now encapsulated inside SyntaxTree with a fallback to doing AST based indexing, we could just start building on that. We know it is fast, it is now encapsulated and stable, so I don't see a reason to not use it. Moreover, since the encapsulation is at the ST layer, it will be easier to plug it out to use YARP behind the scenes if we find that to be a better approach later.

As for what the indexing format is going to be, let's discuss that, but I think we're in a pretty good place with how we are going to do the indexing.

Morriar commented 1 year ago

I am not sure why we are considering alternatives here.

The PR title is prefixed with [RFC] and it's been in draft for the last two months.

As for Ctags, regardless of the indexing format, it seems to be an approach already used by many, in Ruby as well as other languages. It allows a good control over which files should be considered or not, even outside of Ruby and which contents from these files should be indexed. This would also allow people to plug in what they already know and use.

I'm asking if we considered the pros and cons of the proposed approach compared to alternatives.

kddnewton commented 1 year ago

Okay @morriar - I spent the weekend looking at CTags mostly for this ticket. Long story short, Syntax Tree can now generate CTags exactly as ripper-tags can once I merge this PR: https://github.com/ruby-syntax-tree/syntax_tree/pull/334.

Here are the things that I learned:

CTags don't inherently have any support for delineating between identifiers. Extended CTags provides a way to "tag" identifiers with different information, but it wouldn't be a very good format for accessing comments, for instance.
Even with a CTags database, you would still have to provide the functionality in the language server. It doesn't just give you that functionality for free.
universal-ctags seems to miss a couple of things that we can generate more accurately with Syntax Tree. We're also very slightly faster than both it and ripper-tags.

If you want to roll with CTags, that's fine, you can use that format. I don't really see any particular benefit to using it over just accessing the index directly, but if you feel strongly there's now nothing stopping you.

paracycle commented 1 year ago

The PR title is prefixed with [RFC] and it's been in draft for the last two months.

Totally fair.

As for Ctags, regardless of the indexing format, it seems to be an approach already used by many, in Ruby as well as other languages. It allows a good control over which files should be considered or not, even outside of Ruby and which contents from these files should be indexed. This would also allow people to plug in what they already know and use.

I'm asking if we considered the pros and cons of the proposed approach compared to alternatives.

I think what I was trying to settle in my previous message was the difference between how we will be doing the indexing vs what the indexing format was going to be. I think we should continue discussing what the indexing format might be, and I would lean on a format more under our control, since it might give us more options. On the other hand, I think we can start by doing the indexing itself using the new ST indexer. I think we are all on the same page on that aspect, right?

vinistock commented 1 year ago

I agree that indexing with Syntax Tree is the way to go. It will gives us more flexibility and control which should allow us to provide more complex features.

Concerning the format, since performance is critical for language servers, the answer should be whatever is fastest to search/access. Doesn't Syntax Tree already come with a structure like that? Pretty sure we're covered in both fronts.

Morriar commented 1 year ago

Thanks @kddnewton for this run down. I think there is a misunderstanding about why I brought Ctags though.

I'm not promoting the format, I actually think Ctags is a horrible format for what we're trying to do here. What I'm promoting here is the flexibility of the tools that produce the Ctags.

Ctags generators come with options that let you include / exclude paths, files and or even constructs (let's say private things) and this kind of features seems really important in this context. For example, in codebases containing a lot of generated code may it be real Ruby, RBIs or something else. Imagine if the first result you always jump is the wrong one and brings you in the middle of generated code? Horrible experience.

Parsing generated Ctags means the user can generate them with the options they want. Maybe even having their own custom tools to generate Ctags that support their DSLs etc.

If we're going with SyntaxTree for the indexing, the Ruby LSP is then responsible to provide the options to pass or not certain files to the SyntaxTree, then select or not what SyntaxTree returns to be indexed.

So have we considered what features are necessary to provide a good experience here? Have we looked at what other tools that provide the same scope of features do?

I hope this clears things up.

Abson commented 1 year ago

哥们，实在不行搞个 CTag 用着先，没有提示很难受

gwillcox-r7 commented 1 year ago

Thanks @kddnewton for this run down. I think there is a misunderstanding about why I brought Ctags though.

I'm not promoting the format, I actually think Ctags is a horrible format for what we're trying to do here. What I'm promoting here is the flexibility of the tools that produce the Ctags.

Ctags generators come with options that let you include / exclude paths, files and or even constructs (let's say private things) and this kind of features seems really important in this context. For example, in codebases containing a lot of generated code may it be real Ruby, RBIs or something else. Imagine if the first result you always jump is the wrong one and brings you in the middle of generated code? Horrible experience.

Parsing generated Ctags means the user can generate them with the options they want. Maybe even having their own custom tools to generate Ctags that support their DSLs etc.

If we're going with SyntaxTree for the indexing, the Ruby LSP is then responsible to provide the options to pass or not certain files to the SyntaxTree, then select or not what SyntaxTree returns to be indexed.

So have we considered what features are necessary to provide a good experience here? Have we looked at what other tools that provide the same scope of features do?

I hope this clears things up.

Were you thinking something like Bust-A-Gem from https://github.com/gurgeous/bust-a-gem in terms of implementation here for comparison's sake? That also uses a CTags based approach but uses ripper-tags to implement its features. I have used it quite a bit and haven't personally found any issues though I deal more with pure Ruby than Rails related code so that may need some testing.

searls commented 1 year ago

:wave: nothing to add here other than to say I'm eagerly awaiting definition support landing in ruby-lsp. I'd been contemplating adding some QoL improvements to Bust-A-Gem for a while now but if this might land in ruby-lsp in the near future I should probably hold off

vinistock commented 1 year ago

I'm super happy with everybody's excitement over go to definition and related functionality. Let me chime in here to clarify a few things:

We have plans to work on code indexing, which will unblock go to definition, autocomplete, hover, signature help, workspace symbols and other features
We're planning on working it soonish. It's important to get the implementation right for a good experience - especially in very large projects. Therefore, we will take the time necessary to prototype, explore alternatives and make sure we're delivering the best experience we can
This PR here is not a final implementation, it's one of these prototypes exploring indexing through instruction sequences as opposed to indexing via parsing and visiting (the usual method for analyzing files in the Ruby LSP)
While we might explore CTags as part of the prototypes, it's very unusual for an LSP to use that as its indexing backend. We're most likely going to be indexing files ourselves to give us enough flexibility to implement features - not only the current ones in the specification, but future additions and custom requests we may desire to add. For example, semantic search #338 is not a part of the specification, but it could be implemented as a custom request as long as we have enough information in our index

Shopify / ruby-lsp