Andersbakken / rtags

A client/server indexer for c/c++/objc[++] with integration for Emacs based on clang.

http://www.rtags.net

GNU General Public License v3.0

Project Scaling #719

Closed: quicknir closed this issue 7 years ago

quicknir commented 8 years ago

I've had mostly positive, but still mixed, experiences working with rtags on a very large C++ project. Auto-completion is still a bit hit or miss, and flycheck takes a very long time to tag errors. At some point while I was home I decided to pull the latest rtags, rebuild, and play around with the rtags source itself. Within rtags everything works beautifully, so I think a lot of this is related to scale. As changes are made and tested, is there a particular codebase that is used as the standard benchmark for pushing forward?

I'm tempted to grab the largest CMake C++ project on GitHub and see how well rtags works there. If that project is large enough to reproduce some of the issues I'm seeing, it could be used to help fix them.

Andersbakken commented 8 years ago

The completion results are pretty much entirely generated by clang, with little opportunity to influence them.

I use rtags daily on a codebase of roughly 1800 files at work and it works great, but I appreciate that some operations likely don't scale as well as they could. I think your project is a lot bigger than that. I have an idea I want to test that might make looking up symbols by name (rtags-find-symbol) more efficient, but completions are a little harder to optimize. I don't entirely know why flycheck should take so long, though. Are you finding that it's slow even when everything is fully parsed and you touch a cpp file?

Anders

Andersbakken commented 8 years ago

The more I think about it the more I don't entirely see why the size of the project should matter for the two operations you mentioned (diagnostics and completions). Let me see if I can put something in that might help debug the situation.

Anders

Andersbakken commented 8 years ago

Actually, I have one that I just added.

If you run rdm like this:

rdm --completion-logs

You will see some info about the generation of completions on the C++ side. It would be interesting to get some numbers for this on your big project and compare them to numbers for equivalent operations inside a smaller project like rtags.

Anders

quicknir commented 8 years ago

The size of the project generally ends up mattering because of C++'s terrible "module" system. Basically as you write code, some fraction of the code is in header files. All that code just gets copied and pasted into any other file that requires it.

While a larger project should not be #including more files directly in each file, once you think about transitive includes it becomes clear that the larger the project, the more transitive includes you get. So you'd actually expect files in the most dependent layer to scale in size (after preprocessing) linearly with the project. C++ makes me sad sometimes.

I happen to be working in the most dependent layer of a very large project.

I will add the flag you suggest and see what I come up with. Interestingly the speed of the database is usually excellent, but every once in a while there is a looooong pause. Not sure what causes this.
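
The transitive-include argument above can be made concrete with a toy model. This is not rtags code; the header names and the strictly layered include graph are made up for illustration. The sketch computes the set of headers a file pulls in transitively and shows that a source file at the top of the dependency stack sees every header below it, even though it directly includes only one.

```python
def transitive_includes(graph, start):
    """Return every file reachable from `start` by following #include edges."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        header = stack.pop()
        if header in seen:
            continue  # already visited; also protects against include cycles
        seen.add(header)
        stack.extend(graph.get(header, ()))
    return seen


def layered_project(layers):
    """Hypothetical project: layer N's header includes only layer N-1's header."""
    graph = {"layer0.h": []}
    for n in range(1, layers):
        graph[f"layer{n}.h"] = [f"layer{n - 1}.h"]
    # A source file in the most dependent layer directly includes one header.
    graph["top.cpp"] = [f"layer{layers - 1}.h"]
    return graph


for layers in (10, 100, 1000):
    graph = layered_project(layers)
    direct = len(graph["top.cpp"])
    total = len(transitive_includes(graph, "top.cpp"))
    print(f"{layers} layers: {direct} direct include, {total} transitive")
```

In this model the direct include count stays at one while the transitive count grows with the number of layers, which is the linear growth described above. Real projects are not strictly layered, so treat the numbers as an upper-bound illustration only.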

Andersbakken commented 8 years ago

I get what you're saying about the transitive includes. I happen to work a lot on some of our core tooling headers as well, and basically every time I touch them several hundred files get dirtied.

Anders

Andersbakken commented 8 years ago

I just improved the completion logs a little btw.

Anders

quicknir commented 8 years ago

So I pulled the latest rtags and rebuilt. Overall it's a bit better behaved now, though auto-completion is sometimes still very slow. I get messages like this:

CODE COMPLETION 7.072s reparsing translation unit <redacted>
CODE COMPLETION 16.69s Generated completions for <redacted> successfully in 181 ms

Is this benchmark purely the compilation time for that translation unit? Honestly, that seems a bit high even for something at the most dependent layer of a large project. 16 seconds is a lot of compiler cycles. Even 7 seconds seems like a lot. I should try compiling it separately and benchmarking it.
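
To compare numbers between a large and a small project, log lines like the two above can be pulled apart mechanically. This is a sketch that assumes only the format visible in those two lines: a `CODE COMPLETION` prefix, a figure in seconds, a free-form message, and an optional trailing `in N ms`. What the leading figure measures is not stated in the log itself, so the code just reports it as-is.

```python
import re

# Pattern inferred from the two sample lines only; other rdm output may differ.
LINE_RE = re.compile(
    r"^CODE COMPLETION\s+(?P<seconds>\d+(?:\.\d+)?)s\s+(?P<message>.*?)"
    r"(?:\s+in\s+(?P<ms>\d+)\s*ms)?$"
)


def parse_completion_log(line):
    """Split one completion log line into its numeric and text parts."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # not a completion log line in the assumed format
    return {
        "seconds": float(match.group("seconds")),
        "message": match.group("message"),
        "ms": int(match.group("ms")) if match.group("ms") else None,
    }


sample = [
    "CODE COMPLETION 7.072s reparsing translation unit <redacted>",
    "CODE COMPLETION 16.69s Generated completions for <redacted> successfully in 181 ms",
]
for line in sample:
    print(parse_completion_log(line))
```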

Other than that, another issue I'm finding is that because preparing completions can be so slow, and seemingly blocks the server, it also affects other things. For instance, go to definition, which is normally very snappy, is sometimes extremely slow because completions are being prepared in the background and the server simply doesn't answer for many seconds. Maybe completion prep could be done asynchronously in a thread? Though I realize that introduces considerable complexity.
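
The idea of moving completion preparation off the request path can be illustrated with a small model. This is not how rdm is structured; `prepare_completions` and `goto_definition` are hypothetical stand-ins, and the slow parse is simulated with an event rather than real libclang work. The point is only that a cheap lookup can be answered while the slow job is still running on a worker thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical symbol table standing in for the on-disk index.
SYMBOLS = {"Foo::bar": ("foo.cpp", 42)}


def prepare_completions(path, parse_finished):
    """Stand-in for a slow reparse; blocks until the fake parse is released."""
    parse_finished.wait()
    return f"completions ready for {path}"


def goto_definition(name):
    """Stand-in for a cheap index lookup that should never wait on a parse."""
    return SYMBOLS.get(name)


def demo():
    parse_finished = threading.Event()
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Kick off the slow job on a worker thread instead of the request path.
        future = pool.submit(prepare_completions, "foo.cpp", parse_finished)
        # The lookup is answered while the parse is still in flight.
        location = goto_definition("Foo::bar")
        still_parsing = not future.done()
        parse_finished.set()  # let the simulated parse complete
        return location, still_parsing, future.result(timeout=5)


print(demo())
```

The complexity mentioned above shows up as soon as the worker and the request path share mutable state; this sketch sidesteps that by keeping the index read-only.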

quicknir commented 8 years ago

I did some benchmarking on my own, and indeed it takes quite a while to rebuild the file. Specifically, it takes around 17 seconds to compile my end files, though that is with code generation. Turning off debug symbols reduces this to 14 seconds, and adding -fsyntax-only reduces it to 11. Is rtags using -fsyntax-only? As you can see, it makes quite a difference. I'm guessing you already use this, though.

Is there any way to cache header files? Fundamentally I think this is the only way to solve this. How about precompiled headers, or pretokenized headers (http://clang.llvm.org/docs/PTHInternals.html)? In principle these would solve the problem.
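
A comparison like this is easy to script. The sketch below times an arbitrary command with a wall clock. The compiler name, the `-g0` flag used to drop debug info, and `my_file.cpp` in the usage note are placeholders, since the actual command line is not given here.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, completed.returncode


def compare(variants):
    """Time each named command once and return {name: seconds}."""
    return {name: time_command(argv)[0] for name, argv in variants.items()}


if __name__ == "__main__":
    # Self-check that needs no compiler: time the Python interpreter itself.
    seconds, code = time_command([sys.executable, "-c", "pass"])
    print(f"{seconds:.3f}s, exit code {code}")
```

To reproduce the three measurements, one could pass something like `{"full": ["clang++", "-g", "-c", "my_file.cpp", "-o", "/dev/null"], "no-debug": ["clang++", "-g0", "-c", "my_file.cpp", "-o", "/dev/null"], "syntax-only": ["clang++", "-fsyntax-only", "my_file.cpp"]}` to `compare`, with the project's real include paths and defines added. A single run per variant is noisy, so repeating each a few times would give more trustworthy numbers.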

quicknir commented 8 years ago

Actually, this doesn't seem that terrible: http://clang.llvm.org/docs/PCHInternals.html.

Andersbakken commented 8 years ago

We actually do have some PCH code, though it's kind of experimental and I can't remember to what degree it works.

You can start rdm with --pch-enabled

We've tried and failed to auto-generate sensible PCH headers in the past, but I think that if your project already has PCH headers and you enable that switch, it has a chance of working.

The times for generating the translation unit do seem long. Because of how completion works in libclang, you also have to reparse the translation unit after creating it, and it looks like between the two operations it took the whole 16 seconds. Once that is done (and we try to do it preemptively when you switch buffers, keep it in the cache, etc.), completions should be reasonably snappy, like the ~180 ms in your log.

Anders

quicknir commented 8 years ago

Is there any way to disable this preemptive caching? For me it has relatively few hits, and on the other hand it's often caching just when I want to go to definition; in that case I now get as long a pause for goto def as I do for auto-completion, whereas normally it's instant.

I'll try to look into PCH and see what the deal is. The problem, though, is that ultimately for this to work well, rtags itself has to generate the PCHs. Otherwise, if you modify a file in your project that has precompiled headers and then switch to another file, you could be using a stale copy of the first.

quicknir commented 8 years ago

I recently came across this link: http://stackoverflow.com/questions/26989374/faster-code-completion-with-clang and now I'm more confused. It seems there is a specialized precompiled preamble available for parsing translation units in libclang. If that's the case, I can't understand why auto-completion is taking so long. My source file is only about 200 lines, and all of the benchmarks I did were raw compilations without any precompiled preamble. Surely with a precompiled preamble this should be lightning fast?

quicknir commented 8 years ago

Another update: I finally decided to give YouCompleteMe's daemon (ycmd) a shot on Emacs after putting it off for a while. Based on its performance characteristics, it does seem to use this precompiled preamble trick. When you first open a file, there's a decent (async) pause of about 5-10 seconds during which you can't use autocomplete and don't get flycheck errors. After that, though, everything happens pretty much instantly. Auto-completion and flycheck respond in under a second, sometimes basically instantly. If you add or remove an include, or if you modify one of the includes and then save the file, it detects that the preamble has changed and recompiles it, giving you that 5-10 second pause again.

In practice, the 5-10 second delay rarely comes up; this system is fantastic for making extended edits to a single file. This approach isn't strictly as good as using precompiled headers, but it's far less work for the user and for the maintainer.
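
The behaviour described above (one slow parse when a file is opened, fast results afterwards, and a slow parse again when the include block changes) is what a cache keyed on the preamble would produce. The sketch below is a toy model of that invalidation rule, not ycmd's or libclang's implementation. The "preamble" is approximated as the leading run of `#include` lines, the expensive build is just a counter, and changes to the contents of included headers (which a real implementation also has to notice) are ignored.

```python
import hashlib


def preamble_of(source):
    """Approximate the preamble as the #include lines before any other code."""
    includes = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            includes.append(stripped)
        elif stripped == "" or stripped.startswith("//"):
            continue  # blank lines and comments do not end the preamble
        else:
            break  # first real code line ends the preamble
    return "\n".join(includes)


class PreambleCache:
    """Rebuild the (simulated) precompiled preamble only when it changes."""

    def __init__(self):
        self.key = None
        self.rebuilds = 0

    def parse(self, source):
        key = hashlib.sha256(preamble_of(source).encode()).hexdigest()
        if key != self.key:
            self.key = key
            self.rebuilds += 1  # stands in for the multi-second preamble build
            return "slow"
        return "fast"  # preamble reused; only the file body is reparsed


cache = PreambleCache()
source = '#include <vector>\n#include "a.h"\n\nint main() { return 0; }\n'
print(cache.parse(source))                                # first open
print(cache.parse(source.replace("return 0", "return 1")))  # body edit
print(cache.parse("#include <map>\n" + source))            # include added
```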

Right now my approach is to use ycmd for autocompletion/flycheck, and rtags for code navigation, as it has many features that ycmd does not offer (ycmd does not keep a database at all). The combination is working well, but obviously it's more effort as an end user: you have to set up both and deal with the various minor quirks of each. (For ycmd it took me about a full day to get everything working perfectly; the friend whose rtags setup I copied had a similar experience getting rtags to work perfectly with our build system.)

Anyhow, it's just a thought, I guess: it might be possible with a reasonable dev effort to get the same performance as ycmd, as it's just using built-in libclang features.

Andersbakken commented 8 years ago

There are almost certainly some bugs in the current rtags auto-completion code. I'll try to dig in to figure out what we're doing wrong but it's a fair bit of work.

Anders

Andersbakken commented 8 years ago

I've refactored and simplified the completion code a little bit. I think it's a little bit faster now. I did see that ycm uses the relatively recently added CXTranslationUnit_CreatePreambleOnFirstParse flag, so I added that too. However, I find that I still have to do a reparse before the preamble is generated, so you won't be able to use completions for several seconds after opening a new buffer.

Once it's finished with the reparse I seem to be getting quite snappy completions, though. Can you give it a shot and see if there are any improvements in your project?

Anders

Andersbakken commented 8 years ago

I made another small change just now that makes completions work again for completions triggered by ::, ->, . etc.

Anders

quicknir commented 8 years ago

Indeed, the improvement is dramatic! Very, very nice! Flycheck, however, still seems to be running at a similar speed to before. There are some other minor issues with auto-completion (showing private members, for instance), but I'll open separate tickets for those.