Feature overlay with PHP Integrator

felixfbecker commented 7 years ago

@Gert-dev let's move this discussion here so it doesn't clutter the discussion for completion.

Hi, @felixfbecker, great to hear that someone else is also looking to develop FOSS indexing and IDE-like functionalities for PHP. It seems we are both developing a similar project but for different editors; I maintain the PHP integrator packages for Atom(.io), which also provide things such as autocompletion and code navigation, much like your package does for Visual Studio Code.

I've already went fairly far in implementing autocompletion and code navigation in my packages and already have most of the functionality you're looking for in this ticket in place (local variable scanning, built-in class fetching, docblock analysis and inheritance, ...). My work can be found here [1]. (I'm explicitly linking to the development version as it went through some major refactoring.)

I certainly don't want to keep you from writing your own indexer if that's what you desire, but just wanted to let you know that there is a similar project out there. I've only recently split off my PHP indexer (which is also based on php-parser, so I'm very glad to see you got nikic to improve the lexer's robustness when it encounters syntax errors!) from my atom-base package into a separate project and repository, hence the apparant low number of commits and activity.

[1] https://github.com/php-integrator/core/tree/development

felixfbecker commented 7 years ago

@Gert-dev I saw that you use SQLite. I am interested in how your index structure looks like? Currently this LS just uses a simple serialize() to cache the index on disk. Also interested in how you leveraged the PHP parser for completion, and how you resolve expression types recursively (if you do).

Also, how does your Atom package communicate with your PHP code? How do you handle async communication? As Atom packages are written in CoffeeScript, and the analysis is done in PHP, I imagine you must have invented some kind of protocol to communicate? Which leads me to my final question: What do you think about using the standardized language server protocol instead - e.g. folding your code into this server? Eclipse Che, VS Code, VS and many more use/will use LSP

ghost commented 7 years ago

Hi @felixfbecker

I am interested in how your index structure looks like?

It currently consists of a couple of simple SQL tables. I also leverage serializing a bit for things such as function parameters as I do not need access to this data in a further normalized form (i.e. in separate tables) and to avoid a performance hit when joining them.

Also interested in how you leveraged the PHP parser for completion

I've thought of letting autocompletion results be determined on the PHP side, but as you've probably already found out, php-parser does not deal particularly well with situations where code is not yet complete, such as a user typing $this->. For this reason autocompletion is currently completely handled on the CoffeeScript side.

What happens in CoffeeScript is that data such as available classes are fetched from the PHP side and loaded as suggestions into autocomplete-plus, which is an Atom package that you basically feed a list of relevant suggestions and it fuzzy matches based on what the user typed.

After I already had that code, there was a point where I wanted to move "Invocation info fetching" (i.e. simply the fetching of the function or method that is being invoked in a statement such as foo(1, 2, 3) when the cursor is between the parentheses) to PHP. Initially, I still had a lot of CoffeeScript code doing manual PHP parsing based on regular expressions that I could not move to PHP because of the previously mentioned php-parser problem. I finally found that I could move this to PHP by instead directly working with token_get_all and wrote the PartialParser to handle this.

and how you resolve expression types recursively (if you do).

Resolution of types is handled entirely by the TypeDeducer (A word of warning: this is one of the classes that is still in dire need of some refactoring). This class relies mostly upon data from the index as well as php-parser nodes to analyze the type(s) of expressions. The regular expression part of the code is a bit of a relic from code that was originally also implemented in CoffeeScript, which I later moved to PHP.

What basically happens in CoffeeScript is this:

You have an expression $this->foo()->bar()-> in the editor. The CoffeeScript package wants to know the type(s) of this so it can fetch members of these types to autocomplete later on. It asks the base service for this information which, in turn, asks the PHP core via a --deduce-types call.
The DeduceTypesCommand fetches the incomplete statement via the PartialParser and cuts it into array elements, so the above example becomes ['$this', 'foo()', 'bar()'].
The TypeDeducer takes the array and starts evaluating each of these expressions one by one.

In a later version this array, which is then used to match regular expressions against, would ideally be turned into a single php-parser node so the regular expression matching can disappear entirely.

Also, how does your Atom package communicate with your PHP code? How do you handle async communication? As Atom packages are written in CoffeeScript, and the analysis is done in PHP, I imagine you must have invented some kind of protocol to communicate?

That is correct, the PHP side is currently using a simple JSON-based output format that contains information that looks somewhat like this and depends on the information returned.

The CoffeeScript side simply spawns the required PHP processes with the correct arguments to retrieve the information it needs. Coincidentally, I had previously thought about using socket communication between PHP and CoffeeScript instead of process spawning, but I've since then not been convinced that this would improve performance, besides adding the following problems:

I have to manage starting and closing the PHP server and the socket connections manually myself. There can be multiple projects active at the same time, so I'd need to ensure they can automatically use a different port if the default one is taken. Furthermore, if Atom were to crash, I'm not sure I can clean up the PHP server process automatically.
Process spawning is in a way automatically parallellized by the operating system, so if I happen to spawn multiple async PHP processes simultaneously on a multicore system, this gives me a bit of free concurrency. With sockets this would mean I'd need to start and manage either multiple PHP servers and socket connections per project or introduce some sort of load balancer proxy for multiple instances.

What do you think about using the standardized language server protocol instead - e.g. folding your code into this server? Eclipse Che, VS Code, VS and many more use/will use LSP

I'm certainly not opposed to the idea and I think having a unified standard is a good idea. However, from what I've understood from the LSP is that the server itself becomes responsible for managing text documents and such. That sounds ok, but all of this is currently handled by Atom and Atom has no notion of LSP. It would imply that I would need to build my own bridge between Atom and the server according to the LSP protocol. I'm interested in this, but I have to be careful where I allocate my time as I'm already running into issues where I want to implement feature A but notice I have to fix bug B or refactor part P first (for example, I've been having my eye on implementing PhpStorm meta file support for several months now but still have some open issues and refactoring to attend to first).

In short, as much as I'd like to, I can't currently commit to collaborating in a way that would be fair to you. However, if at some point you'd like to utilize code from my indexer or core or just would like to know how I implemented something (if I have it already, of course ;-)), you are more than welcome and I will try to help the best I can!

kaloyan-raev commented 7 years ago

It would imply that I would need to build my own bridge between Atom and the server according to the LSP protocol.

It seems that such bridge has already been created: https://github.com/OmniSharp/atom-languageclient

felixfbecker commented 7 years ago

@Gert-dev Thanks for your write-up. From your description I take that PHP Integrator isn't a good fit at the moment to integrate into this project, because the architecture is different and the parts that are not yet in the LS are the ones implemented though a custom parser in CoffeeScript.

I also wanted to comment on spawning processes vs IPC. Spawning processes has a large overhead on Windows. The benefit of IPC is also that we can keep the index in memory and don't have to read from the database on disk everytime, which is much faster (in fact we don't use a database at all). A socket is not mandatory, on Linux/Mac we just talk over STDIO. On Windows we start a TCP server, let the OS choose a random port, spawn the LS and pass it the socket address to connect to. As it is a child process it will exit when the parent process exits. We handle async code by using an event loop in PHP (which you are probably familiar with if you worked with CoffeeScript/NodeJS). We haven't yet run into performance problems that would justify spawning extra processes for parsing for example, but I've thought about it.

I appreciate your offer for help! I would be very interested in your opinions if you just want to subscribe to this repo and give your 2ct every once in a while :)

ghost commented 7 years ago

@felixfbecker No problem! Thanks for the tips about process spawning, I was not aware it was such a hit on Windows. Keeping the database in memory would certainly simplify things on my end as I'm currently reading and writing from disk (but I do have a file cache for slower retrievals that is locally mounted on a tmpfs to improve performance, the latter unfortunately requires manual intervention by the user).

After reading your post, I may revisit socket usage over process spawning in the future (and perhaps a later stage, such as version 2.0, look into becoming compatible to LSP as well).

ADmad commented 7 years ago

@felixfbecker @Gert-dev It's great to see you two sharing ideas and trying to identify possibilities of code reuse between your projects/plugins.

Looking forward to vscode having codeintel features similar to those provided by @Gert-dev's plugins for atom :smile:

felixfbecker / php-language-server

Feature overlay with PHP Integrator #108