Multiple PHPStan runs happening in parallel

berniedurfee-renaissance commented 5 months ago

I have a similar issue to this (https://github.com/swordev/phpstan-vscode/issues/11) when using this plugin.

I see many runs of PHPStan happening at the same time. Not sure if it's the same issue, but I don't see this issue on the command line.

I have max core usage for PHPStan set to 2 in my config:

parallel:
    maximumNumberOfProcesses: 2

Using PHPStan v1.11.1

vendor/bin/phpstan --version
PHPStan - PHP Static Analysis Tool 1.11.1

momala454 commented 2 months ago

Parallel analysis #

PHPStan now runs in multiple processes by default. The list of files to analyse is divided into small chunks which get processed by the same number of processes as the number of CPU cores you have. It works on all operating systems and doesn’t require any special PHP extension. And it’s enabled by default.

What if it's just phpstan launching all those process by itself, and there is no problem there 😅

nikuscs commented 2 months ago

I guess it could also depend if you have phpstan working with parellele processes, the only issue i believe is really the plugin no terminating the processes after certain actions, i guess a "hard" approach to pkill all ongoing processes before spawning new ones could be an ideia, probably a aggro approach :p

Sectimus commented 2 months ago

i guess a "hard" approach to pkill all ongoing processes before spawning new ones could be an ideia, probably a aggro approach

I wouldn't say that is aggro at all. Honestly I don't think there is ever an instance where I would want more than one process at any time. I would love for the feature to use multiple config files (not order of precedence), then it would make sense for multiple processes. (phpstan1.neon processes example/src/model, example/src/entity directories, phpstan2.neon then could do another set of directories with different scan rules defined in the neon config processing like example/test/unit example/test/integration etc.

The way it works right now it will always be scanning the same files regardless, perhaps a better approach would be to have a phpstan.maxProcesses vscode setting definition.

Alternatively only allow one scanner if using a phpstan cache, and just nuke it when we need to spin up a new one since it shouldn't take that much longer anyway. We probably don't even care about its results either since we will get more accurate results from the latest scanner.

If running without a cache, perhaps prompt the user if they want to terminate the run and start a new one as you could very well be editing files that phpstan has already scanned, and thus won't identify any new problems within related files if there are any yet to be scanned very likely in runs of large monoliths without a cache.

SanderRonde commented 2 months ago

Parallel analysis #

Ah yeah I kind of assumed that you already knew that that was happening and that the 27 processes were a multiple of the max you'd set (like 3 instances of 9). It's definitely expected for there to be multiple processes, PHPStan does that when you run it from the commandline as well.

I guess it could also depend if you have phpstan working with parellele processes, the only issue i believe is really the plugin no terminating the processes after certain actions.

Exactly, how many processes PHPStan spawns is entirely up to you and is all configurable in the phpstan.neon file. What's up to the extension is ensuring that when you start another check, the existing check isn't also running alongside it.

i guess a "hard" approach to pkill all ongoing processes before spawning new ones could be an ideia, probably a aggro approach :p

Yeah that's close to the approach I'm using now. Now I just store the PIDs and the PIDs of the children and kill those. Using pkill is quite dangerous since I don't know whether there's other similar processes running that shouldn't be killed. For example:

You're also running PHPStan on the commandline
You have multiple VSCode windows open
You've got a shared dev server where multiple people are logged in under the same user and both are working on something using PHPStan.

Honestly I don't think there is ever an instance where I would want more than one process at any time.

You are able to limit the max number of processes in the PHPStan config file quite easily then. Note that all this extension does is call PHPStan like you normally would on the commandline, except it displays the errors nicely and watches for changes. Any configuration as to how many parallel processes you want to run etc should be done in PHPStan's configuration itself, the extension has no control over this.

I would love for the feature to use multiple config files (not order of precedence), then it would make sense for multiple processes. (phpstan1.neon processes example/src/model, example/src/entity directories, phpstan2.neon then could do another set of directories with different scan rules defined in the neon config processing like example/test/unit example/test/integration etc.

(I don't fully understand what you mean so I might be making some wrong assumptions about what you mean, please correct me if I do). Hmm I think what you mean is that you don't want multiple processes, but you do want multiple processes to be able to scan at the same time if those different scans some pre-defined different parts of the code base right? So it's fine for there to be 2 processes as long as one of them is targeting test/ and the other is targeting src/. Or maybe your point is more so that it'd be nice to have different rule sets targeting different parts of the codebase.

I'd recommend opening an issue in the PHPStan repo. This is something that PHPstan itself could/should support and not something this extension can control. Having different rulesets for different parts of the codebase does sound kind of handy (ESLint for example already does support this) and it may already be supported or might be a good feature suggestion. But in any case it's not something for this extension to implement.

The way it works right now it will always be scanning the same files regardless, perhaps a better approach would be to have a phpstan.maxProcesses vscode setting definition.

There's already a config setting for this in the PHPStan config (the one linked above), I think it's much better to use that instead since it then also works when using PHPStan on the commandline.

Alternatively only allow one scanner if using a phpstan cache, and just nuke it when we need to spin up a new one since it shouldn't take that much longer anyway. We probably don't even care about its results either since we will get more accurate results from the latest scanner.

If running without a cache, perhaps prompt the user if they want to terminate the run and start a new one as you could very well be editing files that phpstan has already scanned, and thus won't identify any new problems within related files if there are any yet to be scanned very likely in runs of large monoliths without a cache.

This is a tough one for a few reasons:

There's no way for this extension to know before queueing a new PHPStan scan whether the cache is going to be available or not. Again, PHPStan itself does know this, but the extension (which basically just calls ./vendor/bin/phpstan on the commandline) can not.
If it would know, there's no way to know to what degree there is a cache available. Let's say half of the files have been cached, would you call that cached or not.
There's no way to know before starting a check how much of the existing cache will be invalidated by the changed. A change to BaseClassForEverything.php will take a massive amount of time since it will invalidate the cache. But a change to NewFile.php might only invalidate a single file (itself). How do you differentiate between these two.

I'd again recommend opening an issue in the PHPStan repo itself, maybe the author of PHPStan is able to help you out here. You could consider getting PHPStan Pro, which does file watching as well (like this extension) except it does run from within PHPStan so it does have access to all the needed information (how much of the cache is present, how much will be invalidated, etc). Then you could ask the author of PHPStan (and thus PHPStan Pro) to add support for this.

SanderRonde commented 2 months ago

I'll be closing the issue as fixed now, I think there have been three to-be-fixed issues in this thread:

Scan A is started, then scan B is started and scan A is not terminated, leading to two scans running side-by-side. That appears to have been fixed by martijn-dd. (described by him as "processes piling up", which makes sense)
Same issue as above, except the scan runs inside of Docker and is less easy to kill. I did not get confirmation that this was fixed by anyone, but neither did I get any reports that it's still broken and I did verify myself that it was fixed.
Multiple scans happening in parallel. Which is simply PHPStan using multiple workers for parallel processing. This is not an issue with the extension and is just something you should be able to configure yourself in your phpstan.neon.

If anybody has issues that do not fall within the above three categories (or falls into the first two but isn't fixed), feel free to re-open the issue. But I think the issue has been fixed then.

Sectimus commented 2 months ago

I'm happy with the explanation, but just to clarify a few points:

I think what you mean is that you don't want multiple processes, but you do want multiple processes to be able to scan at the same time if those different scans some pre-defined different parts of the code base right? So it's fine for there to be 2 processes as long as one of them is targeting test/ and the other is targeting src/.

Correct! This could be integrated into the extension to just point different config files for different directories, then the extension can call phpstan with the correct config file based on the file that vscode reports was edited.

Honestly I don't think there is ever an instance where I would want more than one process at any time.

By this I didn't explicitly mean processes directly, but runners that this extension spins up. If PHPStan creates 7 processes, then I save a file, all 7 of those should be erased and a fresh 7 should spin up. If I were to word it better:

~ Honestly I don't think there is ever an instance where I would want more than one extension runner at any time. (except if we were to support scanning different directories)

I'd recommend opening an issue in the PHPStan repo. This is something that PHPstan itself could/should support and not something this extension can control. Having different rulesets for different parts of the codebase does sound kind of handy (ESLint for example already does support this) and it may already be supported or might be a good feature suggestion. But in any case it's not something for this extension to implement.

PHPStan does already support this via the parameters.paths config. If you were wanting to scan multiple paths / config files at the same time I could just call PHPStan multiple times.

phpstan analyse path/to/directory --configuration path/to/phpstan.neon & phpstan analyse path/to/another/directory --configuration path/to/another/phpstan.neon &

This is possible in the CLI of course, but not via this extension (I would have to awkwardly change the phpstan.configFile setting every time I switch between working on src/ vs test/ in my example. and this is exactly what I do, it is quite frustrating

SanderRonde commented 2 months ago

Correct! This could be integrated into the extension to just point different config files for different directories, then the extension can call phpstan with the correct config file based on the file that vscode reports was edited.

Ah yeah that's a fair point. While this would be possible, I do still think it would be better for this to be solved upstream for a few reasons:

It's relatively hard to know under what config file a certain file falls. PHPStan has some non-trivial rules for determining whether a given file is or is not part of a project. I did recently come up with a hack for extracting this data, but it's only available in PHPStan 1.12+ and it only works in a pull-fashion (I need to periodically call the PHPStan binary and ask it which files it would include), which would mean that something trivial like you creating a new file in a directory and wanting it to be scanned does not work. Something like this is much easier to do within the same PHPStan project.
This could potentially become confusing when a given file falls under multiple configs
Let's say that src/ and test/ both import from base/, then base/ will be checked twice
PHPStan invalidates the cache when it is called with a different config file. Unless you are so smart as to set a different tmpDir for every config file (which I'm sure you are but I'm also quite a lot of people will forget) this means that almost every scan will be uncached.
It would be nicer for such a feature to be available on the commandline, and for it to be available to everyone :)

By this I didn't explicitly mean processes directly, but runners that this extension spins up. If PHPStan creates 7 processes, then I save a file, all 7 of those should be erased and a fresh 7 should spin up. If I were to word it better:

~ Honestly I don't think there is ever an instance where I would want more than one extension runner at any time. (except if we were to support scanning different directories)

Ah yeah sorry I misunderstood then. Then we indeed agree that there should always only be one scan running (even if such a scan contains multiple workers). Only the latest scan should be running, the rest should be killed.

PHPStan does already support this via the parameters.paths config. If you were wanting to scan multiple paths / config files at the same time I could just call PHPStan multiple times.

It indeed does sort of support this, but not fully. What I'd see as an ideal implementation is the way ESLint did it, where you are able to determine which rules to apply on a per-file basis. This would solve your issue of needing to switch configs (since you'd only need one config), ensure the extension works perfectly with this as well, while also making it so you'd only need to call PHPStan once on the commandline. Since PHPStan has very much so been created with the idea of "only one config file" (in contrast to for example TypeScript), which the cache being cleared when a different config file is used is one example of, I think an ideal solution is one that keeps it all in one config.

Sectimus commented 2 months ago

PHPStan invalidates the cache when it is called with a different config file. Unless you are so smart as to set a different tmpDir for every config file (which I'm sure you are but I'm also quite a lot of people will forget) this means that almost every scan will be uncached.

This is indeed how mine is set up. So in theory there should be no crosstalk between cache's (and I have never personally experienced such, my cache is never invalidated and I have yet to have a false report due to this cache setup)

There is a method to the madness however, with a very large project, PHPStan eats through your memory faster than you can say segfault. So if you were limited on memory, splitting your configs so that you stay within the limit of your hardware is a very valid choice. Especially in the context of CI/CD.

It's relatively hard to know under what config file a certain file falls

I'm not sure if we necessarily need to know. We already include settings like phpstan.paths

"phpstan.paths": {
    "${workspaceFolder}": "/var/www/example",
},

Path mapping for scanned files. Allows for rewriting paths for for example SSH or Docker.

Aware this topic has diverged a bit, it may be better to discuss this on a more topical issue. But I think it would be a great feature to have something similar to this, but instead of specifying a path mapping for SSH or Docker, we specify which config file to use

"phpstan.configPaths": {
    "${workspaceFolder}/src/*": "${workspaceFolder}/src.phpstan.neon",
    "${workspaceFolder}/test/*": "${workspaceFolder}/test.phpstan.neon",
},

In this example, we only need to check if the file I just saved in vscode matches the glob result of ${workspaceFolder}/src/* or ${workspaceFolder}/test/* and call PHPStan with the correct config accordingly. Perhaps a warning on this setting to ensure you have different tmpDir for your cache's if they have a chance of ever having any related files if you believe most people don't configure this already.

I know this is sounding more like a feature request, so it may be better for me to initiate one if you are open to the idea?

SanderRonde commented 2 months ago

This is indeed how mine is set up. So in theory there should be no crosstalk between cache's (and I have never personally experienced such, my cache is never invalidated and I have yet to have a false report due to this cache setup)

Oh there never is crosstalk between caches since PHPStan deletes the cache when you use a different config file from last time. On one hand that's a good thing, on the other hand this makes it so that every time you switch your cache is deleted and you have to do a full uncached scan.

There is a method to the madness however, with a very large project, PHPStan eats through your memory faster than you can say segfault. So if you were limited on memory, splitting your configs so that you stay within the limit of your hardware is a very valid choice. Especially in the context of CI/CD.

Yeah that's a very valid point. PHPStan right now is very heavy to run, which the author doesn't really seem to want to acknowledge or fix, mainly saying "just use the cache, then it's fine".

Aware this topic has diverged a bit, it may be better to discuss this on a more topical issue. But I think it would be a great feature to have something similar to this, but instead of specifying a path mapping for SSH or Docker, we specify which config file to use

Hmm while I could definitely see the benefit in this, I unfortunately don't think it's worth the added complexity to this project. A couple of reasons (I like my lists):

I'm guessing the usage will be relatively small. I am open to change if this assumption turns out to be false though
There'll be a ton of edge cases ("I want to include test/** except test/foo.php, PHPStan can do it so why can't this extension")
It could lead to some hard-to-understand cases. If again both src/ and test/ include base/, a change in base/ will now either cause both to be checked if you've configured it as such (which again hammers the CPU) or will cause neither to be checked, potentially leading to stale errors.
It just makes the code in this extension a bit more complex. Given that it's already a bit of a mess and is already showing some cracks, I'd prefer to fix existing issues over implementing new potentially issue-causing features.

I'd suggest creating an issue in this repo so that others who have the same suggestion can find it and bump it. Then I'll just post the same thing as here but at least then we can gauge how many people would like the feature :)

Sectimus commented 2 months ago

Oh there never is crosstalk between caches since PHPStan deletes the cache when you use a different config file from last time. On one hand that's a good thing, on the other hand this makes it so that every time you switch your cache is deleted and you have to do a full uncached scan.

This is not necessarily true if the config files have different parameters.tmpDir defined, which they should do if you are using multiple config files within the same project that have different path requirements.

It could lead to some hard-to-understand cases. If again both src/ and test/ include base/, a change in base/ will now either cause both to be checked if you've configured it as such (which again hammers the CPU) or will cause neither to be checked, potentially leading to stale errors.

I'd say that's down to the individual dev to ensure that this can't happen, or if it does, that it is not too taxing on their machine. You would need to run both anyway (and probably will, like I do) for valid static analysis (potentially we could just queue one config after another? Doing the process slower is better than not being able to do it at all)

I've created a new issue for this separate topic: https://github.com/SanderRonde/phpstan-vscode/issues/107 so we can continue the discussion there without hammering all the lovely people in this issue.

quazardous commented 2 months ago

First big up for this extension.

I'm using phpstan from terminal too but having the warnings in VSC is very nice.

I'm just pointing out the last comment from @Sectimus because it could lead to a solution (if I'm correctly understanding the global pattern).

Queing the scan requests instead of parallel triggering to control the flow of phpstan process. For me I don't need phpstan to be realtime, cause I know that the current files I'm working on are possibly not clean. But at the end of a work session I launch all the code mess detector. Could be every 10 minutes or so if files had been touched and saved.

Could be some sort of option on save or delayed.

Sectimus commented 2 months ago

Could be some sort of option on save or delayed.

@quazardous This could still cause issues if the delayed run gets triggered after 10 minutes and you are still working to edit a file it has scanned already (but not yet fully completed the analysis).

quazardous commented 2 months ago

Could be some sort of option on save or delayed.

@quazardous This could still cause issues if the delayed run gets triggered after 10 minutes and you are still working to edit a file it has scanned already (but not yet fully completed the analysis).

Yes it could there will always be caveat. But only one scan at a time it maybe won't lead to crash freeze...

Sectimus commented 2 months ago

Yes it could there will always be caveat. But only one scan at a time it maybe won't lead to crash freeze...

A pro tip if you want to use the CLI and still get VSCode integration is to just run the same command that the extension runs

go to the output tab and click the dropdown on the right to change the context to PHPStan to see what command it runs, yours needs to be identical

Make sure you have a cache defined in your config. Now when the extension runs, it should give you results in your IDE and complete instantly.

quazardous commented 2 months ago

@Sectimus I was using a dockerized phpstan before official docker support so terminal and vscode use the same config/stuff. I could use inotify or some sort of daemon but if the extension already listen to update events why bother 😁 I'm waiting I'm sure it will work fine in the end. The complete freeze is not happening anymore it's great 👍. And I have a kill all script in case of emergency...

SanderRonde commented 2 months ago

Hmm I think running the checker in the background might be a bit too complex to implement and understand. However I do think I've got a fairly decent solution for you. If you simply set the max number of processes to 1 and limit its resources in the phpstan config, you can then disable automatic checking on file change in the extension settings (phpstan.enabled -> false) and use the PHPStan: Scan project for errors command to queue a scan. This command might take a long time (or not, depending on cache) but it'll:

Take up few resources due to the 1-process limit
Show you its progress
Show you errors once it completes

Queueing checks instead of running them in parallel isn't necessarily something the extension can do, but you have quite a bit of control over this through the parallel section of the PHPStan config file. If a full check is constantly leading to system freezes you can limit the resources you assign to PHPStan through the config file, then checks will take longer but not cause a freeze (which you mentioned you'd rather have).

SanderRonde / phpstan-vscode

Multiple PHPStan runs happening in parallel #80