PyCQA / isort

A Python utility / library to sort imports.
https://pycqa.github.io/isort/
MIT License

isortd server process #1734

Open · wyuenho opened this issue 3 years ago

wyuenho commented 3 years ago

Starting the Python process takes forever, but once it's started, formatting is acceptably fast. I was just wondering if there is any plan to incorporate or write a daemon server à la blackd, eslint_d, or prettier_d? A gentleman seems to have already made a start; it would be nice if the two projects could merge too.

https://github.com/urm8/isortd/issues/2

timothycrosley commented 3 years ago

I would love to see this added to isort! @urm8 let me know if you have any interest as well, I can help with quick reviews into the repository, or in any other way you would find helpful.

@wyuenho thanks for highlighting this opportunity!

urm8 commented 3 years ago

Hi @timothycrosley, it would be great to see isortd as a part of the isort project! But there is one thing I don't like in the current isortd implementation (as in the blackd implementation too): it is very limited in terms of the tools devs may use it with. For now it is available only to PyCharm users through the IsortConnect plugin. It just opens an HTTP API to isort, and I think it would be better to implement some kind of file-watcher service for isort that watches for file changes within a directory and runs isort on them (I even have a PR for that: https://github.com/urm8/isortd/pull/6), rather than rely on third-party clients to do that. I'd like to hear your opinion on how this would best be implemented.

timothycrosley commented 3 years ago

@urm8 a file watcher sounds cool! Another option I could see being interesting is to write an optimized, minimal Python entrypoint that uses a running isortd process if it detects one, and otherwise reroutes to the existing CLI, since starting Python itself is generally fast enough (14ms on my laptop, with a quick test run of time python "exit()"). If that could reasonably be pulled off, it would mean most existing tooling (which utilizes isort's CLI, for example VSCode) would automatically become much faster whenever isortd was running and configured. Curious if you have thoughts on this approach?
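
Just to make the idea concrete, here is a very rough sketch of the kind of wrapper I mean (the port and XX-SRC header are borrowed from the unofficial isortd mentioned above, and are assumptions rather than anything isort ships today):

#!/usr/bin/env python
# Hypothetical thin wrapper: try a running isortd first, and fall back to the
# normal isort CLI if no daemon answers. Reads source on stdin, writes the
# sorted result to stdout.
import os
import subprocess
import sys
import urllib.error
import urllib.request

DAEMON_URL = "http://localhost:47393"  # assumed isortd port; would be configurable

def main() -> int:
    source = sys.stdin.read()
    request = urllib.request.Request(
        DAEMON_URL,
        data=source.encode(),
        headers={"XX-SRC": os.getcwd()},  # mirrors the unofficial isortd's header
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=0.1) as response:
            sys.stdout.write(response.read().decode())
            return 0
    except (urllib.error.URLError, OSError):
        # No daemon running (or it timed out): reroute to the existing CLI so
        # current tooling keeps working unchanged.
        result = subprocess.run(
            ["isort", "--stdout", "-"],
            input=source, capture_output=True, text=True,
        )
        sys.stdout.write(result.stdout)
        return result.returncode

if __name__ == "__main__":
    sys.exit(main())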

disrupted commented 3 years ago

> @urm8 a file watcher sounds cool! Another option I could see being interesting is to write an optimized, minimal Python entrypoint that uses a running isortd process if it detects one, and otherwise reroutes to the existing CLI, since starting Python itself is generally fast enough (14ms on my laptop, with a quick test run of time python "exit()"). If that could reasonably be pulled off, it would mean most existing tooling (which utilizes isort's CLI, for example VSCode) would automatically become much faster whenever isortd was running and configured. Curious if you have thoughts on this approach?

To me this makes much more sense than a file-watcher as I imagine most people invoke isort in their text editor and/or pre-commit hooks.

wyuenho commented 3 years ago

The command to test Python startup time is actually time python -c "exit()". It took about 0.2s on my 2014 MBP.

The issue a daemon server tries to solve is not just starting the Python interpreter, but also to "warm it up": read, parse, and execute all the relevant imports, so that when input comes in, the program goes straight to formatting.
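
If you want to see where that warm-up time actually goes, CPython's built-in import profiler paints a decent picture (a quick sketch, assuming isort is installed in the current environment and Python is 3.7+):

# Print a cumulative per-module import time table to stderr; the bulk of the
# "warm up" cost shows up here rather than in bare interpreter startup.
import subprocess
import sys

subprocess.run([sys.executable, "-X", "importtime", "-c", "import isort"])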

A really dumb but effective isortd implementation doesn't even need to use aiohttp or any fancy IPC; all it has to do is maintain a pool of isort worker processes waiting on stdin. When the daemon receives input on its own stdin, it can just send it to a process in the pool and return that process's stdout. The isort worker process will exit after formatting, but the pool will have another one available to accept "requests"; the daemon simply starts another worker process once one has exited.
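
Just to illustrate, a toy sketch of that idea (the pool size, request framing, and error handling are arbitrary placeholders, not a proposed design):

# Keep a couple of isort processes already started and blocked on stdin, hand
# each incoming request to one of them, return its stdout, then start a
# replacement worker so the pool stays warm.
import subprocess
import sys
from collections import deque

POOL_SIZE = 2  # arbitrary; a pool of 2 goes a long way

def spawn_worker() -> subprocess.Popen:
    # The expensive interpreter and import startup happens here, ahead of any request.
    return subprocess.Popen(
        ["isort", "--stdout", "-"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )

def serve() -> None:
    pool = deque(spawn_worker() for _ in range(POOL_SIZE))
    while True:
        # Extremely naive framing: one request is everything on our stdin up to
        # a line containing only an EOT character.
        lines = []
        for line in sys.stdin:
            if line == "\x04\n":
                break
            lines.append(line)
        else:
            return  # our own stdin closed; shut down
        worker = pool.popleft()
        pool.append(spawn_worker())  # replenish before blocking on the worker
        out, _ = worker.communicate("".join(lines))
        sys.stdout.write(out)
        sys.stdout.flush()

if __name__ == "__main__":
    serve()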

urm8 commented 3 years ago

> @urm8 a file watcher sounds cool! Another option I could see being interesting is to write an optimized, minimal Python entrypoint that uses a running isortd process if it detects one, and otherwise reroutes to the existing CLI, since starting Python itself is generally fast enough (14ms on my laptop, with a quick test run of time python "exit()"). If that could reasonably be pulled off, it would mean most existing tooling (which utilizes isort's CLI, for example VSCode) would automatically become much faster whenever isortd was running and configured. Curious if you have thoughts on this approach?

> The command to test Python startup time is actually time python -c "exit()". It took about 0.2s on my 2014 MBP.
>
> The issue a daemon server tries to solve is not just starting the Python interpreter, but also to "warm it up": read, parse, and execute all the relevant imports, so that when input comes in, the program goes straight to formatting.
>
> A really dumb but effective isortd implementation doesn't even need to use aiohttp or any fancy IPC; all it has to do is maintain a pool of isort worker processes waiting on stdin. When the daemon receives input on its own stdin, it can just send it to a process in the pool and return that process's stdout. The isort worker process will exit after formatting, but the pool will have another one available to accept "requests"; the daemon simply starts another worker process once one has exited.

So the plan is:

1) parse the config and launch a daemon that holds stdin;
2) the isort entry point stats some file that indicates the daemon is running, acquires a write lock, pushes the file contents to the daemon, and waits for output;
3) the daemon receives the file contents and sends them to its process pool (maybe a simple fork suits better here?), then returns the result to stdin;
4) the worker process spits the sorted file to stdout;
5) isort takes it, releases the lock, and continues execution as usual.

Am I correct?
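
Very roughly, the client side of those steps might look something like this (the paths, the flock-based lock, and the unix-socket transport are placeholders just to make the steps concrete, not an agreed protocol):

# Sketch of the isort entry point side of the plan above (POSIX-only because
# of fcntl; every name here is a placeholder).
import fcntl
import os
import socket
import subprocess
import sys

PID_FILE = "/tmp/isortd.pid"      # step 2: stat this to see whether a daemon is running
LOCK_FILE = "/tmp/isortd.lock"
SOCKET_PATH = "/tmp/isortd.sock"  # assumed transport to the daemon

def format_source(source: str) -> str:
    if not os.path.exists(PID_FILE):
        # No daemon: fall back to the normal CLI and continue as usual (step 5).
        proc = subprocess.run(
            ["isort", "--stdout", "-"], input=source, capture_output=True, text=True
        )
        return proc.stdout
    with open(LOCK_FILE, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)       # step 2: acquire the write lock
        try:
            with socket.socket(socket.AF_UNIX) as sock:
                sock.connect(SOCKET_PATH)
                sock.sendall(source.encode())  # step 2/3: push the contents to the daemon
                sock.shutdown(socket.SHUT_WR)
                chunks = []
                while True:
                    data = sock.recv(65536)    # step 4: the daemon writes the sorted file back
                    if not data:
                        break
                    chunks.append(data)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)   # step 5: release the lock
    return b"".join(chunks).decode()

if __name__ == "__main__":
    sys.stdout.write(format_source(sys.stdin.read()))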

wyuenho commented 3 years ago

Something like that. Whether the worker process exits after formatting and requires starting a new one to replenish the pool is an implementation detail that depends on how lazy you want to be, lol. A process pool of 2 goes a long way.

timothycrosley commented 3 years ago

I'm starting to wonder if the basis of this issue is fully accurate:

> Starting the Python process takes forever, but once it's started, formatting is acceptably fast.

tim@pop-os:~/Projects/isort$ time isort . --show-config
real    0m0.111s
user    0m0.095s
sys     0m0.016s

That includes all the setup time for isort (loading configs, etc.) that could be saved by a daemon. Sure, some projects have larger configs, but I'm kinda curious: is one tenth of a second really slow? What is an acceptable delay? Is it longer for others? Mostly I ask because, if this is a goal, it will be important to quantify what "fast enough" is, as that will greatly impact the design of any solution. I also ask because it's possible that there is something wrong with individual config options, or similar, causing some users to experience a long delay that is not widely experienced; if that is the case, it may make sense for isort to fix those performance issues separately regardless.

wyuenho commented 3 years ago

Movies play at 24 fps, and if you use an old HDMI cable that only supports 30 fps to connect your laptop to a screen, the mouse pointer is perceptibly laggy as you move it across the screen. Most things on a computer need at least 60 fps to feel smooth. On the same 2014 MBP 15" i7 2.5GHz, this is the result:

~/Documents/workspace/unicode-fonts
❯ time python -c "exit()"

real    0m0.208s
user    0m0.069s
sys 0m0.090s

~/Documents/workspace/unicode-fonts
❯ time isort ./find_fonts 

real    0m0.260s
user    0m0.203s
sys 0m0.049s

The file has 1463 lines.

60 fps requires isort to finish in about 0.01s and leave 0.004s for the editor to make the request and handle the response. My isort config only has profile="black" in it. The result above shows 5 frames are required just to rearrange some imports, but starting the interpreter and processing the config effectively lowered formatting ops to only around 4 per second. That's 4 fps. I imagine just the formatting can be cut down to perhaps 2-3 frames on newer machines, which is acceptable.

wyuenho commented 3 years ago

As an aside, on my 2020 MBP 13" 2GHz machine, there's a file with only 204 lines and 9 imports in it. Using the default configuration, it took 0.4s to format. Yes, isort is slow, and there are possibly more than a few ways to improve perf; this is one of them.

timothycrosley commented 3 years ago

I personally don't think FPS is a reasonable, or even beneficial, way to think about time for code formatters. The goal shouldn't be to fool a brain into seeing motion, after all... Half a second, honestly, in my opinion is still fast. A very fast human response to any stimulus is around 200ms; anything faster than that, and I'm not really sure it's perceptible unless you are intentionally looking. To me the likely bigger issue is that when you combine multiple formatters, the fractions of a second here and there can add up to multiple seconds, and then you actually have a real sense of delay. For that I think a language server is likely the best approach.

timothycrosley commented 3 years ago

Doing some more research, I think that for actions a user takes, the general consensus is roughly: under 0.1 seconds feels instant, under 1 second keeps the user's flow of thought intact, and anything approaching 10 seconds loses their attention.

See: https://www.nngroup.com/articles/response-times-3-important-limits/

I would like isort to be instant of course, but am okay with it living somewhere between instant and fast.

wyuenho commented 3 years ago

> I personally don't think FPS is a reasonable, or even beneficial, way to think about time for code formatters. The goal shouldn't be to fool a brain into seeing motion, after all...

I beg to differ. Text editors are real-time applications: from the moment the user issues the key combo to format source code to the moment the result is seen formatted, the user is blocked from making further changes to the source code. 0.4s by default just to rearrange 9 imports on a 2020 machine is unacceptable. The train of thought might not be lost, but the workflow is definitely interrupted.

> A very fast human response to any stimulus is around 200ms; anything faster than that, and I'm not really sure it's perceptible unless you are intentionally looking.

Humans can see images in as little as 13ms; that's why 60 fps is the target every video game is trying to hit. Seeing your source code reformatted is equivalent to seeing a new image, and seeing it change is thus perceived as motion, which the user then has to respond to. That's where the 200ms limit comes in: that time includes receiving the stimulus, processing it, and physically responding. We are talking about 200ms to 500ms on laptops produced in the last 7 years just for isort to produce the stimulus.

> For that I think a language server is likely the best approach.

All Python language servers are incredibly slow. Pylance is probably the fastest, but it's not available in editors other than VSCode, and it does not have any sort of server to format source code; it just calls the formatter CLI in a shell, like Emacs and Vim do.

disrupted commented 3 years ago

> I'm starting to wonder if the basis of this issue is fully accurate:
>
> > Starting the Python process takes forever, but once it's started, formatting is acceptably fast.
>
> tim@pop-os:~/Projects/isort$ time isort . --show-config
> real    0m0.111s
> user    0m0.095s
> sys     0m0.016s
>
> That includes all the setup time for isort (loading configs, etc.) that could be saved by a daemon. Sure, some projects have larger configs, but I'm kinda curious: is one tenth of a second really slow? What is an acceptable delay? Is it longer for others? Mostly I ask because, if this is a goal, it will be important to quantify what "fast enough" is, as that will greatly impact the design of any solution. I also ask because it's possible that there is something wrong with individual config options, or similar, causing some users to experience a long delay that is not widely experienced; if that is the case, it may make sense for isort to fix those performance issues separately regardless.

In my experience, isort takes much longer to start. When I started using the (unofficial) isortd, I did a small comparison for myself and the difference was astonishing.

I understand this is very personal and non-scientific, but the difference for me was 440ms (isort) vs. 20ms (isortd) when running it on a specific file. I also posted about it here: https://github.com/urm8/isortd/issues/2

I basically tested it like this:

hyperfine 'cat subclean/core/parser.py | isort --profile black --stdout -'
# start the daemon:
python isortd
# then in a separate shell:
hyperfine 'cat subclean/core/parser.py | curl -s -X POST "localhost:47393" -H "XX-SRC: $(pwd)" --data-binary @-'

To me this shows the huge potential; having isort run as a daemon just makes sense imo, especially when you're calling it very frequently through an editor integration (format on save).

timothycrosley commented 3 years ago

> Humans can see images in as little as 13ms

@wyuenho The question isn't how long it takes for a human to know that an image ever existed; if anything it's how long it takes before someone can correctly identify that it was there for more than one frame, and then how long before someone would feel it was "stuck". Per your own article, it takes at least 100ms for a human to even correctly identify the image. After all, these are not frame-to-frame smooth animations; that would require the editor to render so many more frames to show each character's new position! This is a scene-to-new-scene change based on a user action, and for that, all the research I've seen, even your linked article, is pretty consistent that anything faster than 100ms won't even be noticed.

timothycrosley commented 3 years ago

https://www.youtube.com/watch?v=vOvQCPLkPt4 is probably the best example of this difference and phenomenon. Independent of the latency, simply touching an individual spot to perform an action seems instant; the delay only becomes noticeable when you bring motion into the picture. And that's with direct touch, where any delay should be much more noticeable since there is a direct visual connection between the action you are trying to perform and it taking place. The widely agreed <100ms == instant and <1 second == fast cut-offs are from real-time applications. The difference isn't real-time or not; the difference is motion vs. no motion. FPS just isn't relevant, based on all the literature I can find, when you aren't trying to trick a brain into seeing motion.

wyuenho commented 3 years ago

> The question isn't how long it takes for a human to know that an image ever existed; if anything it's how long it takes before someone can correctly identify that it was there for more than one frame, and then how long before someone would feel it was "stuck".

The question is how long it takes for me to perceive there was motion, which can be as short as 13ms, and how long isort should take to let me resume typing.

> Per your own article, it takes at least 100ms for a human to even correctly identify the image.

You did not read the article, or the study, or understand it. The article only mention 100ms 3 times, the first 2 times tells you this study contradicts this "consensus", the last time tells you high speed understanding must occur in less than 100ms.

> After all, these are not frame-to-frame smooth animations; that would require the editor to render so many more frames to show each character's new position!

This is a bulk operation: the entire text is drawn into a buffer and then blitted to the graphics card's frame buffer in one go, with all the calculation done beforehand; even an ancient immediate-mode editor like Emacs does this now. I can't be sure of the exact frame rate in VS Code, as it uses a newer version of CEF than Chrome that has the FPS counter replaced, but as of Chrome 90, on my 7-year-old machine driving a 4K screen it was never designed for, I'm still getting a consistent 55 FPS as I type this comment.

> This is a scene-to-new-scene change based on a user action, and for that, all the research I've seen, even your linked article, is pretty consistent that anything faster than 100ms won't even be noticed.

Wrong. The study is trying to find the neural network model that best describes the experimental results. The results show the participants were able to extract the gist of an image flashed in front of them for only 13ms at a rate significantly greater than random guessing, independently of whether they were asked what to identify before or after the images were shown, or whether 6 or 12 images were shown in a series. Here's a quote from the paper:

> even durations as short as 13 ms are clearly sufficient, on a significant proportion of trials, to drive conscious detection, identification, and immediate recognition memory

Here's the last paragraph:

> A possible role for such rapid visual understanding in normal experience would be to provide nearly instantaneous conceptual activation that enables immediate action when necessary, without waiting to refine understanding by reentrant processing or by the kind of conscious reflection that requires a stable recurrent network.

This is the study. Be sure to read the citations at the bottom if you want to know what the consensus is.

Besides, you are still missing the point. isort is nowhere close to being able to format text at 100ms per op, except for some lucky souls running on super beefy computers. The goal shouldn't be targeting isort to finish in ~100ms; the goal should be that from the moment the user issues the formatting command to the moment the user understands the source code has been formatted and can resume typing, the entire cycle takes 100ms max. isort is monopolizing this time budget, so no other processing by the machine or the human is possible in that time slot. In this entire pipeline, isort should be only a small stage, not the only stage.

We've just shown you can easily get an 80% performance boost, which is as good as instantaneous, by implementing a server. It isn't hard to do, and it's an admirable goal in itself. Where exactly is this resistance coming from?

wyuenho commented 3 years ago

Please stop citing pre-2014 material...

timothycrosley commented 3 years ago

There is no resistance to setting up a server; I've already accepted such a contribution, as I accept all contributions, with open arms. The resistance is to the notion that isort is very slow, and that I should target some crazy small number of FPS. Especially when I feel very confident that if I ran a randomized trial comparing 99ms and 50ms, no one, or at least not a large number of people, would be able to tell the difference. I would much rather the project target 100ms, because that opens up many more technical solutions, such as, for most users, a minimal Python client calling out to the server process (which can't be a solution if the target needs to be much lower, as then we reach the point where Python startup time alone is too slow for most users).

wyuenho commented 3 years ago

Nobody is asking you to implement a Python client; every editor can make HTTP requests or connect to a pipe or a unix domain socket. black does not spawn blackd to do its job. We just want a server.

Fine, isort is fast. We've already established it is super fast once we eliminate the Python interpreter startup time, so can we make it imperceptibly fast for all subsequent formatting ops by only starting the Python interpreter once, please?

timothycrosley commented 3 years ago

I get your thought process here, and isort is more popular than it was in the past, so maybe the speed of downstream changes will be different, but you have to understand I'm used to such changes taking years and being very spotty when waiting for integration from the editor and plugin side. So I hope you can understand why, from my point of view, I see the Python client, or similar, as a necessary shortcut. That said, I'd very happily and thankfully accept the server process in any state, including what @urm8 has implemented already.

Akuli commented 3 years ago

For a slight improvement in startup performance, you can specify -S to skip loading the site module (the exit() function won't work, because site adds it):

akuli@akuli-desktop:~$ time python3 -c "from sys import exit; exit()"

real    0m0,018s
user    0m0,017s
sys 0m0,000s
akuli@akuli-desktop:~$ time python3 -S -c "from sys import exit; exit()"

real    0m0,014s
user    0m0,005s
sys 0m0,010s

These times were measured after running the same commands earlier; the first run is the slowest, presumably because the kernel has not yet cached the file contents in memory.

timothycrosley commented 3 years ago

@wyuenho and @disrupted, the latest version of isort just released includes some significant startup time improvements (on my own older test machine, time isort . --show-config went from as much as 500ms to as fast as 50ms). I would be curious to hear if these changes also impact the performance both of you are seeing.

disrupted commented 3 years ago

> @wyuenho and @disrupted, the latest version of isort just released includes some significant startup time improvements (on my own older test machine, time isort . --show-config went from as much as 500ms to as fast as 50ms). I would be curious to hear if these changes also impact the performance both of you are seeing.

will check it out shortly and report back, thanks for the heads up and your effort

disrupted commented 3 years ago

@timothycrosley Can confirm, 5.9 indeed has a much quicker startup time. 👏🏻 When formatting a file it's down for me from ~440ms to ~150ms. That being said, isortd still beats it by almost 10x, so it still makes sense to go that route to essentially eliminate any startup delay on these very frequently used tools.

wyuenho commented 3 years ago

Confirmed. ~150ms on both of my machines.

timothycrosley commented 3 years ago

Glad to hear it! While it may not have gone all the way down to 50ms, it sped things up considerably for both of you. I agree that an isortd-type solution would still be very useful, and this is in no way in lieu of that; it was just some really low-hanging fruit I was able to squeeze into the latest release that I hope will ease some pain in the short term until a server process is fully integrated.

failable commented 1 month ago

Any update?