Aider-AI / aider

aider is AI pair programming in your terminal
Apache License 2.0

Aider is creating high CPU load when dealing with large LLM responses and long patches. #930

Open azazar opened 2 months ago

azazar commented 2 months ago

Aider creates high CPU load when handling large LLM responses and long patches, and it takes a very long time to process a response from the LLM.

$ aider --version
aider 0.45.1
$ py-spy top --pid ***

Collecting samples from '/home/****/.local/pipx/venvs/aider-chat/bin/python /home/****/.local/bin/aider --model=openrouter/anthropic/claude-3.5-sonnet' (python v3.11.2)
Total Samples 9300
GIL: 96.00%, Active: 97.00%, Threads: 2

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 11.00%  11.00%   14.90s    15.09s   get_tokens_unprocessed (pygments/
  3.00%   6.00%    7.37s    10.97s   divide (rich/
  5.00%  27.00%    5.17s    15.40s   render (rich/
  4.00%  61.00%    4.19s    43.23s   <genexpr> (rich/
 17.00%  19.00%    3.88s     7.28s   get_current_style (rich/
  3.00%   3.00%    3.83s     3.83s   <lambda> (<string>)
  5.00%   6.00%    3.23s     4.85s   <genexpr> (rich/
  3.00%   8.00%    3.03s     5.09s   __add__ (rich/
  3.00%  68.00%    2.54s    49.01s   split_and_crop_lines (rich/
  7.00%   7.00%    2.53s     2.53s   cell_len (rich/
  0.00%  89.00%    1.86s    80.34s   render (rich/
  3.00%  42.00%    1.69s    29.45s   __rich_console__ (rich/
  2.00%   7.00%    1.58s     3.39s   cell_length (rich/
  0.00%   8.00%    1.46s     5.98s   adjust_line_length (rich/
  0.00%   0.00%    1.40s     1.40s   strip_control_codes (rich/
  3.00%  66.00%    1.37s    48.69s   render_lines (rich/
  2.00%   2.00%    1.33s     1.33s   words (rich/
  3.00%   3.00%    1.31s     1.68s   __eq__ (rich/
  1.00%   1.00%    1.26s     1.26s   __hash__ (rich/
  0.00%   0.00%    1.16s     2.55s   __init__ (rich/
  0.00%   0.00%    1.08s     1.09s   rich_cast (rich/
  0.00%  12.00%    1.06s    18.31s   append_tokens (rich/
  0.00%   3.00%    1.05s     1.84s   justify (rich/
  1.00%  11.00%    1.01s     9.11s   wrap (rich/
  0.00%   1.00%    1.01s     3.03s   join (rich/
  0.00%   0.00%   0.920s    0.980s   <dictcomp> (rich/
  0.00%  75.00%   0.900s    68.88s   _get_syntax (rich/
  0.00%   0.00%   0.890s    0.930s   __init__ (markdown_it/rules_block/
  1.00%   1.00%   0.760s     1.91s   copy (rich/
  2.00%   5.00%   0.720s     2.56s   divide_line (rich/
  1.00%   1.00%   0.630s    0.630s   plain (rich/
  2.00%   2.00%   0.550s    0.550s   copy (rich/
  0.00%   1.00%   0.530s     1.07s   _render_buffer (rich/
  2.00%   3.00%   0.470s    0.770s   truncate (rich/
  0.00%   0.00%   0.410s    0.540s   get_style_for_token (rich/
  1.00%   1.00%   0.410s    0.420s   render (rich/
  0.00%   1.00%   0.370s     1.44s   combine (rich/
  1.00%  76.00%   0.370s    69.25s   __init__ (rich/

Press Control-C to quit, or ? for help.
$ top -b -n 1 | head
top - 09:49:39 up  1:29,  1 user,  load average: 1.79, 1.46, 0.95
Tasks: 399 total,   2 running, 397 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.0 us, 10.0 sy,  0.0 ni, 70.0 id, 10.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  23969.3 total,  11248.1 free,   7684.2 used,   5555.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  16285.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  10562 *         20   0 1824320 245544  42316 R 100.0   1.0   9:54.84 aider
  19716 *         20   0  577600  77340  53324 S  12.5   0.3   0:10.57 alacrit+
   3786 *         20   0  598408 105764  63440 S   6.2   0.4   1:40.35 xfwm4

nevercast commented 2 months ago

I've found this too, and I think it's related to making the chat pretty while streaming. Turning off either streaming or pretty output seems to restore performance.

Have you tried either of those?

Edit: ah just saw the stack trace, yep, seems to be a similar issue to what I experience.

azazar commented 2 months ago

It worked. Turning off streaming and pretty output helps.

paul-gauthier commented 2 months ago

I'm going to close this issue for now, but feel free to add a comment here and I will re-open or file a new issue any time.

azazar commented 2 months ago

I actually like streaming and pretty output.

smhanov commented 4 days ago

I would like to request you reopen the item because it isn't fixed. Long responses that refactor several files take longer and longer as you go due to the O(N^2) algorithm for formatting the markdown. I'm seeing 1 token per second from claude sonnet.
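
To make the O(N^2) claim concrete, here is a minimal cost model in Python. It is not aider's or rich's actual code. It assumes the pretty-printer re-renders the entire accumulated response every time a new chunk arrives, which would be consistent with the profile above, where rich's render and syntax-highlighting calls dominate. The function names `full_rerender_cost` and `incremental_cost` are hypothetical and exist only for this illustration.

```python
def full_rerender_cost(chunk_sizes):
    """Characters processed if the whole buffer is re-rendered after every chunk.

    Assumption: each streamed chunk triggers a fresh parse and highlight of
    everything received so far.
    """
    total = 0
    buffered = 0
    for size in chunk_sizes:
        buffered += size   # the response grows by one chunk
        total += buffered  # ...and the whole response is rendered again
    return total


def incremental_cost(chunk_sizes):
    """Characters processed if only the newly arrived text is rendered."""
    return sum(chunk_sizes)


if __name__ == "__main__":
    for n_chunks in (1000, 2000, 4000):
        chunks = [4] * n_chunks  # e.g. 4 characters per streamed chunk
        print(n_chunks, full_rerender_cost(chunks), incremental_cost(chunks))
```

Under this model, doubling the response length roughly quadruples the total rendering work, while an incremental renderer would only double it. That would explain why the tail end of a long, multi-file response feels so much slower than the start.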

smhanov commented 4 days ago

The main problem with the --no-pretty workaround is that you lose the ability to recall previous commands by pressing the up-arrow. It just prints escape sequences instead, at least on Ubuntu.

architect> ^[[A^[[A