Aider-AI / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0
20.59k stars, 1.9k forks

Aider is creating high CPU load when dealing with large LLM responses and long patches. #930

Open azazar opened 2 months ago

azazar commented 2 months ago

Aider is creating high CPU load when dealing with large LLM responses and long patches, and it takes a very long time to process a response from the LLM.

$ aider --version
aider 0.45.1
$ py-spy top --pid ***

Collecting samples from '/home/****/.local/pipx/venvs/aider-chat/bin/python /home/****/.local/bin/aider --model=openrouter/anthropic/claude-3.5-sonnet' (python v3.11.2)
Total Samples 9300
GIL: 96.00%, Active: 97.00%, Threads: 2

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 11.00%  11.00%   14.90s    15.09s   get_tokens_unprocessed (pygments/lexer.py)
  3.00%   6.00%    7.37s    10.97s   divide (rich/text.py)
  5.00%  27.00%    5.17s    15.40s   render (rich/text.py)
  4.00%  61.00%    4.19s    43.23s   <genexpr> (rich/segment.py)
 17.00%  19.00%    3.88s     7.28s   get_current_style (rich/text.py)
  3.00%   3.00%    3.83s     3.83s   <lambda> (<string>)
  5.00%   6.00%    3.23s     4.85s   <genexpr> (rich/text.py)
  3.00%   8.00%    3.03s     5.09s   __add__ (rich/style.py)
  3.00%  68.00%    2.54s    49.01s   split_and_crop_lines (rich/segment.py)
  7.00%   7.00%    2.53s     2.53s   cell_len (rich/cells.py)
  0.00%  89.00%    1.86s    80.34s   render (rich/console.py)
  3.00%  42.00%    1.69s    29.45s   __rich_console__ (rich/text.py)
  2.00%   7.00%    1.58s     3.39s   cell_length (rich/segment.py)
  0.00%   8.00%    1.46s     5.98s   adjust_line_length (rich/segment.py)
  0.00%   0.00%    1.40s     1.40s   strip_control_codes (rich/control.py)
  3.00%  66.00%    1.37s    48.69s   render_lines (rich/console.py)
  2.00%   2.00%    1.33s     1.33s   words (rich/_wrap.py)
  3.00%   3.00%    1.31s     1.68s   __eq__ (rich/style.py)
  1.00%   1.00%    1.26s     1.26s   __hash__ (rich/style.py)
  0.00%   0.00%    1.16s     2.55s   __init__ (rich/text.py)
  0.00%   0.00%    1.08s     1.09s   rich_cast (rich/protocol.py)
  0.00%  12.00%    1.06s    18.31s   append_tokens (rich/text.py)
  0.00%   3.00%    1.05s     1.84s   justify (rich/containers.py)
  1.00%  11.00%    1.01s     9.11s   wrap (rich/text.py)
  0.00%   1.00%    1.01s     3.03s   join (rich/text.py)
  0.00%   0.00%   0.920s    0.980s   <dictcomp> (rich/text.py)
  0.00%  75.00%   0.900s    68.88s   _get_syntax (rich/syntax.py)
  0.00%   0.00%   0.890s    0.930s   __init__ (markdown_it/rules_block/state_block.py)
  1.00%   1.00%   0.760s     1.91s   copy (rich/text.py)
  2.00%   5.00%   0.720s     2.56s   divide_line (rich/_wrap.py)
  1.00%   1.00%   0.630s    0.630s   plain (rich/text.py)
  2.00%   2.00%   0.550s    0.550s   copy (rich/console.py)
  0.00%   1.00%   0.530s     1.07s   _render_buffer (rich/console.py)
  2.00%   3.00%   0.470s    0.770s   truncate (rich/text.py)
  0.00%   0.00%   0.410s    0.540s   get_style_for_token (rich/syntax.py)
  1.00%   1.00%   0.410s    0.420s   render (rich/style.py)
  0.00%   1.00%   0.370s     1.44s   combine (rich/style.py)
  1.00%  76.00%   0.370s    69.25s   __init__ (rich/segment.py)

Press Control-C to quit, or ? for help.
$ top -b -n 1 | head
top - 09:49:39 up  1:29,  1 user,  load average: 1.79, 1.46, 0.95
Tasks: 399 total,   2 running, 397 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.0 us, 10.0 sy,  0.0 ni, 70.0 id, 10.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  23969.3 total,  11248.1 free,   7684.2 used,   5555.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  16285.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  10562 *         20   0 1824320 245544  42316 R 100.0   1.0   9:54.84 aider
  19716 *         20   0  577600  77340  53324 S  12.5   0.3   0:10.57 alacrit+
   3786 *         20   0  598408 105764  63440 S   6.2   0.4   1:40.35 xfwm4
nevercast commented 2 months ago

I've found this too, but I think it's related to making the chat pretty while streaming. Turning off streaming or pretty output seems to restore performance.

Have you tried either of those?


Edit: ah just saw the stack trace, yep, seems to be a similar issue to what I experience.

azazar commented 2 months ago

It worked. Turning off streaming and pretty output helps.
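For reference, the workaround is applied with aider's command-line flags (the model name below is just the one from the py-spy output above):

```shell
# Disable streaming and/or rich "pretty" output to avoid the re-render hot path
aider --no-stream --no-pretty --model=openrouter/anthropic/claude-3.5-sonnet
```

Either flag alone may be enough, since the cost comes from re-rendering the pretty output on every streamed chunk.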

paul-gauthier commented 2 months ago

I'm going to close this issue for now, but feel free to add a comment here and I will re-open or file a new issue any time.

azazar commented 2 months ago

I actually like streaming and pretty output.

smhanov commented 4 days ago

I would like to request that you reopen this issue, because it isn't fixed. Long responses that refactor several files take longer and longer as they stream, due to the O(N^2) algorithm for formatting the markdown. I'm seeing 1 token per second from Claude Sonnet.
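To make the O(N^2) claim concrete: a streaming UI that re-renders the entire accumulated response on every new token does total work proportional to 1 + 2 + ... + N = N(N+1)/2 for N tokens, which is quadratic. A minimal sketch (the function name and the one-unit-per-character cost model are illustrative assumptions, not aider's actual code):

```python
def streamed_render_cost(num_tokens: int) -> int:
    """Model the total work of re-rendering the whole text after each token."""
    work = 0
    text_len = 0
    for _ in range(num_tokens):
        text_len += 1       # one more token arrives
        work += text_len    # re-render everything received so far
    return work

# Total work is N(N+1)/2, so doubling the response length
# roughly quadruples the rendering cost.
assert streamed_render_cost(100) == 100 * 101 // 2  # 5050
```

This matches the symptom described: each additional token costs more than the last, so throughput degrades as the response grows.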

smhanov commented 4 days ago

The main problem with the --no-pretty workaround is that you lose the ability to recall previous commands by pressing the up arrow. It just outputs raw escape sequences instead, at least on Ubuntu.

architect> ^[[A^[[A
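The `^[[A` shown above is the terminal's cursor-up escape sequence: an ESC byte (0x1b, rendered as `^[`) followed by `[A`. When the prompt runs without line editing enabled, pressing the up arrow feeds those raw bytes straight into the input instead of triggering history recall. A small check of the byte layout:

```python
# Cursor-up is ESC + "[A"; a terminal transcript shows ESC as "^[".
UP_ARROW = "\x1b[A"

assert UP_ARROW[0] == "\x1b"   # the ESC byte, printed as ^[ above
assert UP_ARROW[1:] == "[A"    # CSI sequence 'A' = cursor up
# Two presses of the up arrow produce the "^[[A^[[A" seen in the prompt.
assert UP_ARROW * 2 == "\x1b[A\x1b[A"
```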