Aider-AI / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

Encoding fails when stdout is piped or redirected to file on Windows #1873

Open cspotcode opened 4 days ago

cspotcode commented 4 days ago

Issue

tl;dr: the infamous character encoding error occurs even with --encoding utf-8 --no-pretty when stdout is redirected in pwsh on Windows.

To save costs, I'm capturing the contents of the repo-map in various situations to understand how it's generated, how to optimize it, etc. To do that, I'm using a fairly common pattern: running a shell command and piping the output somewhere.

```powershell
# Pwsh (PowerShell Core) 7.4.5
aider --show-repo-map --encoding utf-8 --no-pretty > repomap.txt
# Also happens when piping to `echo`, which is a more convenient reproduction
aider --show-repo-map --encoding utf-8 --no-pretty | echo
```

Unfortunately, when stdout is redirected, print() fails to encode one of the braille characters used for aider's spinner. My prompt also never returns until I hit Ctrl+C; I suspect this is because Python doesn't close the redirected file descriptor.

A simpler reproduction is:

```powershell
# This one has the same encoding error but *does* return the terminal prompt without Ctrl+C
python -c 'print("\u280b")' | echo
```
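
The same failure can be shown without a pipe or a Windows machine by printing to an in-memory text stream that uses cp1252, the codec named in the traceback. This is only a sketch: `try_print` and `SPINNER_CHAR` are hypothetical names used for illustration, and the wrapper merely stands in for a redirected `sys.stdout`.

```python
import io

SPINNER_CHAR = "\u280b"  # the braille character reported in the traceback


def try_print(encoding: str) -> str:
    """Print the spinner character to an in-memory stream using `encoding`.

    Returns "ok" on success, or the name of the exception that was raised.
    """
    # TextIOWrapper stands in for a redirected sys.stdout with this encoding.
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(SPINNER_CHAR, file=stream, flush=True)
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print("cp1252:", try_print("cp1252"))
    print("utf-8:", try_print("utf-8"))
```

cp1252 has no mapping for U+280B, so the first call fails in the same way as the piped command, while the UTF-8 stream accepts the character.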

Full log:

PS C:\...> aider --show-repo-map --encoding utf-8 --no-pretty | echo

Aider v0.58.2.dev7+g9789668c
Main model: claude-3-5-sonnet-20240620 with diff edit format, infinite output
Weak model: claude-3-haiku-20240307
Git repo: .git with 7,638 files
Warning: For large repos, consider using --subtree-only and .aiderignore
See: https://aider.chat/docs/faq.html#can-i-use-aider-in-a-large-mono-repo
Repo-map: using 1024 tokens, auto refresh

# Uncaught UnicodeEncodeError in cp1252.py line 19

Aider version: 0.58.2.dev7+g9789668c
Python version: 3.12.6
Platform: Windows-11-10.0.22631-SP0
Python implementation: CPython
Virtual environment: Yes
OS: Windows 11 (64bit)
Git version: git version 2.40.1.windows.1

An uncaught exception occurred:

```
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "__main__.py", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "main.py", line 672, in main
    repo_map = coder.get_repo_map()
               ^^^^^^^^^^^^^^^^^^^^
  File "base_coder.py", line 599, in get_repo_map
    repo_content = self.repo_map.get_repo_map(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "repomap.py", line 127, in get_repo_map
    files_listing = self.get_ranked_tags_map(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "repomap.py", line 482, in get_ranked_tags_map
    result = self.get_ranked_tags_map_uncached(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "repomap.py", line 513, in get_ranked_tags_map_uncached
    ranked_tags = self.get_ranked_tags(
                  ^^^^^^^^^^^^^^^^^^^^^
  File "repomap.py", line 316, in get_ranked_tags
    progress()
  File "utils.py", line 283, in step
    self._step()
  File "utils.py", line 292, in _step
    print(f"\r{self.text} {next(self.spinner_chars)}\r{self.text} ", end="", flush=True)
  File "cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u280b' in position 19: character maps to <undefined>
```

Version and model info

No response

cspotcode commented 4 days ago

As a workaround, I'm setting the environment variable PYTHONIOENCODING=utf-8.
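
For comparison, a program can make a similar change from the inside by reconfiguring its own output stream, using the standard `reconfigure()` method that text streams have had since Python 3.7. The sketch below is an assumption about how that could look, not something aider is known to do; `force_utf8` is a hypothetical helper name.

```python
import sys


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True if the stream was reconfigured, False otherwise.
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Some replacement stdout objects do not offer reconfigure().
        return False
    reconfigure(encoding="utf-8")
    return True


if __name__ == "__main__":
    # Only print the braille character once the stream is known to be UTF-8.
    if force_utf8(sys.stdout):
        print("\u280b")
```

Unlike the environment variable, this only affects the stream it is called on and only from the point of the call onward.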

It still prints the spinner, which comes out looking corrupted. Ideally the spinner would be disabled for non-terminal output, but it's not a big deal.
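
One way the suggestion could be sketched: check `isatty()` before drawing anything, and fall back to ASCII frames when the terminal's encoding cannot represent the braille characters. All names here (`pick_frames`, `make_step`, the frame constants) are hypothetical and the frame set is illustrative; this is not aider's actual spinner code.

```python
import itertools
import sys

# Illustrative frames; the first is the character from the traceback.
BRAILLE_FRAMES = "\u280b\u2819\u2839"
ASCII_FRAMES = "|/-\\"


def pick_frames(stream):
    """Return spinner frames for `stream`, or None if no spinner should be drawn."""
    if not stream.isatty():
        return None  # piped or redirected: skip the spinner entirely
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        BRAILLE_FRAMES.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return ASCII_FRAMES  # terminal cannot show braille
    return BRAILLE_FRAMES


def make_step(text, stream=None):
    """Build a step() callable that redraws the spinner, or does nothing."""
    if stream is None:
        stream = sys.stdout
    frames = pick_frames(stream)
    if frames is None:
        return lambda: None
    cycle = itertools.cycle(frames)

    def step():
        print(f"\r{text} {next(cycle)}", end="", file=stream, flush=True)

    return step
```

With this shape, redirected output receives no spinner bytes at all, so neither the encoding error nor the corrupted characters can occur on that path.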