bridgecrewio / checkov

Prevent cloud misconfigurations and find vulnerabilities during build-time in infrastructure as code, container images and open source packages with Checkov by Bridgecrew.
https://www.checkov.io/
Apache License 2.0

Why is a Hebrew Point Rafe character (\u05bf) included in console output when parsing errors are encountered? #6416

Open chrisnielsen-MS opened 3 weeks ago

chrisnielsen-MS commented 3 weeks ago

Describe the issue

The console output when encountering errors parsing files includes a Hebrew Point Rafe character (\u05bf) at the end of file names.

image

Additional context

I am trying to run checkov as a subprocess of another application and I keep getting encoding problems that lead to an unhandled exception whenever the console attempts to write these errors. When I run checkov normally (not in my subprocess) it works and prints the lines provided in my image above. I have been trying to override the encoding to utf-8, but I thought it worth asking if this output is even intended. This is the exact error message I receive:

[Error] 2024-06-07 16:54:17,443 [MainThread ] [ERROR] Exception traceback:
[Error] Traceback (most recent call last):
[Error]   File "checkov\main.py", line 557, in run
[Error]     exit_codes.append(self.print_results(
[Error]   File "checkov\main.py", line 823, in print_results
[Error]     return runner_registry.print_reports(
[Error]   File "checkov\common\runners\runner_registry.py", line 501, in print_reports
[Error]     print(report.print_console(
[Error]   File "colorama\ansitowin32.py", line 47, in write
[Error]   File "colorama\ansitowin32.py", line 177, in write
[Error]   File "colorama\ansitowin32.py", line 205, in write_and_convert
[Error]   File "colorama\ansitowin32.py", line 210, in write_plain_text
[Error]   File "colorama\ansitowin32.py", line 47, in write
[Error]   File "colorama\ansitowin32.py", line 177, in write
[Error]   File "colorama\ansitowin32.py", line 205, in write_and_convert
[Error]   File "colorama\ansitowin32.py", line 210, in write_plain_text
[Error]   File "encodings\cp1252.py", line 19, in encode
[Error] UnicodeEncodeError: 'charmap' codec can't encode character '\u05bf' in position 162: character maps to <undefined>
[Error] Traceback (most recent call last):
[Error]   File "checkov\main.py", line 837, in <module>
[Error]   File "checkov\main.py", line 557, in run
[Error]   File "checkov\main.py", line 823, in print_results
[Error]   File "checkov\common\runners\runner_registry.py", line 501, in print_reports
[Error]   File "colorama\ansitowin32.py", line 47, in write
[Error]   File "colorama\ansitowin32.py", line 177, in write
[Error]   File "colorama\ansitowin32.py", line 205, in write_and_convert
[Error]   File "colorama\ansitowin32.py", line 210, in write_plain_text
[Error]   File "colorama\ansitowin32.py", line 47, in write
[Error]   File "colorama\ansitowin32.py", line 177, in write
[Error]   File "colorama\ansitowin32.py", line 205, in write_and_convert
[Error]   File "colorama\ansitowin32.py", line 210, in write_plain_text
[Error]   File "encodings\cp1252.py", line 19, in encode
[Error] UnicodeEncodeError: 'charmap' codec can't encode character '\u05bf' in position 162: character maps to <undefined>
[Error] [66316] Failed to execute script 'main' due to unhandled exception!
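For anyone trying to reproduce the failure without checkov: the last frame of the traceback is the cp1252 codec rejecting the character. The following is a minimal sketch, not checkov code; the helper name `can_encode` is made up for illustration. It only shows that U+05BF has no mapping in cp1252 but does encode in UTF-8.

```python
# U+05BF (HEBREW POINT RAFE), the character named in the traceback.
RAFE = "\u05bf"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding` without error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        # cp1252 has no slot for U+05BF, so the codec raises here.
        return False
    return True


if __name__ == "__main__":
    for name in ("cp1252", "utf-8"):
        print(name, can_encode(RAFE, name))
```

Any stream whose encoding is cp1252 (and whose error handler is the default strict one) will hit the same error as soon as this character is written to it.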

chrisnielsen-MS commented 2 weeks ago

Adding some context I've learned since opening this: the issue seems to be Windows-specific and is related to this issue: https://github.com/python/cpython/issues/45943

This is a console code page issue, and I found a workaround in Windows Region settings by enabling "Use Unicode UTF-8 for worldwide language support".

This still doesn't explain why that character is appended to the file name in the checkov output, though. The parsing failures I encountered turned out to be caused by invalid JSON that contained comments; I have not tested whether other types of parsing failures produce the same output.

chrisnielsen-MS commented 1 week ago

I finally resolved this issue for myself. It turns out this is a PyInstaller issue that happens whenever you redirect the output on Windows and the program writes an unusual character, such as the one emitted in these parsing failure messages. The fix was to add a simple runtime hook like this to the checkov.spec file and regenerate the .exe that way:

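import os
import sys

# Inherited by child processes; it does not change streams already open in this process.
os.environ["PYTHONIOENCODING"] = 'utf-8'
# Switch this process's stdout/stderr to UTF-8 so characters such as '\u05bf' can be written.
sys.stdout.reconfigure(encoding='utf-8')
sys.stderr.reconfigure(encoding='utf-8')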
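A slightly more defensive variant of the same idea, in case it helps someone else: this is only a sketch, and the helper name `force_utf8` is hypothetical. It assumes the stream may be `None` or may lack `reconfigure()` (for example when stdout has been replaced by a wrapper object), and skips such streams instead of raising.

```python
import io
import sys


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True if the stream was reconfigured, False if it was skipped.
    """
    # getattr on None, or on a wrapper without reconfigure(), yields None here.
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


if __name__ == "__main__":
    for std_stream in (sys.stdout, sys.stderr):
        force_utf8(std_stream)
```

In a PyInstaller spec file, a script like this is normally listed through the `runtime_hooks` argument of `Analysis`, so that it runs before the application's own code.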