Unable to decode file on Windows 10

fragtastic / cis-benchmark-converter

Converts text dumps from CIS Benchmark PDFs to CSV & Excel formats.


Unable to decode file on Windows 10 #1

Closed tomrwaller closed 3 years ago

tomrwaller commented 4 years ago

When I try to run this in Windows 10 against a PDF benchmark saved to .txt I get the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 3785: character maps to <undefined>

I've tried saving the text file with UTF-8 encoding, but the error still occurs.

fragtastic commented 4 years ago

@tomrwaller Give it a try with the latest commit. I've been using this on macOS and Linux without issue so far. I'd forgotten about this project since last year. There are also additional scripts now that output directly to Excel XLSX.

If there's still an issue, I'd appreciate it if you could provide some more details: OS version, Python version, which CIS benchmark PDF you're using, etc.

tomrwaller commented 4 years ago

Thanks for getting back to me on this. I have tried with the latest commit but I am still getting the error below.

Parsing .\CIS_Microsoft_Windows_10_Enterprise_Release_1909_Benchmark_v1.8.1.txt
Writing to .\CIS_Microsoft_Windows_10_Enterprise_Release_1909_Benchmark_v1.8.1.txt.csv
Traceback (most recent call last):
  File ".\cisConv.py", line 134, in <module>
    parseText(args.inputFile)
  File ".\cisConv.py", line 64, in parseText
    for line in inFile:
  File "C:\Python38\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1689: character maps to <undefined>
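
For context, the traceback shows the file being decoded with cp1252 (the usual default text encoding for Python on Windows) rather than UTF-8. Byte 0x9d has no mapping in Python's cp1252 codec, but it does occur in UTF-8 text, for example as the last byte of a right double quotation mark (E2 80 9D). Below is a minimal sketch of reading the dump with an explicit encoding, assuming the .txt file really is UTF-8. `read_lines` is a hypothetical helper for illustration, not a function from cisConv.py.

```python
import os
import tempfile


def read_lines(path, encoding="utf-8"):
    """Read a text dump with an explicit encoding instead of the locale default."""
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
    with open(path, encoding=encoding, errors="replace") as in_file:
        return [line.rstrip("\n") for line in in_file]


# Demonstration: a temporary file containing UTF-8 curly quotes.
sample = "Ensure \u201cAudit\u201d is set".encode("utf-8")
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

try:
    lines = read_lines(tmp.name)
finally:
    os.remove(tmp.name)
```

Passing `encoding` explicitly makes the result independent of the platform default, which would explain why the same file could parse on macOS or Linux and fail on Windows.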

benderase commented 4 years ago

I'm having the same issue. I'm using macOS Catalina, version 10.15.5.

/usr/bin/env python3 -V
Python 3.6.1

CIS_Red_Hat_Enterprise_Linux_7_Benchmark_v3.0.0.pdf

Error I'm getting:

Parsing CIS_Red_Hat_Enterprise_Linux_7_Benchmark_v3.0.0.pdf
Writing to CIS_Red_Hat_Enterprise_Linux_7_Benchmark_v3.0.0.pdf.csv
Traceback (most recent call last):
  File "./cisConv.py", line 134, in <module>
    parseText(args.inputFile)
  File "./cisConv.py", line 64, in parseText
    for line in inFile:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte

fragtastic commented 4 years ago

@benderase This doesn't take a PDF input directly. You need to dump the text from it first.
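
One way to catch this case early is to check the first bytes of the input, since PDF files begin with the marker `%PDF-`. A minimal sketch, assuming a hypothetical helper `looks_like_pdf` that is not part of cisConv.py:

```python
import os
import tempfile


def looks_like_pdf(path):
    """Return True if the file starts with the PDF header marker."""
    with open(path, "rb") as in_file:
        return in_file.read(5) == b"%PDF-"


def check_sample(content):
    """Write bytes to a temporary file and report whether it looks like a PDF."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(content)
    try:
        return looks_like_pdf(tmp.name)
    finally:
        os.remove(tmp.name)


pdf_result = check_sample(b"%PDF-1.7\n%\xb5\xb5\xb5\xb5\n")
text_result = check_sample(b"1.1.1 Ensure something is configured\n")
```

If the check returns True, the file is still a raw PDF and its text needs to be dumped first.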

fragtastic commented 3 years ago

Closing stale issue. If the problem persists with the latest version, please reopen.