Closed ares0027 closed 2 months ago
Can I see your config and (if you can share it) the input file?
One solution might be to track down that character and delete it.
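For anyone who wants to try tracking the character down, here is a minimal sketch. It assumes the offset in the error message is a byte offset into the file, which should be the case when the whole file is read in one call. The helper names `undecodable_offsets` and `bytes_around` are made up for this example and are not part of augmentoolkit.

```python
def undecodable_offsets(data: bytes, encoding: str = "cp1252") -> list[int]:
    """Return the byte offsets that `encoding` cannot decode.

    Hypothetical helper: it retries the decode after each failure,
    which is simple but slow on files with many bad bytes.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode(encoding)
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as exc:
            offsets.append(pos + exc.start)
            pos += exc.start + 1  # skip past the bad byte and keep scanning
    return offsets


def bytes_around(data: bytes, offset: int, context: int = 20) -> bytes:
    """Return the raw bytes surrounding `offset` for manual inspection."""
    start = max(0, offset - context)
    end = min(len(data), offset + context + 1)
    return data[start:end]
```

To use it, read the input in binary mode (for example `data = open(file_path, "rb").read()`), then inspect `bytes_around(data, 102038)` using the position reported in the error.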
Hi,
I found a solution for the UnicodeDecodeError problem. The fix is to change line 1430 in the file augmentoolkit/control_flow_functions/control_flow_functions.py to use UTF-8 encoding when opening the file.
The original line is: with open(file_path, "r") as file: file_contents = file.read()
It should be changed to: with open(file_path, "r", encoding="utf-8") as file: file_contents = file.read()
I made this change and it fixed the problem for me. I hope this helps others who are having the same issue.
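A small self-contained sketch of why the change helps, assuming the input file is actually UTF-8. The function name `read_text_utf8` and the sample text are made up for illustration. The character "Á" is encoded in UTF-8 as the bytes 0xC3 0x81, and 0x81 has no mapping in cp1252, which matches the byte named in the error.

```python
import tempfile
from pathlib import Path


def read_text_utf8(file_path):
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(file_path, "r", encoding="utf-8") as file:
        return file.read()


sample = "Á"  # U+00C1, encoded in UTF-8 as b"\xc3\x81"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical file, for this demo only
    path.write_bytes(sample.encode("utf-8"))

    # Explicit UTF-8 works regardless of the system locale.
    decoded = read_text_utf8(path)

    # Decoding the same bytes as cp1252 reproduces the failure.
    try:
        path.read_bytes().decode("cp1252")
        cp1252_fails = False
    except UnicodeDecodeError:
        cp1252_fails = True
```

If some input files are not valid UTF-8, `open()` also accepts `errors="replace"`, which substitutes a placeholder character instead of raising. Whether that trade-off is acceptable depends on the data.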
@juanjopc Awesome that you were able to fix the issue! Would you mind making a PR to add this? If not, I can go ahead and do that, it seems simple enough. Thank you!
Merged! Thanks for the contribution!
I tried creating new text files, and I tried converting them to UTF, ANSI, etc. with Notepad++, including Windows-1252 and 1254...
Traceback (most recent call last):
  File "C:\llm-train\augmentoolkit\processing.py", line 417, in <module>
    asyncio.run(main())
  File "C:\Users\baran\AppData\Local\Programs\Python\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Users\baran\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\llm-train\augmentoolkit\processing.py", line 73, in main
    control_flow_functions.create_pretraining_set(
  File "C:\llm-train\augmentoolkit\augmentoolkit\control_flow_functions\control_flow_functions.py", line 1430, in create_pretraining_set
    file_contents = file.read()
  File "C:\Users\baran\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 102038: character maps to <undefined>
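The last frame in the traceback is cp1252.py, which suggests Python fell back to the Windows locale encoding because `open()` was called without an `encoding` argument. If so, re-saving the file as UTF-8 would not change the outcome, since the reader would still decode it as cp1252. A quick way to check which encoding a given machine would pick, as a rough sketch (the variable name is illustrative):

```python
import codecs
import locale

# The encoding that open() uses when no `encoding` argument is given
# (outside of UTF-8 mode). On many Windows setups this is "cp1252".
default_encoding = locale.getpreferredencoding(False)

# Normalize the name through the codec registry, e.g. "UTF-8" -> "utf-8".
codec_name = codecs.lookup(default_encoding).name

print(f"open() without an encoding argument would use: {codec_name}")
```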