bndr / pipreqs

pipreqs - Generate pip requirements.txt file based on imports of any project. Looking for maintainers to move this project forward.
Apache License 2.0

UnicodeDecodeError: 'utf-8' codec can't decode byte #214

Closed · tbrodbeck closed this issue 3 years ago

tbrodbeck commented 4 years ago

I just installed pipreqs for the first time. This is what it does:

$ pipreqs $(pwd)
Traceback (most recent call last):
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/bin/pipreqs", line 10, in <module>
    sys.exit(main())
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 470, in main
    init(args)
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 406, in init
    candidates = get_all_imports(input_path,
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 122, in get_all_imports
    contents = f.read()
  File "/usr/local/bin/../Cellar/python@3.8/3.8.3_2/bin/../Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 81: invalid start byte

Am I doing something wrong?

$ neofetch
tillmann@MacBook-Pro
OS: macOS Catalina 10.15.5 19F101 x86_64
Host: MacBookPro15,1
Kernel: 19.5.0
Uptime: 17 days, 14 hours, 5 mins
Packages: 115 (brew)
Shell: zsh 5.7.1
Resolution: 1680x1050@2x, 3840x1600@2x, 1050x1680@2x
DE: Aqua
WM: Quartz Compositor
WM Theme: Blue (Light)
Terminal: Apple_Terminal
Terminal Font: HackNerdFontComplete-Regular
CPU: Intel i7-8850H (12) @ 2.60GHz
GPU: Intel UHD Graphics 630, Radeon Pro 560X
Memory: 10080MiB / 16384MiB
$ python -V
Python 3.8.3
Teraskull commented 4 years ago
$ pipreqs --encoding utf-8
tbrodbeck commented 4 years ago

It is the same:

$ pipreqs --encoding utf-8
Traceback (most recent call last):
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/bin/pipreqs", line 10, in <module>
    sys.exit(main())
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 470, in main
    init(args)
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 406, in init
    candidates = get_all_imports(input_path,
  File "/Users/tillmann/dev/textSum/.direnv/python-3.8.3/lib/python3.8/site-packages/pipreqs/pipreqs.py", line 122, in get_all_imports
    contents = f.read()
  File "/usr/local/bin/../Cellar/python@3.8/3.8.3_2/bin/../Frameworks/Python.framework/Versions/3.8/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 81: invalid start byte
kamalmeh commented 4 years ago

Same error here:

$ pipreqs --encoding utf-8 --print /home/kamalmehta/developments/alphaslate
Traceback (most recent call last):
....
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 81: invalid start byte

Teraskull commented 4 years ago

@kamalmeh, try this:

$ pipreqs --encoding=iso-8859-1
dougsweetser commented 3 years ago

I had a .venv directory with the usual huge collection of Python files. Something in the package black.py messed it up. Running rm -rf .venv and then "pipreqs ." worked.

phHartl commented 3 years ago

This happens when certain modules/packages have files that are not encoded with the provided encoding (e.g. UTF-8 vs. UTF-8 with BOM). In my case, IPython has a file called nonascii.py that is encoded in ISO-8859-5 (and contains Cyrillic characters), joblib has a file (test_func_inspect_special_encoding.py) that is encoded in Big5 (a Traditional Chinese encoding) and contains Chinese characters, and srsly has a file called test_ujson.py that is encoded as UTF-8 with BOM despite specifying UTF-8 in its header.

I solved my issue by commenting out the first two files and saving the last one in the correct encoding. To avoid such errors, pipreqs should perhaps determine the encoding of each file dynamically, either by reading the encoding declared in the file's header or by guessing the most likely encoding.
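The header-based detection suggested above can be sketched with the Python standard library: `tokenize.detect_encoding` looks for a UTF-8 BOM and a PEP 263 coding cookie (e.g. `# -*- coding: iso-8859-5 -*-`) in the first two lines of a file and falls back to UTF-8 otherwise, and `tokenize.open` opens a file with whatever encoding it detects. This is a minimal sketch, not pipreqs code; `detect_source_encoding` and `read_source` are hypothetical helper names. It only helps for files that actually declare their encoding (as the IPython and joblib test files presumably do); a file with no cookie and non-UTF-8 bytes will still fail to decode.

```python
import io
import tokenize


def detect_source_encoding(data: bytes) -> str:
    """Hypothetical helper: return the encoding a Python source file declares.

    tokenize.detect_encoding checks for a UTF-8 BOM and a PEP 263 coding
    cookie in the first two lines; without either it returns 'utf-8'.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


def read_source(path: str) -> str:
    """Hypothetical helper: read a .py file as text using its declared encoding."""
    # tokenize.open() runs detect_encoding() and opens the file in text mode
    # with the detected encoding.
    with tokenize.open(path) as f:
        return f.read()


if __name__ == "__main__":
    cyrillic = "# -*- coding: iso-8859-5 -*-\ns = 'Привет'\n".encode("iso-8859-5")
    print(detect_source_encoding(cyrillic))                  # iso-8859-5
    print(detect_source_encoding(b"\xef\xbb\xbfx = 1\n"))    # utf-8-sig
    print(detect_source_encoding(b"x = 1\n"))                # utf-8
```

Note that a BOM-prefixed file comes back as `utf-8-sig`, which strips the BOM on decode, so the srsly case above would also be handled.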

magictomagic commented 3 years ago
$ pipreqs --encoding utf-8

It works, but why not set utf-8 as the default in the source code, so we do not have to add that parameter?

alan-barzilay commented 3 years ago

It seems to me that the issue originally reported here has been solved: the user had a file in a different encoding that they didn't know about, so I will be closing this issue. I will accept a PR if anyone has a good idea on how to reliably determine the file encoding, but until then I will just make utf-8 the default encoding used by pipreqs.

jeremydiba commented 2 years ago

> I had a .venv directory with the usual huge collection of Python files. Something in the package black.py messed it up. A rm -rf .venv, and "pipreqs ." worked.

Thank you sir - this finally fixed my issue

AgaMiko commented 2 years ago

> Thank you sir - this finally fixed my issue

Adding the --ignore parameter also works fine, and you don't have to remove the virtual env!

pipreqs . --ignore ".env"

scotgopal commented 2 years ago

> Adding the --ignore parameter also works fine, and you don't have to remove the virtual env!
>
> pipreqs . --ignore ".env"

Thank you. This worked for me. There was no need to specify the encoding using the --encoding flag.

❯ python3 --version
Python 3.8.10
❯ pipreqs --version
0.4.11
efrenmo commented 1 year ago

Hi,

I'm having a similar problem. How can I fix it? Thank you

(VSCodeENV2) C:\Users\emora\Desktop\VSCodeENV2>pipreqs --encoding utf-8
Traceback (most recent call last):
  File "C:\Users\emora\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\emora\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\emora\Desktop\VSCodeENV2\VSCodeENV2\Scripts\pipreqs.exe\__main__.py", line 7, in <module>
  File "C:\Users\emora\Desktop\VSCodeENV2\VSCodeENV2\lib\site-packages\pipreqs\pipreqs.py", line 488, in main
    init(args)
  File "C:\Users\emora\Desktop\VSCodeENV2\VSCodeENV2\lib\site-packages\pipreqs\pipreqs.py", line 415, in init
    candidates = get_all_imports(input_path,
  File "C:\Users\emora\Desktop\VSCodeENV2\VSCodeENV2\lib\site-packages\pipreqs\pipreqs.py", line 115, in get_all_imports
    contents = f.read()
  File "C:\Users\emora\AppData\Local\Programs\Python\Python310\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 81: invalid start byte

ankurk017 commented 1 year ago

I had a similar problem. I fixed it by deleting all the temporary Python files in the folder:

rm -f .*py *.swp

This deletes all the temporary Python scripts. Then run:

pipreqs --encoding utf-8 path_to_folder/

beybars1 commented 1 year ago

Hello everyone!

It looks like the issue is still present for some of you. Here is my "solution":

As @phHartl mentioned, you can skip conflicting files (nonascii.py and test_func_inspect_special_encoding.py in my case) by doing this:

In the pipreqs.py file inside your venv (lib64/python3.9/site-packages/pipreqs/pipreqs.py), change the code to something like this snippet (at the time of writing, this is around line 114):

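# `os` is already imported at the top of pipreqs.py; match on the file name
# so the (elided) directory prefix doesn't matter
if os.path.basename(file_name) in ('nonascii.py', 'test_func_inspect_special_encoding.py'):
    continue  # skip the known non-UTF-8 files before opening them
with open(file_name, "r", encoding=encoding) as f:
    print(file_name)  # show which file is being read
    contents = f.read()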

I agree it is not an elegant solution, but it works!

Hope you find it useful.

mheyman commented 1 year ago

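Along the lines of what beybars1 did, but with less round-tripping:

Replace contents = f.read() around line 115 in pipreqs.py with:

try:
    contents = f.read()
except UnicodeDecodeError:
    # Report and skip files that can't be decoded with the chosen encoding
    print(f"UnicodeDecodeError reading {file_name} (couldn't read contents). Skipping.")
    continue  # otherwise `contents` is undefined (or left over from the previous file)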
mbkimani commented 1 year ago

Hey.

> Adding the --ignore parameter also works fine, and you don't have to remove the virtual env!
>
> pipreqs . --ignore ".env"

Thanks, this worked for me too. Just to add some context: this applies if you've named your virtual env "env". If it's named "venv", replace ".env" with ".venv".
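Rather than guessing whether the environment folder is called env, .env, venv or .venv, one option is to look for it. The sketch below relies on the fact that environments created by `python -m venv` (and by recent versions of virtualenv) contain a `pyvenv.cfg` file at their top level; conda environments do not, so they would be missed. `find_venvs` is a hypothetical helper, not part of pipreqs. It prints a pipreqs command that passes the directories it found to `--ignore`, which pipreqs documents as taking several directories separated by commas.

```python
import os
import sys
from typing import List


def find_venvs(root: str = ".") -> List[str]:
    """Hypothetical helper: paths (relative to root) of dirs containing pyvenv.cfg."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if "pyvenv.cfg" in filenames:
            found.append(os.path.relpath(dirpath, root))
            dirnames[:] = []  # no need to walk inside the environment itself
    return sorted(found)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    venvs = find_venvs(root)
    if venvs:
        # --ignore accepts several directories, separated by commas
        print(f"pipreqs {root} --ignore {','.join(venvs)}")
    else:
        print("No pyvenv.cfg found; nothing to ignore.")
```

Pruning `dirnames` stops the walk from descending into each environment, which keeps the scan fast even when site-packages is large.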

conkeur commented 1 year ago

> @kamalmeh, try this:
>
> $ pipreqs --encoding=iso-8859-1

That saved the day.

csakaszamok commented 1 year ago

Unfortunately, this error still exists:

$ pipreqs
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 301: character maps to <undefined>

$ pipreqs --encoding=utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 224: invalid start byte

$ pipreqs --encoding=iso-8859-1
SyntaxError: invalid character '¼' (U+00BC)

$ python -V
Python 3.10.13

$ pip -V
pip 23.2.1

pip packages: only fastai+uvicorn
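A plausible reading of the last result (an assumption, since the files involved are not shown): ISO-8859-1 maps every possible byte to some character, so `--encoding=iso-8859-1` can never raise `UnicodeDecodeError`. It only moves the failure to the point where the decoded text is parsed as Python. If a file that is really UTF-8 has a non-ASCII character in an identifier, decoding it as ISO-8859-1 produces mojibake that is no longer valid syntax. The sketch reproduces this with a made-up one-line file; `parses_after_decoding` is a hypothetical helper, not pipreqs code.

```python
import ast


def parses_after_decoding(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: decode source bytes, then check the text is valid Python."""
    text = raw.decode(encoding)  # for iso-8859-1 this never raises: every byte maps to a char
    try:
        ast.parse(text)
    except SyntaxError as exc:
        print(f"{encoding}: {exc.msg}")
        return False
    return True


# A UTF-8 file with a non-ASCII identifier: "ü" is stored as the bytes 0xC3 0xBC.
raw = "grün = 1\n".encode("utf-8")

print(parses_after_decoding(raw, "utf-8"))       # True
print(parses_after_decoding(raw, "iso-8859-1"))  # False: 0xC3 0xBC decodes to "Ã¼"
```

In other words, a project whose files use different encodings cannot be fixed by picking one global `--encoding`; the files that don't match have to be re-encoded or excluded.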

hemjeet commented 11 months ago

This issue still exists

FlorianWilhelm commented 10 months ago

For me too.

mervess commented 9 months ago

As it was the whole environment folder that was causing the problem, what worked for me is:

pipreqs . --ignore <path_to_env_folder>
ymyke commented 9 months ago

I suggest pinning this issue. I think many new users stumble upon something like this.

mat926 commented 9 months ago

Still having the same issue

VaishnaviBadade commented 9 months ago

pipreqs . --ignore --force
it's working @mervess

harshjadhav890 commented 8 months ago

No matter what I tried, I couldn't get this issue resolved. I have multiple files that give me decoding errors with pipreqs, so using --ignore for each of them is not a great solution.

So I switched back to using pip freeze > requirements.txt (at least for projects containing a lot of special characters in the code).

For normal projects, this seems to work fine:

pipreqs --encoding=utf8 --debug "<>" --force --ignore <>

e.g.

pipreqs --encoding=utf8 --debug "E:\vscode\TMLC\Higgs_Boson" --force --ignore higgsenv/

Run the above command from your base environment.

chuyaguo2014 commented 7 months ago

For me, the specific file that this failed on is ./lib/python3.9/site-packages/IPython/core/tests/nonascii.py so I ended up just ignoring the whole test directory:

pipreqs . --ignore ./lib/python3.9/site-packages/IPython/core/tests/ --force
tobszarny commented 6 months ago

> As it was the whole environment folder that was causing the problem, what worked for me is:
>
> pipreqs . --ignore <path_to_env_folder>

Thanks, you're a lifesaver. This helps.

It doesn't feel right to have to tell pipreqs explicitly to ignore the env folder.

angpetrov commented 5 months ago

> For me, the specific file that this failed on is ./lib/python3.9/site-packages/IPython/core/tests/nonascii.py so I ended up just ignoring the whole test directory:
>
> pipreqs . --ignore ./lib/python3.9/site-packages/IPython/core/tests/ --force

Likewise, I found that file by adding one line to read_file_content in pipreqs.py:

def read_file_content(file_name: str, encoding="utf-8"):
    print(f"Processing file: {file_name}")  # Add this line to show the file being processed
    if file_ext_is_allowed(file_name, DEFAULT_EXTENSIONS):
        with open(file_name, "r", encoding=encoding) as f:
            contents = f.read()
    elif file_ext_is_allowed(file_name, [".ipynb"]) and scan_noteboooks:
        contents = ipynb_2_py(file_name, encoding=encoding)
    return contents

I imagine this problem happens when pipreqs is scanning the virtual environment folder too.

bibaodi commented 5 months ago

I found a solution:

  1. Error message:

    File "/mnt/d/.venv/lib/python3.10/site-packages/pipreqs/pipreqs.py", line 136, in get_all_imports
    contents = read_file_content(file_name, encoding)
    File "/mnt/d/.venv/lib/python3.10/site-packages/pipreqs/pipreqs.py", line 181, in read_file_content
    contents = f.read()
    File "/usr/lib/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 81: invalid start byte
  2. Add a debug print to pipreqs.py at line 136: print(f"debug: file name={file_name}")

  3. Find the file that triggers the error in the output.

  4. Ignore its directory: pipreqs ./ --ignore ./.venv/lib/python3.10/site-packages/IPython/ --print

  5. Done.
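Instead of editing pipreqs.py to print every file name, a standalone script can find the offending files up front. This is a hypothetical helper, not part of pipreqs: it walks a directory with `os.walk`, tries to decode each `.py` file with the given encoding, and reports the ones that fail. It only approximates what pipreqs scans (for example, it skips notebooks and does not apply whatever directories pipreqs excludes by default).

```python
import os
import sys
from typing import Iterable, Iterator, Tuple


def find_undecodable(
    root: str, encoding: str = "utf-8", skip_dirs: Iterable[str] = ()
) -> Iterator[Tuple[str, UnicodeDecodeError]]:
    """Hypothetical helper: yield (path, error) for .py files that fail to decode."""
    skip = set(skip_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories (matched by name) so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode(encoding)
            except UnicodeDecodeError as exc:
                yield path, exc


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, exc in find_undecodable(root):
        print(f"{path}: {exc}")
```

Saved under any name (say, find_bad_files.py) and run as `python find_bad_files.py .`, it lists each failing file with the same `UnicodeDecodeError` text pipreqs would show; the directories containing those files are the candidates for `--ignore`.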

rafey-husain-sw commented 5 months ago

pipreqs . --encoding=iso-8859-1 --ignore ".venv"

RCarteri commented 3 months ago

I used pipreqs . --ignore ".venv" that worked

sergioguijarroc commented 3 months ago

This command works for me pipreqs ./ --ignore ./.venv/lib/python3.10/site-packages/IPython/

JardelCunha commented 3 months ago

Hey.

> Thanks, this worked for me too. Just to add some context: this applies if you've named your virtual env "env". If it's named "venv", replace ".env" with ".venv".

Thank you. This worked for me.

ghost commented 2 months ago

Still having this problem with a very basic python install. If I hadn't found this thread I would have just immediately stopped using pipreqs because I assumed it was broken.

If this is an expected failure case (with a pretty reasonable work-around), at the very least catch the error and show an error message so it doesn't look like the script is just broken.
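As a rough sketch of that suggestion (not pipreqs' actual code; `read_python_file` is a hypothetical name), the decode failure can be caught where the file is read and turned into a message that names the file and points at the `--ignore` workaround discussed in this thread:

```python
def read_python_file(path: str, encoding: str = "utf-8") -> str:
    """Hypothetical reader: return the file's text, or exit with a readable message."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        # exc.object holds the bytes being decoded; exc.start is the offending offset.
        bad_byte = exc.object[exc.start]
        raise SystemExit(
            f"Error: could not decode {path} as {encoding}: "
            f"{exc.reason} (byte {bad_byte:#04x} at position {exc.start}).\n"
            "If this file belongs to a virtual environment or another third-party "
            "directory, exclude that directory with --ignore."
        ) from exc
```

When a `SystemExit` carrying a string goes uncaught, Python prints just that string to stderr and exits with status 1, so the user sees the offending path and a hint instead of a traceback ending in codecs.py.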

aidez06 commented 1 month ago

I rechecked my working directory and noticed that I do have some images in my folder. One reason for the issue is that I have suno\toggle_button\instrumental_enable.png. When I used pipreqs, it changed the filename to instrumental_enable.py, which caused the issue. To fix this, you need to ignore the folder containing the images by using the following command:

pipreqs . --force --ignore "suno/toggle_button"