SpikeInterface / spikeinterface

A Python-based module for creating flexible and robust spike sorting pipelines.
https://spikeinterface.readthedocs.io
MIT License
533 stars, 187 forks

The following coding errors will occur when the Chinese system is used #3429

Closed. 522848942 closed this issue 1 month ago.

522848942 commented 2 months ago

When I run the following code on a Chinese-language system, the following error is reported:

UnicodeDecodeError                        Traceback (most recent call last)
Cell In[3], line 2
      1 print("Available sorters", ss.available_sorters())
----> 2 print("Installed sorters", ss.installed_sorters())

File e:\anaconda\envs\spike\lib\site-packages\spikeinterface\sorters\sorterlist.py:65, in installed_sorters()
     62 def installed_sorters():
     63     """Lists installed sorters."""
---> 65     return sorted([s.sorter_name for s in sorter_full_list if s.is_installed()])

File e:\anaconda\envs\spike\lib\site-packages\spikeinterface\sorters\sorterlist.py:65, in <listcomp>(.0)
     62 def installed_sorters():
     63     """Lists installed sorters."""
---> 65     return sorted([s.sorter_name for s in sorter_full_list if s.is_installed()])

File e:\anaconda\envs\spike\lib\site-packages\spikeinterface\sorters\external\hdsort.py:92, in HDSortSorter.is_installed(cls)
     90 @classmethod
     91 def is_installed(cls):
---> 92     if cls.check_compiled():
     93         return True
     94     return check_if_installed(cls.hdsort_path)

File e:\anaconda\envs\spike\lib\site-packages\spikeinterface\sorters\basesorter.py:375, in BaseSorter.check_compiled(cls)
    367 shell_cmd = f"""
...
--> 322 (result, consumed) = self._buffer_decode(data, self.errors, final)
    323 # keep undecoded input until the next call
    324 self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 5: invalid start byte

I worked around this problem by adding encoding='gbk' to the subprocess.Popen call in shellscript.py. Is there a better solution?
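For readers following along, the workaround amounts to passing an explicit encoding to subprocess.Popen. The sketch below is not the code in shellscript.py: run_and_collect is a hypothetical helper, its encoding parameter is an assumption, and the child process is a stand-in that writes GBK bytes the way a simplified-Chinese console might. The other Popen arguments mirror the ones visible in the traceback.

```python
import subprocess
import sys


def run_and_collect(cmd, encoding="utf-8"):
    """Run cmd and return its output lines, decoded with the given encoding."""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        bufsize=1,
        universal_newlines=True,
        encoding=encoding,  # explicit codec; "utf-8" is the codec named in the reported error
    )
    lines = list(process.stdout)  # decoding happens here, line by line
    process.wait()
    return lines


# Stand-in child process (an assumption for this demo): it writes two
# GBK-encoded characters (U+4E0D U+662F) followed by a newline to stdout.
child = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write('\u4e0d\u662f\n'.encode('gbk'))",
]

print(run_and_collect(child, encoding="gbk"))
```

Reading the same child output with encoding="utf-8" raises UnicodeDecodeError, which is the failure reported here. Hard-coding 'gbk' only suits systems whose console actually emits GBK.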

zm711 commented 2 months ago

Can you tell us a bit more about this? What is the specific Chinese system you're using (simplified, traditional)? I don't really know encodings other than 'ASCII' or 'utf-8'. Is this a specific way of encoding Chinese? We definitely have users that have Chinese in their code and it seems to work for them, so it would be helpful if we could debug this a bit more.

h-mayorquin commented 2 months ago

I think this is an https://git.bsse.ethz.ch/hima_public/HDsort problem, isn't it?

Are you using the latest version of spikeinterface? I could not find the lines where your error came from.

522848942 commented 2 months ago

I think I'm using the latest version of spikeinterface. I followed the tutorial on this page to install spikeinterface: https://spikeinterface.readthedocs.io/en/latest/get_started/installation.html (pip install spikeinterface[full,widgets]). The problem appears when I run print("Installed sorters", ss.installed_sorters()). [image]

And this is my system: [image]

Here is the file where adding encoding='gbk' solves the problem: E:\anaconda\envs\spike\Lib\site-packages\spikeinterface\sorters\utils\shellscript.py

I copied all the code in shellscript.py into the following text file: shellscript.txt

And here is the place where I added encoding='gbk': [image]

522848942 commented 2 months ago

I think this problem arises because _process.stdout here is sometimes in Chinese. For example, on a Chinese system the output is:

'#!' 不是内部或外部命令,也不是可运行的程序或批处理文件。此时不应有 [。

On an English system the same output would read:

'#!' is not recognized as an internal or external command, operable program or batch file. [ was unexpected at this time.

As a result, the line cannot be decoded (line 93 of shellscript.py). [image]
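The mismatch can be reproduced without spikeinterface. The sketch below takes the start of the message quoted above and assumes the console emitted it as GBK bytes, then tries to read those bytes as UTF-8.

```python
# Start of the message quoted above, assumed to be emitted by the console as GBK.
message = "'#!' 不是内部或外部命令"
raw = message.encode("gbk")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # err.start is the offset of the first byte that UTF-8 cannot decode.
    print(err.start, hex(raw[err.start]))

# Decoding with the encoding that produced the bytes recovers the text.
print(raw.decode("gbk"))
```

The first five bytes ('#!' plus a space) are plain ASCII, so the failure lands on the first Chinese character. That is consistent with the "position 5" in the reported error.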

zm711 commented 2 months ago

So based on my reading it seems like Windows uses gbk encoding for simplified Chinese which is causing this problem. I guess the easiest solutions are either you use a private fork with gbk encoding or you use English for the script. Why are you using the shebang in general? Or is that from our code that is trying to use the shebang? Since utf-8 seems to work for most systems I'm not sure what the best solution is here. @alejoe91 do you have any ideas for checking necessary encoding for shell scripting with different encoding systems? Or do we want to enforce utf-8 and have people make their own forks with gbk?

alejoe91 commented 2 months ago

Could we switch automatically in case a Chinese system is detected?

zm711 commented 2 months ago

That's what I was wondering. But I don't know how to do that? I guess we could do a try-except before the shell script and try to decode something and if it fails switch to gbk. But I don't know what we could check? It isn't the OS itself, it's just the typing in terminal, so without forcing a config file I'm not sure.
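One way to read the try-except idea is to capture the output as bytes and try a short list of encodings in order. This is a rough sketch under assumptions, not spikeinterface code: decode_with_fallback is a hypothetical helper, and using locale.getpreferredencoding(False) as the default fallback is only a guess at what the console emits.

```python
import locale


def decode_with_fallback(raw, fallbacks=None):
    """Decode bytes as UTF-8 first, then try each fallback encoding in order."""
    if fallbacks is None:
        # Assumption: the system's preferred encoding is a sensible second guess.
        fallbacks = [locale.getpreferredencoding(False)]
    for encoding in ["utf-8", *fallbacks]:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD instead.
    return raw.decode("utf-8", errors="replace")


print(decode_with_fallback(b"plain ascii output"))
print(decode_with_fallback("不是".encode("gbk"), fallbacks=["gbk"]))
```

UTF-8 goes first because it is strict and rejects most non-UTF-8 byte sequences. Permissive single-byte codecs such as latin-1 accept any input, so a wrong fallback can yield garbled text without raising.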

jonpedros commented 2 months ago

I get a similar error in a system with a Japanese Windows installation. Running:

ss.installed_sorters()

Returns:

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[5], line 7
      5     analyzer_obj = si.load_sorting_analyzer(analyzer_fp, format='binary_folder')
      6 else:
----> 7     use_docker = False if sorter in ss.installed_sorters() else True
      8     sorter_fp = os.path.join(output_path, f'{sorter}_output')
      9     if grouped_sorting:

File c:\Users\system-ses\anaconda3\envs\sci_env\Lib\site-packages\spikeinterface\sorters\sorterlist.py:65, in installed_sorters()
     62 def installed_sorters():
     63     """Lists installed sorters."""
---> 65     return sorted([s.sorter_name for s in sorter_full_list if s.is_installed()])

File c:\Users\system-ses\anaconda3\envs\sci_env\Lib\site-packages\spikeinterface\sorters\sorterlist.py:65, in <listcomp>(.0)
     62 def installed_sorters():
     63     """Lists installed sorters."""
---> 65     return sorted([s.sorter_name for s in sorter_full_list if s.is_installed()])

File c:\Users\system-ses\anaconda3\envs\sci_env\Lib\site-packages\spikeinterface\sorters\external\hdsort.py:92, in HDSortSorter.is_installed(cls)
     90 @classmethod
     91 def is_installed(cls):
---> 92     if cls.check_compiled():
     93         return True
     94     return check_if_installed(cls.hdsort_path)

File c:\Users\system-ses\anaconda3\envs\sci_env\Lib\site-packages\spikeinterface\sorters\basesorter.py:375, in BaseSorter.check_compiled(cls)
    367 shell_cmd = f"""
    368 #!/bin/bash
    369 if ! [ -x "$(command -v {cls.compiled_name})" ]; then
   (...)
    372 fi
    373 """
    374 shell_script = ShellScript(shell_cmd)
--> 375 shell_script.start()
    376 retcode = shell_script.wait()
    377 if retcode != 0:

File c:\Users\system-ses\anaconda3\envs\sci_env\Lib\site-packages\spikeinterface\sorters\utils\shellscript.py:93, in ShellScript.start(self)
     89 self._process = subprocess.Popen(
     90     cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, bufsize=1, universal_newlines=True
     91 )
     92 with open(script_log_path, "w+") as script_log_file:
---> 93     for line in self._process.stdout:
     94         script_log_file.write(line)
     95         if (
     96             self._verbose
     97         ):  # Print onto console depending on the verbose property passed on from the sorter class

File <frozen codecs>:322, in decode(self, input, final)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x82 in position 5: invalid start byte

zm711 commented 2 months ago

This was another concern I had. I bet a lot of character based systems will have their own encoding. I mean at this point we could expose this as a run_sorter kwarg with a default to utf-8 and then users can switch to their own encoding if needed? What do you think @alejoe91 ?

Although that doesn't help with this is_installed... maybe even a global kwarg then? not sure where it would best go.

alejoe91 commented 2 months ago

@zm711 we could use chardet to automatically detect the encoding
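A rough sketch of what detection with chardet could look like, assuming the process output is captured as raw bytes rather than text. decode_detected and its parameters are hypothetical names for illustration; chardet.detect is the library's real entry point and returns a dict whose "encoding" value may be None. This is not necessarily the approach taken in the linked pull request.

```python
def decode_detected(raw, detect=None, default="utf-8"):
    """Decode bytes using a detected encoding, falling back to a default."""
    if detect is None:
        # Third-party dependency, imported lazily: pip install chardet
        import chardet

        detect = chardet.detect
    guess = detect(raw)  # e.g. {"encoding": "GB2312", "confidence": 0.99, ...}
    encoding = guess.get("encoding") or default
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # The guess was wrong or unknown to Python: degrade instead of raising.
        return raw.decode(default, errors="replace")
```

Calling decode_detected(raw) with no detect argument uses chardet.detect. The detect parameter exists so that another detector can be swapped in. Detection is statistical, so guesses on short outputs, such as a one-line shell error, may be unreliable.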

alejoe91 commented 2 months ago

actually, this might just work! #3439