WorksApplications / SudachiPy

Python version of Sudachi, a Japanese tokenizer.
Apache License 2.0

SudachiPy doesn't work on Windows: "OSError: symbolic link privilege not held" #107

Closed chezou closed 3 years ago

chezou commented 4 years ago

SudachiPy doesn't work on Windows because Windows requires administrator privileges to create a symlink. It would be nice if we could avoid using a symlink for the dictionary setting.

λ  python
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from sudachipy import tokenizer
>>> from sudachipy import dictionary
>>> tokenizer_obj = dictionary.Dictionary().create()
Traceback (most recent call last):
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 55, in create_default_link_for_sudachidict_core
    dict_path = Path(import_module('sudachidict').__file__).parent
  File "C:\Python36\Lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'sudachidict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\dictionary.py", line 37, in __init__
    self._read_system_dictionary(config.settings.system_dict_path())
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 106, in system_dict_path
    dict_path = create_default_link_for_sudachidict_core(output=f)
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 71, in create_default_link_for_sudachidict_core
    dict_path = set_default_dict_package('sudachidict_core', output=output)
  File "C:\Users\chezo\source\sudachi-test\.venv\lib\site-packages\sudachipy\config.py", line 47, in set_default_dict_package
    dst_path.symlink_to(src_path)
  File "C:\Python36\Lib\pathlib.py", line 1325, in symlink_to
    self._accessor.symlink(target, self, target_is_directory)
  File "C:\Python36\Lib\pathlib.py", line 393, in wrapped
    return strfunc(str(pathobjA), str(pathobjB), *args)
OSError: symbolic link privilege not held

sorami commented 4 years ago

Thank you for the report!

We will think about how we should manage the dictionary; one idea we have is to use a config file instead of a symlink.
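
A minimal sketch of the config-file idea, not SudachiPy's actual implementation: the location of the installed dictionary is written to a small JSON file and read back later, so no symlink is needed. The key name `systemDict` and both helper functions are assumptions made for illustration.

```python
import json
from pathlib import Path


def save_dict_path(config_file: Path, dict_path: Path) -> None:
    """Record the dictionary location in a JSON config file (hypothetical helper)."""
    config_file.parent.mkdir(parents=True, exist_ok=True)
    settings = {}
    if config_file.exists():
        # Keep any other settings that are already stored in the file.
        settings = json.loads(config_file.read_text(encoding="utf-8"))
    settings["systemDict"] = str(dict_path)  # key name is an assumption
    config_file.write_text(json.dumps(settings, indent=2), encoding="utf-8")


def load_dict_path(config_file: Path) -> Path:
    """Read the dictionary location back from the config file (hypothetical helper)."""
    settings = json.loads(config_file.read_text(encoding="utf-8"))
    return Path(settings["systemDict"])
```

Writing a plain file needs no special privilege on Windows, which is the point of preferring this over `Path.symlink_to`.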

chezou commented 4 years ago

Good to know. If you need a test for Windows, I'd be happy to help you!

chezou commented 4 years ago

I eventually found a way to create a symlink without administrator privileges on Windows 10.

As of Python 3.8, os.symlink() can create a symlink from an unprivileged account if Developer Mode is enabled. See the note at: https://docs.python.org/3/library/os.html#os.symlink

Here is the result of running the SudachiPy usage example with Python 3.8 on Windows 10.

C:\Users\chezo\source\sudachi-test
λ  python
Python 3.8.0 (tags/v3.8.0:fa919fd, Oct 14 2019, 19:37:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from sudachipy import tokenizer
>>> from sudachipy import dictionary
>>>
>>> tokenizer_obj = dictionary.Dictionary().create()
>>>
>>> mode = tokenizer.Tokenizer.SplitMode.C
>>> [m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
['国家公務員']
>>>
>>> mode = tokenizer.Tokenizer.SplitMode.B
>>> [m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
['国家', '公務員']
>>>
>>> mode = tokenizer.Tokenizer.SplitMode.A
>>> [m.surface() for m in tokenizer_obj.tokenize("国家公務員", mode)]
['国家', '公務', '員']
>>>
>>> m = tokenizer_obj.tokenize("食べ", mode)[0]
>>>
>>> m.surface() # => '食べ'
'食べ'
>>> m.dictionary_form() # => '食べる'
'食べる'
>>> m.reading_form() # => 'タベ'
'タベ'
>>> m.part_of_speech() # => ['動詞', '一般', '*', '*', '下一段-バ行', '連用形-一般']
['動詞', '一般', '*', '*', '下一段-バ行', '連用形-一般']
>>>
>>>
>>> # Normalization
...
>>> tokenizer_obj.tokenize("附属", mode)[0].normalized_form()
'付属'
>>> # => '付属'
... tokenizer_obj.tokenize("SUMMER", mode)[0].normalized_form()
'サマー'
>>> # => 'サマー'
... tokenizer_obj.tokenize("シュミレーション", mode)[0].normalized_form()
'シミュレーション'
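
For environments where neither administrator rights nor Developer Mode is available, one possible approach is to attempt the symlink and fall back to copying the file when the OS refuses. This is only a sketch under that assumption; `link_or_copy` is a hypothetical helper, not part of SudachiPy.

```python
import shutil
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Point dst at src, preferring a symlink and falling back to a copy.

    Returns "symlink" or "copy" so the caller can tell which path was taken.
    Assumes src and dst are files, not directories.
    """
    if dst.is_symlink() or dst.exists():
        dst.unlink()  # replace a stale link or an earlier copy
    try:
        dst.symlink_to(src)
        return "symlink"
    except (OSError, NotImplementedError):
        # For example "symbolic link privilege not held" on Windows
        # without administrator rights or Developer Mode.
        shutil.copyfile(src, dst)
        return "copy"
```

The copy costs disk space and does not follow later updates to the source file, so it is a trade-off rather than a full replacement for a link.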

izziiyt commented 4 years ago

@chezou Good workaround! We'll resolve this problem for Windows and Python 3.5, but anyone who wants to use SudachiPy right now can follow this approach. Thanks, @chezou.

eiennohito commented 3 years ago

SudachiPy 0.6.0 no longer uses symlinks.