LuteOrg / lute-v3

LUTE = Learning Using Texts: learn languages through reading.
https://luteorg.github.io/lute-manual/
MIT License
486 stars · 46 forks

500 error when adding book in Japanese #125

Open SpaceCow64 opened 10 months ago

SpaceCow64 commented 10 months ago

500 internal server error whenever adding anything when Japanese set as language (mecab installed, tested through settings page).

Adding books works with English no problem, only encounter problems when set to Japanese.

Platform: Windows-10-10.0.19045-SP0

Version: 3.0.11

In docker?: False

Stack trace:

Traceback (most recent call last):
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\flask\app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\flask\app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\flask\app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\flask\app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\book\routes.py", line 106, in new
    book = repo.add(b)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\book\model.py", line 58, in add
    dbbook = self._build_db_book(book)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\book\model.py", line 84, in _build_db_book
    b = DBBook.create_book(book.title, lang, book.text)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\models\book.py", line 114, in create_book
    tokens = language.parser.get_parsed_tokens(fulltext, language)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\parse\space_delimited_parser.py", line 33, in get_parsed_tokens
    return self._parse_to_tokens(clean_text, language)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\parse\space_delimited_parser.py", line 66, in _parse_to_tokens
    self.parse_para(para, lang, tokens)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\parse\space_delimited_parser.py", line 87, in parse_para
    m = self.preg_match_capture(pattern, text)
  File "C:\Users\noahl\Desktop\lute3\envlute\Lib\site-packages\lute\parse\space_delimited_parser.py", line 42, in preg_match_capture
    matches = re.finditer(pattern, subject, flags=re.IGNORECASE)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\__init__.py", line 224, in finditer
    return _compile(pattern, flags).finditer(string)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\__init__.py", line 307, in _compile
    p = _compiler.compile(pattern, flags)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_compiler.py", line 745, in compile
    p = _parser.parse(p, flags)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 979, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 460, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 862, in _parse
    p = _parse_sub(source, state, sub_verbose, nested + 1)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 460, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 573, in _parse
    code1 = _class_escape(source, this)
  File "C:\Users\noahl\AppData\Local\Programs\Python\Python312\Lib\re\_parser.py", line 366, in _class_escape
    raise source.error('bad escape %s' % escape, len(escape))
re.error: bad escape \p at position 37
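For context on the final error: Python's built-in `re` module does not support Unicode property escapes such as `\p{Han}` (the third-party `regex` package does), so any pattern containing `\p` fails to compile. A minimal stdlib-only reproduction of the same error:

```python
import re

# The built-in `re` module rejects Unicode property escapes like \p{Han};
# compiling such a pattern raises re.error ("bad escape \p"), which is
# what surfaces here as the 500 when the space-delimited parser is handed
# a character-class pattern meant for a property-aware regex engine.
try:
    re.compile(r"[\p{Han}]")
except re.error as exc:
    print(f"re.error: {exc}")
```

This also explains why the same pattern works elsewhere: engines built on the `regex` module (or PHP's PCRE, as in Lute v2) accept `\p{...}` property classes.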

To Reproduce

  1. Go to Create new book
  2. Set the language to Japanese
  3. Enter anything into the fields
  4. The 500 error appears

Installed using Python (not Docker).

fanyingfx commented 10 months ago

I found that it calls get_parsed_tokens() in space_delimited_parser.py, so it seems it isn't using the proper Japanese parser. Can you send a screenshot of your Japanese language settings (Settings -> Languages -> your Japanese settings)? Here are my Japanese settings.

[screenshot: fanyingfx's Japanese language settings]

jzohrab commented 10 months ago

So puzzling: the Japanese language should have just been disabled if mecab wasn't set up correctly. There's already an issue for this … the code seems to be flaky even though it always passes all tests.

SpaceCow64 commented 10 months ago

> I found that it calls get_parsed_tokens() in space_delimited_parser.py, so it seems it isn't using the proper Japanese parser. Can you send a screenshot of your Japanese language settings?

Changed "Parse as" to Japanese and it's working now! Thanks! Weird that it wasn't set to Japanese by default for me.

jzohrab commented 10 months ago

> Weird that it wasn't set to Japanese by default for me.

Yes, that's what should have happened. That whole parser check needs to be revisited. It works on my system and in CI, but real people are sometimes getting errors. Thanks for sticking with it. I'm going to leave this issue open in case other people come along as well and run into it.
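A rough sketch of the kind of availability check being discussed, assuming the intent is to verify that the mecab binary is actually reachable before a language is allowed to use the Japanese parser. The names here (`japanese_parser_supported`, `parser_for`) are illustrative, not Lute's actual API:

```python
import shutil

def japanese_parser_supported() -> bool:
    """Return True only if the mecab binary is actually on PATH."""
    return shutil.which("mecab") is not None

def parser_for(language_parse_as: str) -> str:
    # Fail loudly (or disable the language) rather than silently falling
    # back to the space-delimited parser, whose patterns then blow up
    # with "bad escape \p" on Japanese text.
    if language_parse_as == "japanese" and not japanese_parser_supported():
        raise RuntimeError(
            "Japanese parser unavailable: mecab not found; "
            "the language should be disabled, not parsed as space-delimited."
        )
    return language_parse_as
```

The key design point either way: the "Parse as" setting and the installed backend have to agree, and a mismatch should be surfaced at configuration time, not as a 500 during book creation.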

The other issue is #65 .