HugoFara / lwt

Learn languages by reading! A language learning app stemmed from Learning with Texts (LWT).
https://hugofara.github.io/lwt/
The Unlicense
169 stars 19 forks

[not a bug?] MeCab can't be used for word splitting and pronunciation unless the language is named exactly "Japanese" #103

Closed alk0 closed 1 year ago

alk0 commented 1 year ago

Describe the bug

It is possible (and may be desirable) to create a duplicate copy of the same L2 (study) language, e.g. one Japanese-English pair split on spaces and one with MeCab parsing (my case), or Japanese-English and Japanese-(other language). The problem is that MeCab is available only when the language is named exactly "Japanese". For word splitting, the option to select a method disappears the moment the name is changed; for pronunciation, it simply stops working. Maybe I'm doing something wrong, but this is how it looks to me.

To Reproduce

  1. Go to Languages, create a new "Japanese" language (or use an existing one)
  2. Change the name in Study Language "L2": field to "Japanese1" or whatever
  3. The option to select a method for RegExp Word Characters: becomes unavailable immediately; MeCab-generated pronunciation (the one for the Romaniz.: field) stops working.

Expected behavior

MeCab is still available when the name of the language is changed.

Proposal for a fix

As a quick fix: maybe the easiest option would be to check whether the language name starts with "Japanese" (or contains it) instead of checking that it equals "Japanese"?
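The difference between the three checks can be sketched as follows. This is only an illustration in Python, not LWT's actual code; the function names are hypothetical, and the comparison is assumed to be case-sensitive.

```python
# Hypothetical helpers illustrating the proposal; none of these names
# come from LWT itself.

def name_equals(name: str) -> bool:
    """Behaviour described in this issue: only the exact name matches."""
    return name == "Japanese"


def name_starts_with(name: str) -> bool:
    """Proposed relaxation: any name beginning with "Japanese" matches."""
    return name.startswith("Japanese")


def name_contains(name: str) -> bool:
    """Looser variant: "Japanese" may appear anywhere in the name."""
    return "Japanese" in name


if __name__ == "__main__":
    for candidate in ("Japanese", "Japanese1", "My Japanese (MeCab)"):
        print(
            candidate,
            name_equals(candidate),
            name_starts_with(candidate),
            name_contains(candidate),
        )
```

With the prefix check, "Japanese1" would be accepted while "My Japanese (MeCab)" would still be rejected; the substring check accepts both.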

HugoFara commented 1 year ago

Hi!

Thanks for reaching out. Honestly, I knew this behavior would cause some issues, but I did not have a straightforward solution when I wrote it. Simply put, the question is "how should LWT know that the studied language is Japanese?". For the language selection field I'm using the language name, but some other parts of the code use the dictionary links, etc.

As of today, the only way to get everything working is to keep the language named "Japanese" and set "mecab" as the regex. You can set any name for Japanese with a generic regex parser.

I haven't decided on a long-term solution yet, but I hope my suggestions will be enough for now. Don't hesitate to contact me again if it's still a blocking issue!

alk0 commented 1 year ago

> You can set any name for Japanese with a generic regex parser.

Yes, it solves half of the problem, but MeCab-generated spelling is still unavailable in that case, which is rather inconvenient :(

OK, I'll try to come up with some temporary fix for myself maybe. Or maybe I'll just live with it.

HugoFara commented 1 year ago

Hi! I just pushed a new commit that, though not perfect, may help you with your issues. Let's sum it up.

Japanese pronunciation

It is now activated if your parser is "mecab" and either the language name is "Japanese" or the translator URL has "ja" as its source language. "ja" is the language code for Japanese; for instance, setting https://translate.google.com/?ie=UTF-8&sl=ja&tl=en&text=lwt_term as the translator URL will work.
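As a rough sketch of that condition (in Python, for illustration only; this is not the code from the commit), the check could look like the function below. The function name is hypothetical, and two things are assumed: that the source language is carried in an `sl` query parameter, as in the Google Translate URL above, and that the parser value is compared case-insensitively.

```python
from urllib.parse import parse_qs, urlsplit


def japanese_pronunciation_enabled(
    parser: str, language_name: str, translator_url: str
) -> bool:
    """Hypothetical helper mirroring the rule described above."""
    # Condition 1: the regex/parser field must be "mecab"
    # (case-insensitive comparison is an assumption).
    if parser.strip().lower() != "mecab":
        return False

    # Condition 2a: the language is named exactly "Japanese".
    if language_name == "Japanese":
        return True

    # Condition 2b: the translator URL declares "ja" as source language.
    # Assumption: the source language is the `sl` query parameter.
    query = parse_qs(urlsplit(translator_url).query)
    return "ja" in query.get("sl", [])


if __name__ == "__main__":
    url = "https://translate.google.com/?ie=UTF-8&sl=ja&tl=en&text=lwt_term"
    print(japanese_pronunciation_enabled("mecab", "Japanese1", url))
```

Under these assumptions, a language named "Japanese1" with the "mecab" parser and the translator URL above would get pronunciation, while the same language with a translator URL using a different source language would not.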

Japanese parser

Here I don't think things are inconvenient. You can always set the regex parser to "mecab" regardless of the language name, and parsing will work. I don't really think a complex change on the dev side is required here :laughing:

Don't hesitate to tell me if your issues persist!