JMdictProject / JMdictIssues

JMdict Japanese dictionary - lexicographic, etc. issues management

Shortcuts in jmdictdb to kotobank and ngrams #55

Open Marcusjmdict opened 2 years ago

Marcusjmdict commented 2 years ago

Could we perhaps add some shortcuts to the individual jmdictdb entry pages for checking the ngrams for all kanji and readings? Maybe not for everyone, but at least for logged-in editors?

For example, in this entry, 雄ねじ "male screw", I'd like there to be an "ngrams" button somewhere that would take me directly to the ngrams results URL for all those kanji and readings: jwb/ngrams/ngramlookup.cgi?sent=雄ねじ+雄ネジ+雄螺子+おねじ+おすねじ+オネジ+オスネジ
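The ngrams link above is just the entry's surface forms joined with "+" and passed as the `sent` parameter. A minimal sketch of building it, assuming the server splits the decoded `sent` value on spaces (a "+" in a query string decodes to a space). The host is a placeholder, since the example link only gives the path:

```javascript
// Build an N-gram lookup URL from an entry's kanji and readings.
// Assumption: baseUrl is supplied by the caller; the example link omits the host.
function buildNgramUrl(baseUrl, surfaceForms) {
  // Drop duplicates while keeping the original order (kanji first, then readings).
  const unique = [...new Set(surfaceForms)];
  // Encode each form separately, then join with "+" (decoded as a space).
  const sent = unique.map(encodeURIComponent).join("+");
  return `${baseUrl}?sent=${sent}`;
}

// Usage with a placeholder host:
const url = buildNgramUrl("https://example.org/jwb/ngrams/ngramlookup.cgi",
  ["雄ねじ", "雄ネジ", "雄螺子", "おねじ", "おすねじ", "オネジ", "オスネジ"]);
```

Encoding each form before joining keeps any "+" or "&" inside a form from being mistaken for a separator.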

Similar direct links (for the first kanji or reading only, maybe) to kotobank, eijiro, and wadoku searches could also be useful.

I'm editing a lot from my cellphone recently and copy-and-pasting is a real pain in the neck.

JMdictProject commented 2 years ago

It's certainly worth exploring. I'm sure something can be done in that area and I'll discuss it with Stuart. A key thing will be to make it flexible - some of those URLs change a bit. Possibly something like WWWJDIC's [Links] menu (e.g. https://www.edrdg.org/cgi-bin/wwwjdic/wwwjdic?1MDJ%B9%B9%CC%E4%A4%A4) would work.

It will be in the new year before anything can be done.

stephenmk commented 2 years ago

I thought this would be useful too, so I went ahead and implemented it for myself using a bit of javascript. Unfortunately I don't think my code would be very useful as-is on a mobile device, as you would need some way to sideload your own javascript. On desktop browsers this is accomplished with a userscript manager such as Greasemonkey.

This probably goes without saying, but running untrusted code in your browser is dangerous and could result in (for example) having your passwords stolen. That said, I think my code is relatively short and straightforward, so someone might go to the trouble of reading it and vouching for its safety if it seems worthwhile enough.

If it does seem worthwhile, I think it would be fairly simple to adapt it and serve it officially from edrdg.org too.

Here's a video demonstration:

https://user-images.githubusercontent.com/8003332/163484108-11f7658e-53bc-4772-b9fc-6710e8394f67.mp4

JMdictProject commented 2 years ago

This is another issue that I took my eye off. What I think is appropriate is to have a relatively simple link with URL parameters consisting of the surface forms. That link would pull up a page in a new tab with the specific links to kotobank, Google, etc. I wouldn't want to try and have all that linking stuff in the database system itself, as those sites change their call structures quite often.
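The "simple link with URL parameters" idea can be sketched as a pair of functions: one on the entry page that encodes the surface forms, and one on the link page that reads them back. The parameter names `k` (kanji) and `r` (readings) and the link page URL are hypothetical, chosen only for illustration:

```javascript
// Entry page side: encode the surface forms as repeated query parameters.
// Hypothetical parameter names: "k" for kanji forms, "r" for readings.
function buildLinkPageUrl(linkPageUrl, kanji, readings) {
  const params = new URLSearchParams();
  kanji.forEach((k) => params.append("k", k));
  readings.forEach((r) => params.append("r", r));
  return `${linkPageUrl}?${params.toString()}`;
}

// Link page side: recover the surface forms from the query string.
function parseSurfaceForms(search) {
  const params = new URLSearchParams(search); // a leading "?" is ignored
  return { kanji: params.getAll("k"), readings: params.getAll("r") };
}
```

In a browser, the link page would call `parseSurfaceForms(location.search)`. Only this page would need updating when a dictionary site changes its URL structure.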

Clientside scripts are an interesting alternative. I think I'll try and get the approach above firing, then investigate them some more.

stephenmk commented 2 years ago

I put together a little page (hosted here on github) to demonstrate the link menus built by client-side scripting.

https://stephenmk.github.io/jmdictdb/link_menu_demo.html

I can think of a few drawbacks to this approach:

  1. It requires javascript to work. A small number of people are, under various circumstances, unable or unwilling to run javascript in their web browsers.
  2. The display position of the link menu is "absolute," and on some mobile devices may be cut-off. (Example image). I just noticed this yesterday, but I can fix it if people express interest in this approach.
  3. I understand the maintainers of edrdg.org may be unwilling to adopt and maintain a script that is written in a different style/language than the existing codebase. For what it's worth, I think the code required to implement this approach is fairly short and simple, localizes the link URLs into a single location, and could be added to existing pages with only a couple additional lines of HTML in the page headers:
<link rel="stylesheet" type="text/css" href="path/to/style/jmdictdb_ext_link_menu.css">
<script src="path/to/script/jmdictdb_ext_link_menu.js" defer></script>
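Keeping the link URLs in a single location could look something like this sketch. The site labels and URL templates are hypothetical placeholders, because the thread doesn't give the real kotobank, eijiro, or wadoku search URLs. `{term}` is an assumed placeholder syntax. Following the original request, only the first kanji form (or the first reading, if there is no kanji) is used:

```javascript
// Hypothetical placeholders: real dictionary search URLs are not given in this
// thread and may change, which is why they are kept together in one table.
const LINK_TEMPLATES = [
  { label: "Dictionary A", template: "https://dictionary-a.example/search?q={term}" },
  { label: "Dictionary B", template: "https://dictionary-b.example/word/{term}" },
];

// Use the first kanji form, falling back to the first reading.
function pickTerm(kanji, readings) {
  return kanji.length > 0 ? kanji[0] : readings[0];
}

// Fill every "{term}" placeholder with the URL-encoded search term.
function buildLinkMenu(templates, term) {
  const encoded = encodeURIComponent(term);
  return templates.map(({ label, template }) => ({
    label,
    href: template.split("{term}").join(encoded),
  }));
}
```

The menu script would then render each `{ label, href }` pair as a link. Updating a site means editing one table entry.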

I've also been working on a script to make the JMdictDB "updates" page easier to browse. This script is not so simple and I don't expect that it will be of much interest to anyone other than myself, but I've been looking for excuses to practice my javascript skills.

https://stephenmk.github.io/jmdictdb/updates_demo.html?y=2022&m=5&d=3

This script makes the entries on the "updates" page collapsible underneath buttons which display entry summaries. The entries are sorted in descending order by update time, and parent entries are threaded underneath their child entries. I'm not so sure how well this works on mobile devices.
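A rough sketch of the sort-and-thread step, independent of the page's HTML. The entry shape `{ id, parentId, updated }` is hypothetical, with `parentId` pointing to the earlier version an entry was edited from. Entries that are the parent of another listed entry are nested under it, and the remaining top-level entries are sorted newest first. A parent with two listed children would appear under both:

```javascript
// Hypothetical entry shape: { id, parentId, updated }, where parentId is the id of
// the earlier version this entry was edited from (null if none), and updated is a
// numeric timestamp.
function threadUpdates(entries) {
  const byId = new Map(entries.map((e) => [e.id, e]));
  // Any listed entry that is another listed entry's parent is an older version.
  const superseded = new Set();
  for (const e of entries) {
    if (e.parentId != null && byId.has(e.parentId)) superseded.add(e.parentId);
  }
  // Top-level rows: the newest versions, sorted by update time, newest first.
  const roots = entries
    .filter((e) => !superseded.has(e.id))
    .sort((a, b) => b.updated - a.updated);
  // Walk each root's parent chain to collect the older versions beneath it.
  return roots.map((root) => {
    const parents = [];
    const seen = new Set([root.id]); // guard against malformed cyclic data
    let p = byId.get(root.parentId);
    while (p && !seen.has(p.id)) {
      parents.push(p);
      seen.add(p.id);
      p = byId.get(p.parentId);
    }
    return { entry: root, parents };
  });
}
```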

The installable userscript file is available in my github repo if anyone is interested.

stephenmk commented 1 year ago

I wrote a little script to support a new syntax for term combinations on the N-gram lookup page.

E.g. {思う/おもう}{ツボ/壷/つぼ/壺}に{嵌/はま/ハマ}る will expand to the 24 different combinations. I also added a button to the results page to sort descending by counts. (Edit: I should note that this sort button is created by a separate installable script)

Parentheses and curly braces can be used to enclose the groups. Comma, period, semicolon, and / characters can be used as delimiters within the groups.
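A rough sketch of how this flat (non-nested) syntax could be expanded. Each group's alternatives are combined with every term built so far, so the example yields 2 × 4 × 3 = 24 terms. The regular expression and delimiter set are one reading of the description above, not the script's actual code:

```javascript
// {...} or (...) groups; alternatives inside a group separated by , . ; or /
const GROUP_RE = /\{([^{}]*)\}|\(([^()]*)\)/g;
const DELIMS = /[,.;\/]/;

function expandFlat(expr) {
  let terms = [""];
  let last = 0;
  for (const m of expr.matchAll(GROUP_RE)) {
    const literal = expr.slice(last, m.index); // text between groups
    const options = (m[1] ?? m[2]).split(DELIMS);
    // Cartesian product: every term so far combined with every option in this group.
    terms = terms.flatMap((t) => options.map((o) => t + literal + o));
    last = m.index + m[0].length;
  }
  const tail = expr.slice(last); // text after the last group
  return terms.map((t) => t + tail);
}
```

For example, `expandFlat("{思う/おもう}{ツボ/壷/つぼ/壺}に{嵌/はま/ハマ}る")` starts with 思うツボに嵌る and ends with おもう壺にハマる.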

One pitfall to be mindful of is that the N-gram server will fail if too many terms are queried at once (around 100 I think).
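Because of that limit, a script could check the term count before querying. The count is the product of the group sizes, so it can be computed without expanding anything. This sketch assumes the flat `{...}`/`(...)` syntax. `APPROX_MAX_TERMS = 100` is only the rough figure mentioned above, not a documented server limit:

```javascript
// Rough figure from the comment above; not a documented limit.
const APPROX_MAX_TERMS = 100;
const GROUP_RE = /\{([^{}]*)\}|\(([^()]*)\)/g;
const DELIMS = /[,.;\/]/;

// Multiply the number of alternatives in each group.
function countTerms(expr) {
  let count = 1;
  for (const m of expr.matchAll(GROUP_RE)) {
    count *= (m[1] ?? m[2]).split(DELIMS).length;
  }
  return count;
}

// Report whether a query is likely small enough to send in one request.
function checkQuerySize(expr, max = APPROX_MAX_TERMS) {
  const count = countTerms(expr);
  return { count, ok: count <= max };
}
```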

https://user-images.githubusercontent.com/8003332/219902210-4777775f-0de1-4ab4-a5eb-2aaed987ef7a.mp4

robinjmdict commented 1 year ago

Great work. That looks very helpful. I'll be using it. Thanks, Stephen.

@stephenmk You might want to mention that the "Sort Table" feature is from a separate script. I'd suggest incorporating it into the new script. It's very convenient when there are a lot of combinations.

stephenmk commented 1 year ago

Feel free to let me know if you encounter any problems. I've tried to make the scripts compatible with Greasemonkey and Violentmonkey extensions and both Firefox and Chromium-based browsers. I already fixed a couple bugs today, so you may want to update to the latest versions.

The scripts run on separate pages and don't share any code, so I'm a little reluctant to merge them together. In principle I could combine the scripts and determine which code to execute based upon the current page URL, but I think I'd rather keep the files modular and ask users to install both scripts separately.
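Userscript managers such as Greasemonkey and Violentmonkey normally select pages through `@match`/`@include` lines in each script's metadata block, which keeps separate scripts modular. A merged script would instead branch at runtime on the page URL, roughly as sketched here. The handler names and the "updates" path pattern are hypothetical. Only `ngramlookup.cgi` comes from this thread:

```javascript
// Hypothetical dispatch table mapping page paths to features.
const PAGE_HANDLERS = [
  { name: "ngram-combinations", matches: (path) => path.endsWith("/ngramlookup.cgi") },
  { name: "updates-threading", matches: (path) => path.includes("updates") },
];

// Return the name of the feature to run on this page, or null if none applies.
function pickHandler(href) {
  const { pathname } = new URL(href);
  const handler = PAGE_HANDLERS.find((h) => h.matches(pathname));
  return handler ? handler.name : null;
}
```

In a userscript this would be called as `pickHandler(location.href)`. The trade-off is that every page matching the metadata loads all of the code, even the parts it doesn't use.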

I could also make a full-fledged browser extension, although I'm not sure if that's needed. There's also the option of adding the scripts directly to edrdg.org. If a copy of a script file is placed on the web server, it can be added to any particular page by including an extra line in the HTML header:

  <head>
    ...
    <script src="relative/path/to/local/script.js" defer></script>
  </head>

robinjmdict commented 1 year ago

Working fine for me (Tampermonkey, Chrome). One minor issue I noticed is that when the "Check to get the most common 10 terms" box is ticked, the spacing between the counts and the percentages reduces to almost zero, making the numbers a little harder to read.

[Screenshots from 2023-02-22: box ticked and box unticked]

stephenmk commented 1 year ago

Thanks, I updated that script to ensure the percentage column always has the align="right" attribute set. I guess that page doesn't need percentages, so really I should make the script toggle them off by default.

Dr. Breen suggested that I add NULL (optional) components to the new combination syntax to support expressions such as 活(け)花, which covers both 活花 and 活け花. I took it a step further and added nested components, i.e. groups within groups. So now we can write cool expressions like these:

Term count | Expression
6 | 重箱の隅を(〈ようじ/楊枝〉で)〈つつく/ほじくる〉
25 | 〈巡(り)/廻(り)/めぐり〉〈合(わ)せ/会(わ)せ/あわせ〉
60 | 〈袖/そで〉〈振(り)/ふり/触(り)〉〈合う/あう〉も〈他生/多生/多少〉の縁

So, for example, the first expression will expand into these terms:

重箱の隅をつつく
重箱の隅をほじくる
重箱の隅をようじでつつく
重箱の隅をようじでほじくる
重箱の隅を楊枝でつつく
重箱の隅を楊枝でほじくる

I've configured it to accept ()（）[]［］ braces for nullable groups, 〈〉<>＜＞{}｛｝ braces for alternatives, and /,、.。;； characters as delimiters within groups. Groups can be nested arbitrarily deep, although I'm not sure there will ever be a need to nest more than two.
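A minimal recursive-descent sketch of the nested syntax, not the script's actual code. A sequence expands to the cartesian product of its pieces. An alternative group expands to the union of its delimiter-separated sequences. A nullable group adds the empty string as one more option. The bracket and delimiter sets follow the list above, assuming its repeated pairs are half-width and full-width variants:

```javascript
// Bracket pairs and delimiters (full-width variants assumed).
const NULLABLE = new Map([["(", ")"], ["（", "）"], ["[", "]"], ["［", "］"]]);
const ALTERNATIVE = new Map([["〈", "〉"], ["<", ">"], ["＜", "＞"], ["{", "}"], ["｛", "｝"]]);
const DELIMITERS = ["/", ",", "、", ".", "。", ";", "；"];

function expand(expr) {
  const chars = [...expr]; // split by code point
  let pos = 0;

  // A sequence is a run of literal characters and groups; its expansions are the
  // cartesian product of each piece's options, concatenated in order.
  function parseSequence(stops) {
    let results = [""];
    while (pos < chars.length && !stops.has(chars[pos])) {
      const c = chars[pos++];
      let options;
      if (NULLABLE.has(c)) {
        options = ["", ...parseGroup(NULLABLE.get(c))]; // group may be omitted
      } else if (ALTERNATIVE.has(c)) {
        options = parseGroup(ALTERNATIVE.get(c));
      } else {
        options = [c]; // literal character
      }
      results = results.flatMap((r) => options.map((o) => r + o));
    }
    return results;
  }

  // A group is one or more delimiter-separated sequences up to its closing bracket.
  function parseGroup(close) {
    const stops = new Set([...DELIMITERS, close]);
    let options = [];
    for (;;) {
      options = options.concat(parseSequence(stops));
      if (pos >= chars.length) throw new Error(`missing closing "${close}"`);
      if (chars[pos++] === close) return options;
      // Otherwise a delimiter was consumed; parse the next alternative.
    }
  }

  // Top level has no stop characters, so stray delimiters stay as literals.
  return [...new Set(parseSequence(new Set()))]; // drop duplicates, keep order
}
```

With this reading, `expand("活(け)花")` gives `["活花", "活け花"]`, and the three example expressions give 6, 25, and 60 terms.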

I've tested it a little and it seems to work well, but please let me know if you notice any bugs.