APIs between Yomichan and the operating system

siikamiika commented 4 years ago

So, Yomichan started as a Rikai replacement with Anki functionality, but over the years, it's accumulated a lot of useful features. I saw this issue https://github.com/FooSoft/yomichan/issues/237 and started thinking that there actually isn't much that's needed outside of Yomichan itself when reading text.

It has come up a few times that Yomichan would be useful as a general text reading tool for text that doesn't originate from websites. While it would be almost impossible to make Yomichan's scan work the same way in all applications on all platforms as suggested in https://github.com/FooSoft/yomichan/issues/146, it's not all that difficult to hook into the text of a specific application such as visual novels or video players and their subtitles.

Currently it's possible to open the search page with a link that also works outside of the browser thanks to https://github.com/FooSoft/yomichan/issues/247, and you can use a hack with Clipboard Inserter (I did the same with WebSockets here) to have all the text coming from an application outside of the browser available for scanning with Yomichan. Discussion about the use case in this issue: https://github.com/FooSoft/yomichan/issues/158

With the background out of the way, what I'm getting at is that there is a lot that could be done with an API in Yomichan that takes in raw Japanese text and inputs it to the search page. I'm aware that many of the features in Yomichan are already far from the initial goal of the extension and this goes even further. If nothing else, I'm writing down the idea for a web application that uses Yomichan as a dictionary.

Here are some ideas for the possible API implementation:

- Probably the easiest step would be to enable searching clipboard contents directly from the search page, similar to Clipboard Inserter.
  - Because one might be copying an entire sentence, one possibility would be to actually parse the sentence (JGlossator, my Chinese parser, mentioned in https://github.com/FooSoft/yomichan/issues/153), or simply show it below the input box where it could be scanned and clicked.
- WebSockets could be used as a generic way to feed text into the search page so that the search page itself connects to a server.
  - I found this when wondering if the browser itself could run a web server, but that feels more unsafe.
- Native messaging.
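
The WebSocket idea above could look roughly like the following. This is only a sketch under assumptions: the function name `connectTextFeed`, the `onText` callback and the server URL are made up for illustration and are not part of Yomichan. The socket class is passed in so the same code can be tried outside a browser.

```javascript
// Hypothetical sketch: the search page connects to a local text server and
// treats every non-empty message as a new search query.
// `SocketClass` defaults to the browser's WebSocket; `onText` is an assumed
// callback that would start a search.
function connectTextFeed(url, onText, SocketClass = globalThis.WebSocket) {
    const socket = new SocketClass(url);
    socket.addEventListener('message', (event) => {
        // Ignore whitespace-only messages so the search page is not cleared.
        const text = String(event.data).trim();
        if (text.length > 0) {
            onText(text);
        }
    });
    return socket;
}
```

In a browser this might be called as `connectTextFeed('ws://127.0.0.1:9001', startSearch)`, where both the port and `startSearch` are placeholders.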

FooSoft commented 4 years ago

That is a pretty interesting approach. I actually made a separate tool for reading books a while ago written entirely in Python, this could probably serve a similar role. Some thoughts:

- You could have two data sources; one being from text that is directly pasted in to the extension, and another for remote applications. It would be nice if it were possible to have the concept of "position", so it can be used as an e-reader for books, or whatnot.
- I would say that simple AJAX would be superior to web sockets as it is trivial to implement in any language even if third party libs are not available / cannot be used. If you look at my AnkiConnect code you can see a dirt simple Python implementation that would not be as simple if I had to do this with web sockets. The reason why I couldn't just use an existing library in this case, by the way, is that Anki extensions are only able to use features that are distributed with Anki (in case you are curious).
- Proper grammar parsing would be pretty rad, but could also turn out to be a huge can of worms. Maybe there is already existing code to do this in JS? Does it require additional data?
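
To illustrate how little code a plain HTTP endpoint needs, here is a rough Node.js sketch of a listener that accepts POSTed text. It is not taken from AnkiConnect or Yomichan: the JSON shape `{"text": "..."}`, the function names and the port in the usage note are assumptions.

```javascript
const http = require('http');

// Validate a JSON request body of the (hypothetical) form {"text": "..."}.
function parseTextRequest(body) {
    let payload;
    try {
        payload = JSON.parse(body);
    } catch (e) {
        return {status: 400, text: null};
    }
    if (payload === null || typeof payload.text !== 'string' || payload.text.length === 0) {
        return {status: 400, text: null};
    }
    return {status: 200, text: payload.text};
}

// Create a server that hands every valid text to `onText`.
function createTextServer(onText) {
    return http.createServer((req, res) => {
        if (req.method !== 'POST') {
            res.writeHead(405);
            res.end();
            return;
        }
        let body = '';
        req.setEncoding('utf8');
        req.on('data', (chunk) => { body += chunk; });
        req.on('end', () => {
            const {status, text} = parseTextRequest(body);
            if (text !== null) {
                onText(text);
            }
            res.writeHead(status, {'Content-Type': 'application/json'});
            res.end(JSON.stringify({ok: status === 200}));
        });
    });
}
```

A caller could start it with `createTextServer(console.log).listen(8766, '127.0.0.1')`; binding to the loopback address keeps the endpoint local.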

oudajosefu commented 4 years ago

I really like the clipboard idea and have actually seen a project in the MIA Japanese community use that to do what I am about to list out but as an Anki add-on (this add-on is currently in beta and is not available to the public for the time being). I must also say that I’ve had a similar idea for a while now, and many issues/comments in this repository have mentioned various aspects I’ve thought over that would make this idea possible. Many of those comments were mentioned in @siikamiika’s first post above.

The idea is to make an Electron app that reacts to a specific hotkey similar to the copy hotkey (Cmd/Ctrl + C) that then grabs a term from the clipboard and pops up a little browser window similar to Yomichan right near the term. Electron's BrowserWindow API comes with a setPosition() function detailed here to specify the location of the window, and of course there is also a function to bring the focus of the window to the front. @toasted-nutbread previously mentioned that it would be very difficult to detect text across multiple operating systems, but I feel as though this clipboard idea would solve that issue... for apps that allow text to be copied, which is the majority of them.

He also mentioned in #146:

Based on what I've learned while adding some features recently, this would not be possible. The main reason being that this extension uses standardized APIs and functionality that are core to web browser technologies, and these capabilities wouldn't be available in other contexts. While it could be theoretically possible to still use the same core JavaScript source for a desktop app (see: Electron), other things would be extremely difficult and/or time consuming.

It definitely would be difficult to make the Yomichan code work the same in an Electron app, but given enough time and effort to convert all web browser specific technologies into Electron compatible technology, I wholeheartedly believe that the job can be done, and pretty darn well too, if done to work with the clipboard and utilize Electron's dynamic positioning.

And rethinking this, it would also be possible to use both my idea stated above in combination with @siikamiika’s server/api idea. What I’m thinking is that through Electron and a Yomichan api, a cross-platform app could pop up as described above and make a call to Yomichan in the browser to receive all the results and show them in the app. This would save users from having to reinstall all their dictionaries and reconfigure their settings in the app version of Yomichan.

siikamiika commented 4 years ago

@FooSoft

You could have two data sources; one being from text that is directly pasted in to the extension, and another for remote applications. It would be nice if it were possible to have the concept of "position", so it can be used as an e-reader for books, or whatnot.

I actually hadn't thought of the other use case where you import text into the application and track the position. Many books are in a format where it's hard to bookmark your position unless you use some reader application, and I don't see why Yomichan couldn't do that as well given that its predecessor already did it.

For live sources, I think it would make sense to store a history of say 100 last lines, and then have buttons and a hotkey that can be used to go to the next and previous lines. I know many Japanese readers including my own that do this.
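
A bounded line history with next/previous navigation could be sketched like this. The class and method names are hypothetical, and the limit of 100 is just the number mentioned above.

```javascript
// Hypothetical sketch of a bounded history of incoming lines.
class LineHistory {
    constructor(maxLines = 100) {
        this.maxLines = maxLines;
        this.lines = [];
        this.index = -1; // -1 means no line has been received yet
    }

    // Store a new line, drop the oldest one if over the limit,
    // and jump to the newest entry.
    push(line) {
        this.lines.push(line);
        if (this.lines.length > this.maxLines) {
            this.lines.shift();
        }
        this.index = this.lines.length - 1;
        return line;
    }

    current() {
        return this.index >= 0 ? this.lines[this.index] : null;
    }

    // Move one line back; stays on the oldest line at the start.
    previous() {
        if (this.index > 0) {
            this.index--;
        }
        return this.current();
    }

    // Move one line forward; stays on the newest line at the end.
    next() {
        if (this.index < this.lines.length - 1) {
            this.index++;
        }
        return this.current();
    }
}
```

Buttons or hotkeys would then only need to call `previous()` and `next()` and display the returned line.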

I would say that simple AJAX would be superior to web sockets as it is trivial to implement in any language even if third party libs are not available / cannot be used. If you look at my AnkiConnect code you can see a dirt simple Python implementation that would not be as simple if I had to do this with web sockets. The reason why I couldn't just use an existing library in this case, by the way, is that Anki extensions are only able to use features that are distributed with Anki (in case you are curious).

I originally got fascinated with websockets because of my HTPC project where volume and seek sliders would choke my server after HTTP POST spam (which was largely caused by the server using HTTP/1.0 that I changed later). I also realized at some point that I could make a web browser listen to requests like a server using websockets. But for this kind of use case I agree that AJAX should do just fine if you add the required native listener into the extension.

I did wonder why you used raw sockets so thanks for explaining :joy:

Proper grammar parsing would be pretty rad, but could also turn out to be a huge can of worms. Maybe there is already existing code to do this in JS? Does it require additional data?

If we're already using native messaging, a native application like MeCab could be used for this. Pure JS alternatives exist, but the dictionaries can be huge (although Yomichan dictionaries already are). Naive parsing of the sentences with the heuristic that the longest result is probably correct can also get quite far. As a backup you can always scan at some character yourself. But that's just something to think about because Yomichan can already do human-assisted text parsing.
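
The naive longest-match heuristic can be sketched as below. This is an illustration only, not Yomichan's parser: the dictionary is assumed to be a plain `Set` of surface forms, and deinflection (which a real lookup would need) is left out.

```javascript
// Hypothetical sketch: greedy longest-match segmentation.
// `dictionary` is assumed to be a Set of known words; `maxLength` is an
// assumed cap on how many characters to try per position.
function parseLongestMatch(text, dictionary, maxLength = 10) {
    const chars = Array.from(text); // split by code point, not UTF-16 unit
    const tokens = [];
    let i = 0;
    while (i < chars.length) {
        let match = null;
        // Try the longest candidate first and shrink until something matches.
        for (let len = Math.min(maxLength, chars.length - i); len > 0; len--) {
            const candidate = chars.slice(i, i + len).join('');
            if (dictionary.has(candidate)) {
                match = candidate;
                break;
            }
        }
        if (match === null) {
            // Unknown character: emit it alone and move on.
            tokens.push({text: chars[i], known: false});
            i += 1;
        } else {
            tokens.push({text: match, known: true});
            i += Array.from(match).length;
        }
    }
    return tokens;
}
```

With a dictionary containing both 日本 and 日本語, the input 日本語を勉強する is split at 日本語 rather than 日本, which is the "longest result is probably correct" behavior described above.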

@oudajosefu I thought of clipboard because while it can be used like an API, it's also very simple to use for people who don't have any programming background. Other methods like sockets or AJAX servers are more flexible, but require a few lines of code.

I think the main problem with OS wide popup dictionary is getting the text itself, because most applications where this would be interesting don't let you select the text. There are text hooking solutions that only work for specific applications, and that's why I'm thinking it would make more sense to have an API in Yomichan that can take in text from one of those applications, because it's more probable that one of them does the text hooking better than Yomichan as a dictionary application would ever do.

There's the tradeoff that browsers can't spawn popups outside of the main browser window at specific locations, but if the API would exist in the first place, someone could create a fork in Electron that allows it. I also think it would be possible to hack a web browser into doing this with something like AutoHotkey on Windows that matches a browser popup's title and moves it to the mouse cursor. It's possible to remove the toolbar from a popup at least in Firefox: https://addons.mozilla.org/en-US/firefox/addon/new-window-without-toolbar/

siikamiika commented 4 years ago

I have added a simple clipboard monitor to the search page along with history tracking (popstate/pushState) that works both for user input and text that came from the clipboard. It's disabled by default and has to be enabled by ticking a checkbox each time after the search page is opened.
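
For readers wondering what a clipboard monitor of this kind involves, here is a minimal polling sketch. It is not the code that was committed: `readText` is an assumed synchronous function supplied by the caller (how the clipboard is actually read is not shown in this thread), and the 250 ms interval is arbitrary.

```javascript
// Hypothetical sketch: poll a text source and report changes.
class ClipboardMonitor {
    constructor(readText, onChange) {
        this.readText = readText; // assumed: returns the current clipboard text
        this.onChange = onChange; // called with each new, non-empty text
        this.previousText = null;
        this.timer = null;
    }

    // Compare the current text with the last seen value.
    // Returns true if a change was reported.
    check() {
        const text = this.readText();
        if (typeof text === 'string' && text.trim().length > 0 && text !== this.previousText) {
            this.previousText = text;
            this.onChange(text);
            return true;
        }
        return false;
    }

    start(intervalMs = 250) {
        if (this.timer === null) {
            this.timer = setInterval(() => this.check(), intervalMs);
        }
    }

    stop() {
        clearInterval(this.timer);
        this.timer = null;
    }
}
```

On a search page, `onChange` could set the query and call `history.pushState` so that each clipboard change becomes its own history entry.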

oudajosefu commented 4 years ago

@siikamiika Works well but is it really necessary to make the user re-enable it every time they open the search page? Maybe it would be better to keep it on if the user chooses it to be like that. Right now, there's no difference with me just hitting Ctrl + V to paste whatever's in my clipboard.

siikamiika commented 4 years ago

@oudajosefu I'm going to make the checkbox option persistent later, because the feature still requires some work to make it more useful. I noticed that on Chrome the clipboard can only be read when you have the page focused, and that kind of defeats the point of the feature because you have to click or switch window to the search page to see the results. I found a workaround (posted as a comment here) but I didn't commit it to master yet because it's not optimal.

When you say the feature works do you mean that your search page updates when you copy text in some application, or do you first have to switch to the search page?

siikamiika commented 4 years ago

@oudajosefu I added a slightly improved version of the workaround to master and now Chrome should be able to update the results when the search page isn't focused. The options are also persisted after opening and closing the search page.

oudajosefu commented 4 years ago

@siikamiika The persistence works well, and after setting the search page hotkey to work globally on my OS, the clipboard function works well too. I don't have to focus on Chrome or even the search page tab for the results to update. Now, every time I copy something in another app and the search page is opened, the input bar updates to the clipboard content and immediately shows the results. This is a nice start to the ideas discussed in this thread and I can't wait for something promising to come out of this. Nice job.

One thing I'd change is just an aesthetic aspect. The checkboxes look quite out of place. Maybe it would look better to make them bigger or more Yomichan-like if that makes sense?

oudajosefu commented 4 years ago

Here's a cool demonstration of what is possible now with these new search page functionalities.

One thing you can notice in the video linked above is that any popups on the search page stay visible when clipboard inserting a new term from another app.

siikamiika commented 4 years ago

@oudajosefu Nice to see the feature being used! That's exactly the kind of thing I intended it for. I'll try to add some form of text parsing or a view of the search query next where you can choose which word to translate to allow reading full sentences without leaving the search page.

Thanks for the feedback, I didn't notice the issue with old popups staying open. Initiating a search should probably close all existing popups on the page.

I'm not sure if I understand what you mean with the checkboxes being out of place, but that's to be expected because I mainly do server side and data related things instead of user interface design or implementation. If you (or someone else) has a vision of a better way to display them, I can try to do that or you can open a PR.

oudajosefu commented 4 years ago

@siikamiika I'll work on a design for the checkboxes. In the meantime, here's another cool demonstration of what you can do with this new clipboard feature, but this time with OCR added to the mix.

siikamiika commented 4 years ago

@oudajosefu Thanks, I'm interested in seeing what they look like. I noticed in Yomichan settings that the persistent storage button has a checkbox inside (never knew you could do that) and also that checkboxes are usually left of the text, which I got the wrong way around.

Your second video illustrates that the possibilities are endless when you have clipboard access. I've actually done something similar with Chinese here, but that's using websockets: https://github.com/siikamiika/scripts/tree/master/netflix-sub-ocr

I've been working on the parser, and while it's not ready to be pushed to master yet because it's missing important features and the mouseover scanning only works because of an inefficient hack, I'm still happy with the results. If you have improvement suggestions, you can open an issue to my fork and mention this thread. Demo video: https://streamable.com/cw2js

oudajosefu commented 4 years ago

@siikamiika I think it would look the best to put the whole image within a circle or square and make the whole shape a button. When clicked, it would turn green. Or better yet, maybe make the colors of the original symbols invert when clicked, just as menubar icons do on macOS, which I'm sure you've seen before.

The parser is a great addition. Some notes though:

- The highlighting gets a bit annoying so it would be nice to make the parser react to the settings just as the popups do, so that I can turn Select matched text off.
  - Or maybe there could be a separate parser section in the options page with various options that are similar to the popup's options.
- It would be cool to be able to add the whole sentence in the parser to Anki with the matched word being the expression.

If you have improvement suggestions, you can open an issue to my fork and mention this thread.

How does one do this? Do I open the issue on FooSoft's issues page and mention your fork and this thread? There is no issues page on your fork which I assume is expected given it's a fork.

siikamiika commented 4 years ago

How does one do this? Do I open the issue on FooSoft's issues page and mention your fork and this thread? There is no issues page on your fork which I assume is expected given it's a fork.

It will still take some time until the parser is relevant for anyone checking out foosoft/yomichan master and it's quite a large feature, so I thought it would make sense to keep the discussion about it on the fork before I make a pull request. I guess it's still on topic for this issue though, because it's something enabled by the clipboard API. It also seems like I hadn't enabled the creation of issues on my fork, which is off by default for forks. They're on now.

I think it would look the best to put the whole image within a circle or square and make the whole shape a button. When clicked, it would turn green. Or better yet, maybe make the colors of the original symbols invert when clicked, just as menubar icons do on macOS, which I'm sure you've seen before.

Thanks for the ideas, I'll experiment with them when I get more of the core functionality done.

The highlighting gets a bit annoying so it would be nice to make the parser react to the settings just as the popups do, so that I can turn Select matched text off.

Or maybe there could be a separate parser section in the options page with various options that are similar to the popup's options.

Making the text selection option work with the parser is definitely on my list. I've been working on the scanning performance and parsing improvements now, but I'll do that next. I'm a bit against a separate parser section at least for this setting, because you can already match the search page with settings profiles.

By the way, the branch is currently broken by default unless you change these comments the other way around: ext/bg/js/search-query-parser.js

// const results = await apiTextParse(text, this.search.getOptionsContext());
const results = await apiTextParseMecab(text, this.search.getOptionsContext());

Mecab works much better for text parsing, but the installation process is very tedious and that's why I want to have two options.

oudajosefu commented 4 years ago

@siikamiika Quick question: how did you cycle through your clipboard history in your last demonstration video? If it involves a third-party app, then maybe it would be helpful to add this feature to Yomichan.

siikamiika commented 4 years ago

@oudajosefu That's a feature I added earlier before text parsing. You can just go back and forward in browser history. It's not exclusive to clipboard, all searches push a new history entry.

siikamiika commented 4 years ago

@oudajosefu

any popups on the search page stay visible when clipboard inserting a new term from another app.

After a long break while I was improving the query parser, this is now fixed on master: https://github.com/FooSoft/yomichan/commit/7d9d45ae10302582ce7431bd72ec4f8604dc5e65

Or better yet, maybe make the colors of the original symbols invert when clicked, just as menubar icons do on macOS, which I'm sure you've seen before.

This was also added in https://github.com/FooSoft/yomichan/commit/4ac41283880fdbdf9bf0b82e255004a300d62e8b

oudajosefu commented 4 years ago

So I came across this script for mpv that allows one to see a popup like Yomichan's simply by hovering over a word in the subs. Staying on topic with this thread, I wonder if it's possible to use Yomichan through some API to grab the results that would otherwise appear in the browser and make them appear instead in the mpv script's popup. Because the hovering and popup features are already accomplished through this script, it would be nice to have all the information being grabbed from Yomichan on the fly instead of constantly requesting from online dictionaries. That being said, I'd like it to work similarly to AnkiConnect where it uses a local server to communicate back and forth between the two. A getDefinitionsFromYomichan(word) function could be called from within an mpv script which would return the corresponding dictionary definitions from the created Yomichan server to then be used directly in the mpv popups. It could return the definitions in html and it would be the mpv script's job to render it properly in Yomichan's style.
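
As a rough picture of the client side of such a local server, the sketch below builds an AnkiConnect-style JSON request. Everything about the endpoint is hypothetical: no such Yomichan server exists in this thread, and the URL, the `getDefinitions` action name and the response format are invented for illustration.

```javascript
// Hypothetical endpoint; Yomichan does not provide this server.
const YOMICHAN_API_URL = 'http://127.0.0.1:8766';

// Build the fetch() options for a lookup request (assumed JSON format).
function buildLookupRequest(word) {
    return {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({action: 'getDefinitions', params: {text: word}})
    };
}

// Send the lookup and return the parsed JSON response.
// `fetchImpl` defaults to the global fetch (browsers, Node 18+).
async function getDefinitionsFromYomichan(word, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(YOMICHAN_API_URL, buildLookupRequest(word));
    if (!response.ok) {
        throw new Error(`Lookup failed with HTTP status ${response.status}`);
    }
    return await response.json();
}
```

A script in another language would only need to send the same POST request, which is the main appeal of the AnkiConnect-like approach.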

siikamiika commented 4 years ago

@oudajosefu It's certainly possible, but the actual solution would be quite hacky. It's simple to add an API to Yomichan that provides a server that parses words from a string and returns translations as results. It becomes difficult when you want to display the popup 1:1 in mpv. I read the interSubs code a bit, and it seems to have an old implementation where the subs are rendered using TkInter, but the current one uses PyQt5 that supports WebKit. I'm not sure how one would make it execute JS. You would probably have to reimplement the popup in Qt in some way, either using web technologies or the library's UI toolkit itself.

Alternative solutions include embedding the entire Yomichan into an Electron app that displays popups on demand and communicates with mpv to get the subtitles, and probably the simplest but not the most beautiful way would be to use the existing clipboard monitor but instead make text copying trigger opening a browser popup window that has the same contents as the Yomichan search page (or popup) instead of updating the search page contents. I've talked about the issues related to this approach at the bottom of this comment https://github.com/FooSoft/yomichan/issues/262#issuecomment-544849123.

pigoz commented 4 years ago

An API for Yomichan would be awesome.

As for the window, an mpv script author could make something fairly advanced by displaying the results through libass (and avoid the complexity of going through a windowing toolkit). I did that for my own word lookup script for mpv and it works OK. I ended up implementing mecab and word lookup in an mpv script and it wasn't pretty.

I could replace everything with Yomichan if it provided a way to tokenize a sentence with mecab, and a way to perform dictionary lookups.

siikamiika commented 4 years ago

@pigoz I did a similar thing with my own Japanese parsing project for mpv in the past here. It's probably already out of sync with the server's API though. I found libass quite difficult to work with, but I believe you have a lot more experience in it.

The newest version of Yomichan actually added text parsing functionality where MeCab is one of the parsers that can be used. I'm curious what other devs can come up with if there's an API in Yomichan that exposes both the tokenizer and the dictionary. It could be done by extending the native messenger for MeCab here https://github.com/siikamiika/yomichan-mecab-installer.

pigoz commented 4 years ago

I did a similar thing with my own Japanese parsing project for mpv in the past here.

Seems very similar to mine https://github.com/pigoz/mpv-nihongo/blob/master/src/jplookup.ts Not using Lua is definitely a plus though. I could probably work on making a nice mpv script if there's a stable Yomichan API.

As for export to Anki through AnkiConnect, I think it's still better to bake it in an NIH mpv plugin, that way you can cut an audio and video sample (I do so in my project). I added mpv commands to mpv to fetch the current subtitle's start and end time for this reason.

https://github.com/siikamiika/yomichan-mecab-installer

By the way, I've been using this dictionary https://github.com/neologd/mecab-ipadic-neologd with good results.

siikamiika commented 4 years ago

@pigoz Cool! I'll let you know when there's something available.

I agree that it makes sense to use Anki export separately, because if an API is used, a lot of context required for the notes is missing unless they're filled with compatible but empty data, and the note format also tries to mimic what you see in the popups. The benefit from having the API is that you have your dictionaries in one place.

The neologd flavor of ipadic is actually in the same format, but I didn't include it as a default for the installer because it's huge. Do you know if it's available as a binary somewhere? I could probably also add a compile step to the install script, but it's quite barebones for now.

toasted-nutbread commented 4 years ago

Alternative solutions include embedding the entire Yomichan into an Electron app that displays popups on demand and communicates with mpv to get the subtitles

May be possible to create a popup window using chrome.windows and just position that. Would still require the browser to be running, and who knows how that interacts with things like video players if they are fullscreen or always on top, but it would avoid the need of creating and installing a second app.
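
A sketch of that idea, assuming the extension already knows the cursor position and screen size (how it would get them is not covered here). `chrome.windows.create` is the real extension API; the helper names, the offset and the clamping logic are assumptions.

```javascript
function clamp(value, min, max) {
    return Math.min(Math.max(value, min), max);
}

// Place the popup slightly below and to the right of the cursor while
// keeping it fully on screen. All coordinates are in pixels.
function computePopupBounds(cursor, size, screen, offset = 10) {
    return {
        left: clamp(Math.round(cursor.x + offset), 0, Math.max(screen.width - size.width, 0)),
        top: clamp(Math.round(cursor.y + offset), 0, Math.max(screen.height - size.height, 0)),
        width: size.width,
        height: size.height
    };
}

// Only works inside a browser extension where `chrome.windows` exists.
function openPopupWindow(url, bounds) {
    chrome.windows.create({url, type: 'popup', focused: true, ...bounds});
}
```

Whether such a window ends up above a fullscreen or always-on-top video player is up to the window manager, so the caveat above still applies.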

oudajosefu commented 4 years ago

@toasted-nutbread If you do use the browser for an additional popup window, how would you manage to hook into the subtitles of video players like mpv? Currently I use an mpv script I made to copy the currently displayed subtitles to the clipboard and open Yomichan's search page with a hotkey, but mpv does not let you copy individual words from the subtitles or even highlight them. Because of this, I think it would be best to use an mpv script for both displaying the results and parsing individual words as @pigoz made. A Yomichan API would be greatly appreciated so as to make the searching and dictionary maintenance much easier.

siikamiika commented 4 years ago

@toasted-nutbread Nice, seems to work on Chrome just fine, and I also found what https://addons.mozilla.org/en-US/firefox/addon/new-window-without-toolbar/ uses to hide the toolbar: vLss6NaJlo

@oudajosefu I think it's good to have different options for different needs. The dictionary API would be great for making specialized applications where Yomichan's dictionaries and text parsing can be reused, and browser popup windows are simple to get up and running but might not integrate well with every use case.

To use Yomichan's browser popups with mpv you would need an mpv script that can detect the character that's clicked or otherwise interacted with in the video player, and send the subtitle line starting from that character to Yomichan either by copying it to the clipboard and letting Yomichan handle the rest using clipboard monitor and something that detects the mouse position (in the native messenger), or using a new API for that.
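
The "subtitle line starting from that character" step is small but easy to get wrong with characters outside the Basic Multilingual Plane. A minimal sketch, with a hypothetical function name and assuming the script already knows the index of the clicked character:

```javascript
// Hypothetical helper: return the part of the subtitle line that starts at
// the clicked character. Indexing is by code point so that characters such
// as 𠮷 (a surrogate pair in UTF-16) count as one character.
function textFromCharacter(line, charIndex) {
    const chars = Array.from(line);
    if (charIndex < 0 || charIndex >= chars.length) {
        return '';
    }
    return chars.slice(charIndex).join('');
}
```

The returned string is what would be copied to the clipboard or sent to an API so that scanning starts at the clicked word.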

pigoz commented 4 years ago

The neologd flavor of ipadic is actually in the same format, but I didn't include it as a default for the installer because it's huge. Do you know if it's available as a binary somewhere? I could probably also add a compile step to the install script, but it's quite barebones for now.

I had to install it from source. I don't think there's a prebuilt one available. As long as Yomichan can use a self-installed mecab from the $PATH I don't think it's a huge issue though.

siikamiika commented 4 years ago

@pigoz MeCab itself will be used from $PATH on Linux/Mac and %PROGRAMFILES(X86)%\MeCab\bin\mecab.exe on Windows, but the dictionaries are looked up by name from <yomichan-mecab-installer>/data/<dictionary-name>. Currently ipadic, ipadic-neologd and unidic-mecab-translate are supported. So you should be fine if you link the dictionary data there under ipadic-neologd.
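
The lookup rules described above can be summarized in code. This is a sketch of the rules as stated in this comment, not the installer's actual source; the function names are hypothetical.

```javascript
const path = require('path');

// Dictionary names listed as supported in the comment above.
const SUPPORTED_DICTIONARIES = ['ipadic', 'ipadic-neologd', 'unidic-mecab-translate'];

// On Windows a fixed install location is used; elsewhere the bare command
// name is returned so that it is resolved through $PATH.
function resolveMecabCommand(platform, env) {
    if (platform === 'win32') {
        return path.win32.join(env['PROGRAMFILES(X86)'], 'MeCab', 'bin', 'mecab.exe');
    }
    return 'mecab';
}

// Dictionaries are looked up by name under <installer>/data/<name>.
function resolveDictionaryDir(installerDir, name) {
    if (!SUPPORTED_DICTIONARIES.includes(name)) {
        throw new Error(`Unsupported dictionary: ${name}`);
    }
    return path.join(installerDir, 'data', name);
}
```

Linking a self-built neologd dictionary would then mean making `resolveDictionaryDir(installerDir, 'ipadic-neologd')` point at the compiled dictionary data.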

a4jp-com commented 2 years ago

Can Yomichan be used with any email programs in Windows 11?

toasted-nutbread commented 2 years ago

A long time ago, I looked into seeing if Yomichan could work as an addon to Thunderbird with no success. So as far as integration with a native application, probably not. You can set up Yomichan to show a search window when you copy Japanese text to the clipboard, so technically you could do it that way, but it's not an optimal workflow. A web client is the best way to get full Yomichan integration.

