FooSoft / yomichan

Japanese pop-up dictionary extension for Chrome and Firefox.
https://foosoft.net/projects/yomichan

[feature request] Log history of lookups #93

Open gajewsk2 opened 7 years ago

gajewsk2 commented 7 years ago

Sometimes you don't have time to curate your cards at the time you are looking up words with yomichan. It would be useful to have a feature that would allow you to look at the history of words you have looked up in yomichan, so you can later go back and act on them appropriately.

A logical next step after that would be to log how many times a user has encountered the logged words - which would emphasize the need to do something more substantial with the word (almost like the concept of leeches in anki).

Thoughts?

FooSoft commented 7 years ago

The idea of doing something with words you look up has come up before. Problem is that you don't actually look up individual words with Yomichan, you scan text and display definitions.

gajewsk2 commented 7 years ago

Right, the words alone would be impossible to gather immediately, just as you are saying. But those scanned texts, particularly a list of what has been queried, should provide useful definitions to the user. So if yomichan kept a history of the queries it had picked up, that would be useful to the user (for the use cases mentioned above; I could probably come up with even more).

If you wanted to minimize logged query noise, yomichan could put some sort of timeout on the actual logging. You might be able to pick up only the words that the user was looking for; e.g., if the popup has a timeout of 300ms before it shows, yomichan wouldn't log the query to the history until 600ms had passed.
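The timeout idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `DwellLogger`, `onScan` and `onLeave` are hypothetical names, not part of Yomichan, and the 600 ms threshold is just the figure from the example. Timestamps are passed in explicitly so the logic can be followed without real timers.

```typescript
// Hypothetical sketch: log a query only if it stayed active long enough.
// DwellLogger, onScan and onLeave are illustrative names, not Yomichan APIs.
class DwellLogger {
    private pending: { query: string; startedAt: number } | null = null;
    private minDwellMs: number;
    history: string[] = [];

    constructor(minDwellMs: number) {
        this.minDwellMs = minDwellMs;
    }

    // Called when scanning produces a new query at time nowMs.
    onScan(query: string, nowMs: number): void {
        this.flush(nowMs);
        this.pending = { query: query, startedAt: nowMs };
    }

    // Called when the pointer leaves the text or the popup closes.
    onLeave(nowMs: number): void {
        this.flush(nowMs);
        this.pending = null;
    }

    // Keep the pending query only if it was active for at least minDwellMs.
    private flush(nowMs: number): void {
        if (this.pending !== null && nowMs - this.pending.startedAt >= this.minDwellMs) {
            this.history.push(this.pending.query);
        }
    }
}

const logger = new DwellLogger(600);
logger.onScan("こ", 0);      // pointer passes over こ on the way to the word
logger.onScan("ねこ", 100);  // こ was active for only 100 ms, so it is dropped
logger.onLeave(900);         // ねこ was active for 800 ms, so it is logged
// logger.history is now ["ねこ"]
```

A threshold longer than the popup delay means a query is only recorded once the user has had time to actually read the popup.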

FooSoft commented 7 years ago

I guess what I'm saying though is that looking up ねこ will also flag the query lookup for こ, which is noise. At other times you just want the definition for ね and not ねこ.

gajewsk2 commented 7 years ago

I'm envisioning that if you were to scan ねこ, the logged query would be ねこ. Not ね and not こ. If either of those were the intent, then the logs would not contain them. But if you were to view the history, see ねこ, based on the contexts of the other queries, you would most likely be able to infer, "Oh that's right, when I looked up ね, it said ねこ".

I'm not sure what interface would be the most conducive to this, but suppose you had a list of queries in order on the page (like the search page, but with no definitions, just queries). You see neko, interact with it, and you can easily refresh your memory of what you were querying.

Let's take a more complex example: 録音機能. In this case the words are 録音 and 機能, but 録音機 is what yomichan picks up and what would be logged. In this situation the user already gets the wrong definition from yomichan, and is most likely already doing the natural thing and scanning the second half of the word by itself: 機能. So now the user has a list in their logs like:

2. 機能
1. 録音機 

The user should still be able to hover over the first entry (1.) and make a card for 録音 from it.

Any lookups done on the logging page would not count towards the log order or log count (basically, you should ignore logging on the logging interface).
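As a rough illustration of the behaviour described above, here is a minimal sketch of a log that lists the most recent query first and ignores lookups made from the history page itself. `QueryLog` and `LookupSource` are hypothetical names introduced for the example; nothing here reflects how Yomichan is actually structured.

```typescript
// Hypothetical sketch; LookupSource and QueryLog are illustrative names.
type LookupSource = "page" | "history";

class QueryLog {
    private entries: string[] = [];

    // Record a scanned query unless it came from the history page itself.
    record(query: string, source: LookupSource): void {
        if (source === "history") {
            return; // lookups on the logging interface are not logged
        }
        this.entries.push(query);
    }

    // Numbered in the order logged, listed with the most recent query first.
    listing(): string[] {
        return this.entries
            .map((query, index) => `${index + 1}. ${query}`)
            .reverse();
    }
}

const log = new QueryLog();
log.record("録音機", "page");   // what scanning 録音機能 picks up
log.record("機能", "page");     // the user rescans the second half
log.record("録音", "history");  // hovering inside the history page: ignored
console.log(log.listing().join("\n"));
// prints:
// 2. 機能
// 1. 録音機
```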

FooSoft commented 7 years ago

What would you say would be the benefit of a system like this over say having explicit buttons for something like "save for later"?

gajewsk2 commented 7 years ago

Good question. They would certainly have some overlap. The first advantage of this system would be that the user doesn't have to actually do anything additional to log a query. If I'm trying to crank through a web article as quickly as possible (my case today), I want to see the def and move on. I can take my time to curate cards from a log later.

Besides being less work for users, if a user were to look up a word, think they knew it (and therefore skip "add" or "save for later"), and only later realize they did need it, this log could be a useful reference. I know I'm guilty of doing this: telling my brain "oh yeah, I know that" but then scrambling later to remember what that word was when the original source isn't in front of me. A log is the only way I can see of narrowing down what you've queried quickly if you didn't have the foresight to mark it.

Having to mark it all is troublesome though. Another benefit would be logging statistics. As mentioned in the first post, if you could tell a user how often they are looking up the same thing, it might clue them into addressing whatever mental stumbles they have for it faster. You would probably only mark 100% matched words as seen though. That wouldn't be too difficult to do with an object whose keys are queries and whose values are counts.
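The counting object mentioned above could look something like this. It is a minimal sketch, not anything from the Yomichan code base: `countLookup` and `frequentQueries` are hypothetical helpers, and it assumes the caller only passes in queries that were 100% matches.

```typescript
// Hypothetical sketch: lookup counts in an object keyed by query string.
// Object.create(null) avoids clashes with built-in property names.
const lookupCounts: Record<string, number> = Object.create(null);

// Increment the count for a query and return the new value.
function countLookup(counts: Record<string, number>, query: string): number {
    const next = (counts[query] ?? 0) + 1;
    counts[query] = next;
    return next;
}

// Queries looked up at least `threshold` times, most frequent first.
function frequentQueries(counts: Record<string, number>, threshold: number): string[] {
    return Object.keys(counts)
        .filter((query) => counts[query] >= threshold)
        .sort((a, b) => counts[b] - counts[a]);
}

for (const query of ["機能", "ねこ", "機能", "録音機", "機能", "ねこ"]) {
    countLookup(lookupCounts, query);
}
// frequentQueries(lookupCounts, 2) returns ["機能", "ねこ"]
```

Anything that crosses the threshold is a candidate for the "leech"-style attention described in the first post.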

A history and a "save for later" function would have one thing in common: if the interface to anki was down or not set up yet, the user could fall back on either.

The advantage of "save for later" is that the user would have absolutely no additional noise. I can't immediately think of any others.

FooSoft commented 7 years ago

I can see having a "search history" page that opens in a similar way to the search page. It can list the longest inflected matches from scanning ordered by match length and frequency. This could probably be stored in local storage along with options. Would not want to have it in IndexedDB as removing rows from existing databases is pretty buggy.

gajewsk2 commented 7 years ago

> longest inflected matches from scanning ordered by match length and frequency

Great! Could you explain what "longest inflected matches" means? And "match length"? Frequency would be the number of times accessed, if I'm on the same page as you.

EDIT: Never mind, I get it; the longest match, essentially.

FooSoft commented 7 years ago

Let's say you are searching 食べませんでした. The inflected match is just that text (食べませんでした), whereas the deinflected (stemmed) form would be 食べる. There could actually be several deinflected forms based on the term (and how far you want to go), so storing the source search string makes the most sense.

Sorting by length is useful since the shorter a stored term is the more likely it's going to be noise.
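To make the ordering concrete, here is a small sketch of history entries that store the source (inflected) string and are sorted by length and then by frequency. `HistoryEntry` and `sortHistory` are hypothetical names, and the counts are made up for the example. Entries are plain objects, so they could be serialized as JSON if they were kept in local storage.

```typescript
// Hypothetical sketch; HistoryEntry and sortHistory are illustrative names.
interface HistoryEntry {
    source: string; // the inflected text exactly as scanned
    count: number;  // how many times it was looked up
}

// Longer source strings first; ties broken by lookup count, highest first.
// Note: .length counts UTF-16 code units, which matches characters here.
function sortHistory(entries: HistoryEntry[]): HistoryEntry[] {
    return entries.slice().sort((a, b) =>
        b.source.length - a.source.length || b.count - a.count
    );
}

const sorted = sortHistory([
    { source: "こ", count: 5 },
    { source: "食べませんでした", count: 1 },
    { source: "ねこ", count: 2 },
    { source: "機能", count: 3 },
]);
// sorted order: 食べませんでした, 機能, ねこ, こ
```

The single-character こ ends up last even though it has the highest count, which reflects the point that shorter stored terms are more likely to be noise.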