fabd / kanji-koohii

A web application to help Japanese language learners remember the kanji.
https://kanji.koohii.com
GNU Affero General Public License v3.0

Support additional kanji sequences such as KKLD or RTK Lite #70

Open fabd opened 7 years ago

fabd commented 7 years ago

Since RTK 6th edition support was added in Dec 2014 the website can, in theory, support any arbitrary sequence of kanji for the Study pages.

It just happens that RTK 5th and 6th editions are very similar, and the site has been focused on RTK. With a bit of work it's possible to remove some of the hard coded designs (like the Progress page including RTK 1 and RTK 3).

Related forum thread

Implementation (draft)

Breakdown

Phase 1

First rough implementation: user can select the additional sequence, and then should be able to navigate Study in sequence order.

Phase 2

Involves changing the concept of RTK Edition to a more generic and useful concept of KANJI GOAL.

A controversial change here involves ditching the built-in / hardcoded RTK Lessons. This helps expand the site's usefulness by making it less RTK-centric. Not all kanji goals will have lessons, and designing lessons is something I can't do. It could also cause copyright issues, since adding a lessons breakdown from a book is a step beyond just supporting a sequence of kanji.

Phase 3

Improve the flow for new users:

Next Steps

faneca commented 7 years ago

Some thoughts:

  1. Would love to see a movie method index (though it's already too late for me to use that).
  2. A link to a page where each index is explained would be a plus for new users.
  3. Goals could have checkpoints (they could be hardcoded or reside in a table that's only queried when loading the "Check Progress" page)... but it'd still suffer from the copyright problem.

fabd commented 7 years ago

A link to a page where each index is explained would be a plus for new users.

That will be handled by the equivalent of today's "RTK Edition" page. The page where you pick the sequence is where they will be explained.

Goals could have "checkpoints"

By copyright problem I assume you mean using "lessons" if there are any in other books / methods? Are there lessons in KKLD?

Actually I've been contemplating removing lessons altogether to simplify the Study page header for mobile, as well as make the site more flexible for the other sequences. Not much use in defaulting to a single lesson sequence of hundreds of characters when we don't have a built in lesson. (eg. RTK Vol. 3 is just lesson 57 on the website...)

But you gave me an idea... Why not just arbitrarily slice up the sequence into smaller chunks? Hence, checkpoints. While they are less meaningful than the ones from Heisig, which are based on introducing primitives, they would still work as motivation.

We could let the user pick their desired "checkpoint" threshold. For example: 10, 15, 20. If someone wants to try to study 10 a day, they could use a 10 kanji checkpoint. Or they can pick one based on their pace.

Those checkpoints only make sense when studying in sequence. But then again the point of adding more sequences is so you don't need to jump back and forth anymore (eg. RTK Lite).
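
The arbitrary slicing described above can be sketched in a few lines. This is only an illustration, not code from the site: the function names `slice_into_checkpoints` and `checkpoint_of` are hypothetical, and the seven-kanji sequence is a stand-in for a real one.

```python
from typing import List, Sequence


def slice_into_checkpoints(sequence: Sequence[str], size: int) -> List[List[str]]:
    """Split a kanji sequence into consecutive chunks of `size` characters.

    The last checkpoint is shorter when the sequence length is not a
    multiple of `size`.
    """
    if size <= 0:
        raise ValueError("checkpoint size must be a positive integer")
    return [list(sequence[i:i + size]) for i in range(0, len(sequence), size)]


def checkpoint_of(index_nr: int, size: int) -> int:
    """Return the 1-based checkpoint number for a 1-based sequence index."""
    if index_nr < 1 or size <= 0:
        raise ValueError("index_nr must be >= 1 and size must be positive")
    return (index_nr - 1) // size + 1


# Hypothetical 7-kanji sequence, sliced with a user-chosen threshold of 3.
demo_sequence = ["農", "鳴", "集", "酒", "速", "業", "院"]
demo_checkpoints = slice_into_checkpoints(demo_sequence, 3)
```

Because a checkpoint number can be derived from the sequence index and the chosen threshold alone, a sketch like this would not need any per-sequence lesson data to be stored.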

faneca commented 7 years ago

Yes, I was really talking about the same concept as "lessons", while having a broader one in mind (since some of the methods don't have "lessons" as such). Sorry for the confusion (but glad that gave you an idea ¬_¬; )

jjannone commented 7 years ago

I'd suggest the WaniKani sequence as an alternative kanji sequence: https://www.wanikani.com/api

WaniKani's service is reading-only: there is no way to be prompted with the keywords and practice writing. They have no current plans to add a writing component, so I think many WK users would use koohii to complement their study, giving them a writing SRS (koohii) and a reading SRS (WK).

fabd commented 7 years ago

@jjannone Would I be authorized to use their sequence? I have no idea about it. Should check out the site. We would need a data sheet with the index > kanji (or UCS code).

jjannone commented 7 years ago

Hi Fabrice,

I can find out about authorization to use the sequence if you’d like; they do provide an API; I included a link to it in my initial post.

The API can download their sequence as JSON, chapter by chapter; below are the first few Kanji in their "chapter 10".

John

{"user_information":{"username":"Jannone","gravatar":"8d530181ecfdabb5bf72869daa6d3231","level":4,"title":"Turtles","about":"","website":"http://jann.one","twitter":"J_J_A_J","topics_count":0,"posts_count":0,"creation_date":1480733262,"vacation_date":null},
"requested_information":[
{"character":"農","meaning":"farming, agriculture","onyomi":"のう","kunyomi":null,"important_reading":"onyomi","level":10,"nanori":null,"user_specific":null},
{"character":"鳴","meaning":"chirp","onyomi":"めい","kunyomi":"な","important_reading":"kunyomi","level":10,"nanori":null,"user_specific":null},
{"character":"集","meaning":"collect, gather","onyomi":"しゅう","kunyomi":"あつ.まる","important_reading":"onyomi","level":10,"nanori":null,"user_specific":null},
{"character":"酒","meaning":"alcohol","onyomi":"しゅ","kunyomi":"さけ, さか","important_reading":"onyomi","level":10,"nanori":null,"user_specific":null},
{"character":"速","meaning":"fast","onyomi":"そく","kunyomi":"はや.い","important_reading":"onyomi","level":10,"nanori":null,"user_specific":null},
{"character":"業","meaning":"business","onyomi":"ぎょう","kunyomi":null,"important_reading":"onyomi","level":10,"nanori":null,"user_specific":null},
{"character":"院","meaning":"institution","onyomi":"いん","kunyomi":null,"important_reading":"onyomi","level":10,"nanori":null,"user_specific":null}, ...

fabd commented 7 years ago

Re: WaniKani sequence

Okay I had a quick look. So if I understand there are 60 "levels", equivalent to RTK "lessons"?

However, I reviewed a few radicals and couldn't test the kanji, but I'm guessing the grid with all the purple boxes and characters in it is what the sequence is.

This makes me realize I shouldn't ditch the concept of lessons or "levels" since that can be a helpful marker for users to find their way around.

To add WaniKani sequence eventually I need the data in a sheet (csv/tabs) form: index_nr, kanji (or UCS-2 code), lesson. I otherwise have too many things on my plate atm, so I can't invest time figuring out their JSON data. But, there may already be a spreadsheet somewhere.
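
As a rough sketch of the conversion requested above, the JSON shape shown in the earlier sample (`requested_information` entries with `character` and `level`) could be flattened into tab-separated rows like this. Several things here are assumptions rather than facts from the thread: the function names are hypothetical, the running `index_nr` simply follows the order of the entries in the response (which may not be WaniKani's intended order), and the UCS code is written as a decimal code point to match a numeric `ucs_id`-style column.

```python
import csv
import io
import json
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str, int, int]  # (index_nr, kanji, ucs_id, lesson)


def wanikani_rows(payload: str, start_index: int = 1) -> Iterator[Row]:
    """Yield one row per kanji found in a WaniKani-style JSON response."""
    data = json.loads(payload)
    for offset, item in enumerate(data["requested_information"]):
        kanji = item["character"]
        if len(kanji) != 1:
            raise ValueError(f"expected a single character, got {kanji!r}")
        # ASSUMPTION: index_nr is the position in the response, starting at start_index.
        yield (start_index + offset, kanji, ord(kanji), item["level"])


def to_tsv(rows: Iterable[Row]) -> str:
    """Serialise rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["index_nr", "kanji", "ucs_id", "lesson"])
    writer.writerows(rows)
    return buffer.getvalue()


# Two entries reduced to the fields this sketch actually reads.
sample = (
    '{"requested_information":['
    '{"character":"農","level":10},'
    '{"character":"鳴","level":10}]}'
)
sample_rows = list(wanikani_rows(sample))
sample_tsv = to_tsv(sample_rows)
```

When levels are downloaded one at a time, `start_index` would have to be carried over from the previous level so that the index keeps counting up across the whole sequence.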

jjannone commented 7 years ago

So if I understand there are 60 "levels", equivalent to RTK "lessons"?

Correct.

grid with all the purple boxes and characters in it is what the sequence is.

Yes.

lessons or "levels" since that can be a helpful marker for users to find their way around.

Definitely — helps one stay in sync across multiple systems.

To add WaniKani sequence eventually I need the data in a sheet (csv/tabs) form: index_nr, kanji (or UCS-2 code), lesson. I otherwise have too many things on my plate atm, so I can't invest time figuring out their JSON data. But, there may already be a spreadsheet somewhere.

I’ll ask them about licensing, see if there is a spreadsheet, and, if need be, I can parse the JSON.

All the best,

John

fabd commented 7 years ago

Sounds good.

This is relatively easy to implement at first and to test, although I am not sure my solution is great.

When the new RTK edition came out, I had to update all SQL queries. The solution I ended up using is to refer to a different column name:

CREATE TABLE `kanjis` (
  `ucs_id`       SMALLINT UNSIGNED NOT NULL,
  `keyword`      CHAR(32) NOT NULL DEFAULT '',
  `kanji`        CHAR(1) NOT NULL DEFAULT '',
  `onyomi`       VARCHAR(50) NOT NULL DEFAULT '',
  `idx_olded`    SMALLINT UNSIGNED NOT NULL,
  `idx_newed`    SMALLINT UNSIGNED NOT NULL,
  `lessonnum`    TINYINT UNSIGNED NOT NULL,
  `strokecount`  TINYINT UNSIGNED NOT NULL,
  PRIMARY KEY (`ucs_id`),
  UNIQUE KEY `idx_olded` (`idx_olded`),
  UNIQUE KEY `idx_newed` (`idx_newed`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

Either idx_olded or idx_newed is referenced in JOINs. The column itself is optimized as a SMALLINT. However it's not ideal to have an index that covers ~ 2000 of the 20000+ rows in this table.

Still, for the time being we could potentially add idx_kkld and idx_rtklite (for example) without impacting performance too much. Each such index will add 2 bytes x 20000 rows. Currently this table is ~700 KB ... compared to the 700 MB stories table it's quite small :blush:
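
The estimate above can be checked with a quick calculation. The sketch below counts column data only, using the 2-byte storage size of a MySQL SMALLINT and the approximate row count quoted in this comment; any extra space taken by a UNIQUE KEY on each new column is not included, and the helper name `added_bytes` is hypothetical.

```python
BYTES_PER_SMALLINT = 2   # storage size of a MySQL SMALLINT
ROW_COUNT = 20_000       # approximate number of rows quoted above


def added_bytes(new_columns: int, rows: int = ROW_COUNT) -> int:
    """Column data added by `new_columns` extra SMALLINT index columns."""
    return new_columns * BYTES_PER_SMALLINT * rows


one_column = added_bytes(1)    # e.g. idx_kkld alone
two_columns = added_bytes(2)   # e.g. idx_kkld plus idx_rtklite
```

Under these assumptions each new sequence column adds about 40 KB of column data, so two of them add about 80 KB to a table of roughly 700 KB.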

What's neat, though, is that the unique identifier of a flashcard or story is based on user id + UCS code. Hence, if the user switches from one index to another, only the displayed indices are affected. Internally, flashcard and story references are unaffected, and will be mapped to whatever index is in use.
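
A minimal sketch of that idea follows, using an in-memory SQLite database as a stand-in for the site's MySQL tables. Only `ucs_id`, `kanji`, `idx_olded` and `idx_newed` come from the schema above; the `flashcards` table layout, the `SEQUENCE_COLUMNS` mapping, the function name and the index values are all invented for illustration and are not real RTK numbers.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so the sequence column is
# picked from a fixed whitelist (hypothetical keys, real column names).
SEQUENCE_COLUMNS = {"old_edition": "idx_olded", "new_edition": "idx_newed"}


def flashcards_in_sequence(db: sqlite3.Connection, user_id: int, sequence: str):
    """Return (index, kanji) pairs for a user's flashcards in the chosen sequence."""
    column = SEQUENCE_COLUMNS[sequence]  # raises KeyError for unknown sequences
    query = (
        f"SELECT k.{column}, k.kanji FROM flashcards f "
        "JOIN kanjis k ON k.ucs_id = f.ucs_id "
        f"WHERE f.user_id = ? ORDER BY k.{column}"
    )
    return db.execute(query, (user_id,)).fetchall()


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE kanjis (
        ucs_id INTEGER PRIMARY KEY, kanji TEXT,
        idx_olded INTEGER, idx_newed INTEGER
    );
    CREATE TABLE flashcards (
        user_id INTEGER, ucs_id INTEGER, PRIMARY KEY (user_id, ucs_id)
    );
    """
)
# Made-up index values, chosen so the two sequences order the kanji differently.
db.executemany(
    "INSERT INTO kanjis VALUES (?, ?, ?, ?)",
    [(ord("農"), "農", 5, 7), (ord("酒"), "酒", 9, 3)],
)
# Flashcards are keyed by user id + UCS code only, never by a sequence index.
db.executemany(
    "INSERT INTO flashcards VALUES (?, ?)", [(1, ord("農")), (1, ord("酒"))]
)

old_view = flashcards_in_sequence(db, 1, "old_edition")
new_view = flashcards_in_sequence(db, 1, "new_edition")
```

Switching the sequence changes only the column used in the JOIN and the ORDER BY; the rows in `flashcards` are never rewritten.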