Automatic Prioritisation of Materials

Nick3C commented 15 years ago

I think we can substantially improve the ordering of material by making use of shared Chinese characters (making things easier to learn, in the same way that spacing/delaying card templates does).

For example, if the word 社会 is initiated, check for 社 and 会 as facts: 1) if present and at least one card is not 'new', do nothing; 2) if present and all cards are new, mark as pyToolkit:WordPart:Priority; 3) if not present, mark 社会 as PyTK-NOTE:WordPart:Missing.
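A rough sketch of the three-way check described here, just to pin down the cases. It assumes the facts can be looked up through a plain dict; `facts`, `PartStatus`, `classify_part` and `classify_word` are hypothetical names for illustration, not anything that exists in Anki or the toolkit.

```python
from enum import Enum


class PartStatus(Enum):
    KNOWN = "known"        # fact present, at least one card is not new
    PRIORITY = "priority"  # fact present, but every card is still new
    MISSING = "missing"    # no fact exists for this character


def classify_part(char, facts):
    """Classify one character of a word.

    `facts` is a hypothetical mapping from an expression to a list of
    booleans, one per card, where True means the card is still new.
    """
    cards_new = facts.get(char)
    if cards_new is None:
        return PartStatus.MISSING
    if all(cards_new):
        return PartStatus.PRIORITY
    return PartStatus.KNOWN


def classify_word(word, facts):
    """Return the status of every character in `word`."""
    return {char: classify_part(char, facts) for char in word}


# Example: 社 has one reviewed card, 会 is entirely new.
facts = {"社": [True, False], "会": [True, True]}
statuses = classify_word("社会", facts)
```

The caller would then decide what to do with each status (tag, list in a UI, or reorder), which keeps the check itself independent of how the result is stored.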

As mentioned in another thread, we should add the priority tag on first run (and these tags need to be configurable options too).

Longer words will be more complex because it is harder to identify words than characters. This is not a major problem because if we over-identify things they should still be easier than totally unknown material.

This would work well for phrases and automatically prioritise items that you are more likely to know or really need to know (because you are learning sentences with that word or words with that character in).

batterseapower commented 15 years ago

Hmmmmmmm. Not sure about this. I'm not keen on using tags to mark transient properties of facts/cards because it means you have to spend lots of time housekeeping them.

Perhaps a better solution is a UI where you can see easily those character parts that you are missing (with one-click addition of appropriate facts) or that occur in a learned word but are still new and hence are high priority?

Nick3C commented 15 years ago

Your approach to missing parts is probably better than what I am suggesting. However, I think a tag-based approach is still better for cards that are present but not activated. In your approach you would still need to add the high priority tag (at which point they merge with the general mass of tags and it is difficult to see why they were added).

What we really want is a way to turn them off again after a certain period of time (so that what we are really doing is increasing the likelihood of missing new cards becoming due).

How about also using a tag in the style of pyTK-AutoOff:2009-07-30? We could add code to automatically remove priority-based tags once the date on the tag has passed.
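Something like the following is what I have in mind for the expiry check. It is only a sketch: it assumes a fact's tags are available as a list of strings and that the date part is always in YYYY-MM-DD form; `parse_auto_off` and `prune_expired` are made-up helper names.

```python
from datetime import date

AUTO_OFF_PREFIX = "pyTK-AutoOff:"


def parse_auto_off(tag):
    """Return the expiry date encoded in an auto-off tag, or None."""
    if not tag.startswith(AUTO_OFF_PREFIX):
        return None
    try:
        return date.fromisoformat(tag[len(AUTO_OFF_PREFIX):])
    except ValueError:
        # Malformed date: treat it as an ordinary tag and leave it alone.
        return None


def prune_expired(tags, priority_tags, today):
    """Drop priority tags and their auto-off marker once the date has passed."""
    expiries = [d for d in map(parse_auto_off, tags) if d is not None]
    if not expiries or max(expiries) >= today:
        # No auto-off marker, or it has not expired yet: keep everything.
        return list(tags)
    return [t for t in tags
            if t not in priority_tags and parse_auto_off(t) is None]


tags = ["HSK1", "pyToolkit:WordPart:Priority", "pyTK-AutoOff:2009-07-30"]
remaining = prune_expired(tags, {"pyToolkit:WordPart:Priority"}, date(2009, 7, 31))
```

The tag stays active on the date itself and is removed from the following day; user tags and anything that does not parse as an auto-off marker are never touched.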

batterseapower commented 15 years ago

If what you're hoping to achieve is that facts about characters you have already learnt as part of another word come due earlier, why can't we just add a function that reschedules cards that we consider high priority a bit earlier? No tags required.

Nick3C commented 15 years ago

Firstly because Damien will get seriously upset with us for messing with scheduling in the way you are suggesting.

Secondly because what we are really trying to do is put new cards that are part of larger units first in the new-item queue (to get them a first single review so they are no longer new) without initiating them automatically (which will cause them to become due, etc). We're not trying to renew them earlier but just to focus the new cards on things that should be easy to learn (because they have links to pre-existing knowledge)

batterseapower commented 15 years ago

I ask again: if that's what we are trying to accomplish, why are we buggering around with tags? It is perfectly possible to mess with the due dates of new cards. See deck.py randomizeNewCards in the Anki source code for details.

The one problem is that Anki may do its own messing at arbitrary points, so we may need more hooks to impose our own priority order.

Nick3C commented 15 years ago

What are we trying to do? We are trying to ensure things that you already know, or things that are easy, are prioritised. These items should be much easier to learn because they are closer to what you already know. In the longer term, more points of reference make it easier to learn things which are less similar. Basically we are trying to make the learning easier, faster, and thus more efficient. This isn't something that applies very well in other languages, but it takes advantage of Chinese's use of Hanzi.

In my view changing the due date is not the best idea (although it might be an acceptable compromise because it is simple). Anki handles randomisation of new cards by randomly assigning due dates to them. Re-randomising or changing to a different sort order would wipe our changes (in my own studies I do this every 2 or 3 days). I don't see how they could easily be preserved, which would mean a user has to choose between prioritisation and randomisation. On the other hand, perhaps it is an acceptable compromise.

A couple of alternatives to discuss. Option (2) How possible is it to add a custom sort order to the study options page? Something like "Use Pinyin Toolkit's Scheduling algorithm". We could configure this in the preferences page, for example: prioritise similar words, etc., then randomise.

However, that is a much more serious bit of coding than the priority approach I was pushing before.

Option (3) Perhaps an alternative would be to hook into the scheduler and do our own post-processing after Anki has sorted cards. It still requires tags though (or some way to invisibly mark the facts). As we are after invisible tags for other purposes (storing auto-generated data so it can be compared with the field data to determine if auto-generation has occurred), perhaps we should discuss with Damien whether he has a preference for how we modify the database. We could then add our own "PYTK tags" field that was invisible to the user.

batterseapower commented 15 years ago

Yes, the due dates can be changed outside our control. But a button that the user can press to say "impose PyTK's priority order" would mean the user could always reset the order when they wished to. Alternatively, if enough hooks were added to Anki we could plug in our own prioritization algorithm that the user could select using the normal mechanisms, but that requires more work.

I still don't see why tags should come into this at all. PyTK can look at a card and determine its priority purely by examining the database of learnt cards and seeing if it occurs within them, yet is itself unlearnt. There is no need for some other process to tag the card, and indeed it is not desirable because whether something is "high priority" is a dynamic property, not a static fact about the card.
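To make that concrete, here is a minimal sketch of computing priority on the fly with no tags stored anywhere. It assumes we can get a mapping from each expression to whether it has been learnt; `high_priority_parts` and the `cards` dict are hypothetical and stand in for whatever query against the deck we end up using.

```python
def high_priority_parts(cards):
    """Return unlearnt expressions that occur inside a learnt expression.

    `cards` is a hypothetical mapping of expression -> True if learnt.
    """
    learnt = [expr for expr, is_learnt in cards.items() if is_learnt]
    return {
        expr
        for expr, is_learnt in cards.items()
        # Skip empty expressions, which would match every learnt word.
        if expr and not is_learnt and any(expr in other for other in learnt)
    }


cards = {"社会": True, "社": False, "会": True, "学": False, "学校": False}
priority = high_priority_parts(cards)
```

Because the result is recomputed from the current state of the deck each time, it stays correct as cards are learnt and needs no housekeeping.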

Nick3C commented 15 years ago

Ah, I see what you mean. You want to scan the database and then impose a new order. That is a good idea and would work quite well.

My model was based around the idea of "having seen recently"; yours is truly based on "what is already known". Your way is much better, I think. Let's do it that way. I think it is best to hook into the study options page with a "Sort by related knowledge (Pinyin Toolkit)". This would make it easier to turn off too, so that users could flip between what they wanted to learn. I would suggest that the sorting of these new materials is done on the basis of how recently the master-materials have been viewed (i.e. if you recently viewed 社会 then 社 and 会 should be near the top of the new queue).
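The ordering I am suggesting could look roughly like this. It is a sketch under assumptions: `last_seen` is a made-up mapping from each learnt expression to the time of its most recent review, and `order_new_cards` is a hypothetical helper, not an Anki function.

```python
def order_new_cards(new_exprs, last_seen):
    """Sort new expressions so parts of recently viewed words come first.

    `last_seen` is a hypothetical mapping of learnt expression -> timestamp
    of its most recent review (any number, larger means more recent).
    """
    def most_recent_parent(expr):
        times = [t for parent, t in last_seen.items() if expr in parent]
        return max(times) if times else None

    def sort_key(expr):
        recency = most_recent_parent(expr)
        # Related items first (most recent parent first), unrelated last.
        return (recency is None, -(recency or 0))

    return sorted(new_exprs, key=sort_key)


last_seen = {"社会": 200, "学校": 100}
queue = order_new_cards(["猫", "校", "会", "狗", "社"], last_seen)
```

Python's sort is stable, so new cards with no related learnt word keep whatever relative order they already had, which means an existing random order among them would survive.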

I have long-term designs on learning by related radicals, but this requires the Python Chinese libraries to be added (they have the option to break characters into radicals). This will be immense because we can have a show-new-card option of "Learn by similar characters and related radicals (Pinyin Toolkit)" and have Anki add similar characters as well as similar words. It is probably worth bearing this in mind to make the study options infrastructure expandable.

batterseapower commented 15 years ago

OK, sounds good. I might have to submit some patches to Anki to add more hooks to make this way work well, but I think it's the right solution.

batterseapower / pinyin-toolkit

Automatic Prioritisation of Materials #115