Implement several new export sort methods (for discussion)

This is a draft based on the discussion in #107 and #108. (Please feel free to totally tear this up.)

I've implemented several new potential sort methods:

1. Sorting by note id. (#108) (note_id)
2. Sorting by the last field. (field_last)
3. Sorting tags or the first field "naturally"/"numerically" (such that 1 < 2 < 10, and chapter1.1 < chapter1.2 < chapter1.10 < chapter2). (field1_numeric, tag_numeric)
4. Sorting by the sort field (as specified by sortf in the respective note model(s)). (browser_sort_field)

For slightly more extensive descriptions and analyses of drawbacks, see below.

They're all independent, though to some extent they could be combined (in particular 3 can be combined with 2 or 4). I'm implementing them in a single PR, because they're all motivated by the same goal of allowing the creators of a deck to control the order in which the learner sees new notes and to avoid an excessively fragmented discussion.

I don't think that they should necessarily all be introduced, so please feel completely free to shoot down any or all of them.

Background

As @sudomain had brought up in #107, it's sometimes crucial for new cards/notes to be seen in a specific order, because the "later" notes build on the "earlier" ones. This is both true when the Anki deck "accompanies" an external learning resource (e.g. a course or a book) and when it's used for standalone learning. In other cases forcing the order is not essential, but might still be valuable.

If the deck configuration uses the "Show new cards in order added" option (Deck Options > New cards > Order; in a deck.json, this is the order value of the relevant config in deck_configurations, where 0 means random order and 1 means order added), then any new cards will be shown in the order in which they appear in the deck.json (since CrowdAnki adds the cards in order). Hence, by controlling the export order, one can control the order in which learners see new cards.
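As a rough illustration of the setting described above, here is a minimal sketch that reads the order value from each config in deck_configurations. The nesting of order under a "new" key and the "name" key are assumptions about the deck.json layout, and describe_new_card_order is a hypothetical helper, not part of CrowdAnki.

```python
import json

# Tiny stand-in for a deck.json; the "new" -> "order" nesting is an assumption.
deck_json = """
{"deck_configurations": [
    {"name": "Default", "new": {"order": 1}},
    {"name": "Shuffled", "new": {"order": 0}}
]}
"""

# Meaning of the two values as described in the text.
NEW_ORDER_LABELS = {0: "random order", 1: "order added"}


def describe_new_card_order(deck):
    """Map each deck configuration name to a label for its new-card order."""
    return {
        config["name"]: NEW_ORDER_LABELS.get(config["new"]["order"], "unknown")
        for config in deck["deck_configurations"]
    }


orders = describe_new_card_order(json.loads(deck_json))
print(orders)
```

Only configs reporting "order added" would preserve the export order for learners.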

(Note: I haven't checked how this interacts with sub-decks in the beta "Anki 2.1 scheduler" — i.e. whether CrowdAnki imports sub-decks before or after the parent deck. In any case, though, the order within a subdeck will be as exported.)

Shortcomings of current sort methods

(This is not intended to be a criticism of the current methods — AFAICT they were mainly created to ensure a stable export with no spurious diffs and to better inter-operate with tools like BrainBrew, which they do admirably.)

String comparison

field1, field2 and tag compare their contents using a string comparison, such that, say, "2" > "10".

If we wanted to have an "index"/"sort" field to specify the desired order, then simply using consecutive, unpadded integers wouldn't work, since "10" would come before "2" etc. Padding the integers with leading zeros, to the same length (e.g. "002", "100") is a solution, but it's non-obvious, slightly ugly and very inconvenient (ensuring consistency when the order of magnitude of the number of cards changes would be particularly annoying). One could use an additional tool (potentially BrainBrew and a spreadsheet, or a dedicated Anki addon), but that's suboptimal.

The same holds for tags — it'd be nice to be able to just tag cards as, say, deck_name::sectionX.Y.Z, where X, Y and Z are integers, without having to pad the integers with the correct number of zeros, guessed in advance (renaming tags in Anki can be a pain...).
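The string-comparison problem and the zero-padding workaround described above can be reproduced in a few lines of plain Python. This is only an illustration of the comparison behaviour, not CrowdAnki code.

```python
indices = ["1", "2", "10", "100"]

# Plain string comparison works character by character, so "10" sorts before "2".
string_sorted = sorted(indices)
print(string_sorted)  # ['1', '10', '100', '2']

# Padding every index to the same width restores the intended order,
# but the width has to be chosen (or re-applied) whenever the deck grows.
width = max(len(index) for index in indices)
padded_sorted = sorted(index.zfill(width) for index in indices)
print(padded_sorted)  # ['001', '002', '010', '100']
```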

Position of fields

As @sudomain pointed out, it can be annoying for the index field to take the highly prominent first (or second) position — these should be reserved for the actual content of the note.

Non-sorting tags

For instance, if we have both "sorting" tags (say deck_name::chapterX.Y or deck_name::sectionX.Y) and "non-sorting" ones (say deck_name::hard, deck_name::extra_material, deck_name::verb, deck_name::noun, deck_name::europe etc.), we don't want the presence or absence of the non-sorting tags to influence the sort order.

If the "non-sorting" tags come after the "sorting" tags alphabetically, then they'll be placed later in the anki_object.tags concatenated string and hence have negligible influence, but ensuring that that's always the case might be annoying.

One can work around this by further namespacing the sorting and non-sorting tags to enforce correct alphabetical order (having, say, deck_name::index::sectionX.Y and deck_name::topic::verb), but it's a bit ugly (and non-obvious in advance), so it's still not a perfect solution.
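To make the interference concrete, here is a small sketch. It assumes the tag sort key is simply the note's tags sorted alphabetically and joined with spaces; joined_tags is a hypothetical stand-in for however the concatenated tag string is actually built.

```python
def joined_tags(tags):
    """Assumed sort key: tags sorted alphabetically and joined by spaces."""
    return " ".join(sorted(tags))


# Without extra namespacing, "deck::europe" sorts before any "deck::section..." tag,
# so the note from section 2.1 ends up ahead of the note from section 1.1.
plain = {
    "note_a": ["deck::section2.1", "deck::europe"],
    "note_b": ["deck::section1.1", "deck::verb"],
}
plain_order = sorted(plain, key=lambda name: joined_tags(plain[name]))
print(plain_order)  # ['note_a', 'note_b']

# With index/topic namespaces the sorting tag always comes first in the key.
namespaced = {
    "note_a": ["deck::index::section2.1", "deck::topic::europe"],
    "note_b": ["deck::index::section1.1", "deck::topic::verb"],
}
namespaced_order = sorted(namespaced, key=lambda name: joined_tags(namespaced[name]))
print(namespaced_order)  # ['note_b', 'note_a']
```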

Ease of use

Having to tag or index all notes can be rather inconvenient.

Analysis of suggested sort methods

Note id

FWIW, it seems that Anki's internal note sort order is currently by id, even if you manually change the id of a note (though AFAIK you shouldn't rely on any sort order that isn't explicitly specified in the SQL query), so note_id currently has exactly the same effect as the none sort method.

Advantages:

Easy to get the right order in the first pass (just create the notes in the correct order).
With the "Change Card Creation Times" add-on suggested by @sudomain, one can edit the note id, hence changing the order.
Easy to understand — Anki's new cards' sort order in the creator's deck maps onto the sort order in the learners' deck.

Disadvantages:

CrowdAnki obviously does not export the note_id, meaning that different people will have different note_ids and, most importantly, changes to the order of note_ids won't be shared between contributors, meaning that only one person would be able to change the note order...

@sudomain has a semi-solution, but it's still non-ideal.
Editing the note_ids is impossible without an add-on.
Changing note_ids feels a bit brittle.
Currently, it's equivalent to the "none" sort method.

However, since modifications to CrowdAnki's internals or Anki might change that, it might be worth having a sort order that is explicitly by note id.

Last field

The obvious advantage is that a potential "index" field will no longer be in a prominent position. Such an index field can be simply appended to all note types used in the deck.

Sorting by field_last is unlikely to have any applications other than for an index field, though.

"Natural/numeric sort"

As described above it allows sorting fields containing numbers "naturally", such that 1 < 2 < 10, and chapter1.1 < chapter1.2 < chapter1.10 < chapter2.

The idea is that the string is split into a list of alternating strings containing no numeric characters and non-negative integers. Python happily sorts lists (even lists of variable length, provided that all corresponding fields are comparable), so this allows us to sort in a way where integers are compared as integers, not as strings.

(Note that since we're dealing purely with non-negative integers, not decimals, "1.5" < "1.10" etc., which in the context of strings using schemas like sectionX.Y makes perfect sense, but might be confusing under some other conditions.)

It's very similar to the "version" sort in GNU ls and sort (ls -v and sort -V), which use the gnulib filevercmp function.
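The splitting idea described above can be sketched as follows. This is a minimal illustration and not necessarily the code in this PR; natural_key is a hypothetical name.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit strings and non-negative integers."""
    # A capturing group makes re.split keep the digit runs; they always land
    # at odd indices, so strings are only ever compared with strings and
    # integers with integers.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


sections = ["chapter2", "chapter1.10", "chapter1.2", "chapter1.1"]
naturally_sorted = sorted(sections, key=natural_key)
print(naturally_sorted)  # ['chapter1.1', 'chapter1.2', 'chapter1.10', 'chapter2']
```

Because the parts are integers rather than decimals, natural_key("1.5") compares as smaller than natural_key("1.10"), matching the caveat noted above.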

Obviously, this type of sort can (and, if it's deemed a good idea in general, probably should) be combined with more of the existing and new methods (field2, field_last, browser_sort_field).

I'm extremely unhappy about the names (field1_numeric, tag_numeric) and would welcome any alternative suggestions.

Sorting by the "sort field"

The advantage is that this allows sorting the export by the field that appears in the Sort Field column of the browser, so the export order can match the default sort order in the browser.
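A rough sketch of the idea, using plain dictionaries instead of Anki's objects. It assumes that sortf is a zero-based index into the note's fields; the model names, note data and browser_sort_key are made up for illustration.

```python
# Hypothetical note models: sortf is assumed to be a zero-based field index.
note_models = {
    "basic": {"sortf": 0},
    "indexed": {"sortf": 2},
}

notes = [
    {"model": "basic", "fields": ["banana", "yellow fruit"]},
    {"model": "indexed", "fields": ["Question", "Answer", "apple"]},
    {"model": "basic", "fields": ["cherry", "red fruit"]},
]


def browser_sort_key(note):
    """Return the content of the field that the note's model marks as its sort field."""
    sort_field_index = note_models[note["model"]]["sortf"]
    return note["fields"][sort_field_index]


sorted_notes = sorted(notes, key=browser_sort_key)
print([browser_sort_key(note) for note in sorted_notes])  # ['apple', 'banana', 'cherry']
```

Note that the sort field can sit at a different position in each note model, which is what makes this method per-notemodel.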

It feels neat, but a bit gimmicky, so I'm not sure if it's worth introducing.

I'm also rather unhappy about the name (browser_sort_field...).

Wishlist

Sorting by only a subset of tags.

As described above there's the issue of "non-sorting tags" potentially interfering with sorting tags. None of the above proposals deal with this. I can't think of any way of solving it that either doesn't involve hardcoding keywords (such as section or chapter), hence losing flexibility, or considerably overhauling the entire system (having per-deck sorting, with configurable keywords).

Doubts

Tests

Should the export sort tests be more discriminating? The currently existing tests don't really distinguish between field1, field2 and tag (so, for instance, if somebody changed one of the relevant indices in note_sorter.py, it wouldn't be caught). The newly added tests further don't distinguish between browser_sort_field, field_last and field1 etc.

However, the tests will catch many forms of breakage, so it might not be worth over-complicating the test lists. Also, the most likely source of breakage is Anki's internals changing, which wouldn't be caught here, anyway...
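For reference, a more discriminating test could use data on which every method yields a different order. The sketch below is self-contained and is not taken from note_sorter.py or the existing test suite; Note and sort_keys are hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical, simplified note: a name, a list of fields and a tag string.
Note = namedtuple("Note", "name fields tags")

# Values chosen so that field1, field2 and tag each give a different order.
notes = [
    Note("A", ["1", "3"], "2"),
    Note("B", ["2", "1"], "3"),
    Note("C", ["3", "2"], "1"),
]

sort_keys = {
    "field1": lambda note: note.fields[0],
    "field2": lambda note: note.fields[1],
    "tag": lambda note: note.tags,
}

orders = {
    method: "".join(note.name for note in sorted(notes, key=key))
    for method, key in sort_keys.items()
}
print(orders)  # {'field1': 'ABC', 'field2': 'BCA', 'tag': 'CAB'}

# If two methods were accidentally wired to the same index, this would fail.
assert len(set(orders.values())) == len(orders)
```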

Flag

The current sort by flag feels rather weird. Notes can have a flag (it's present in the database in the notes table), but Anki's source states that they're "not currently exposed", CrowdAnki doesn't export or import them, and I don't think that they even influence the result of a "flag" sort...

(The commonly encountered per-card flags have no bearing on the note flags.)

Naming

As mentioned above, I'd greatly welcome alternative names for most of the proposed sort methods.

Thanks for doing this. One note: if sorting by the first and last field are options, wouldn't allowing the user to select any field be better? (more general, futureproof...)

Yes, in many ways. I avoided it because:

It'd involve a slight overhaul of the sorting and/or config-parsing logic, to allow for arbitrary fieldN, which would result in some added complexity.

OTOH having a more "programmatic" approach might also be better for the "numerical sorting" methods — sorting methods could take arguments, which could be, say, field number or sorting method ("purely string" or "natural/numerical").

Doing this "properly" might be tricky from a UI perspective: do we want to stick with the current approach of the config UI parsing a comma-separated list of methods (methodA,methodB), but with optional "suffixes" for each method (in effect implementing a DSL-lite)? Or would it be better to have a form with drop-down options (with editing meta.json directly remaining an option for "power-users")?

Almost all note types are guaranteed to have at least two fields, so the existing field1 and field2 methods didn't have to worry about what to do if a note doesn't have the Nth field. Similarly, all note types will have a last field. Since the sort methods are currently global (for all decks and all note types) we can't really rely on users not choosing an invalid field number.

I can see two approaches to the issue of missing fields:
1. Fall back to the last existing field.
2. Fall back to an empty string.

It didn't seem that valuable: I'd expect an index or other interesting sort field to be either the first or the last, and not Nth for arbitrary N, especially since the value has to be set globally (for all note types). (If somebody really wanted to sort by the Nth field, then the proposed browser_sort_field would allow them to, by changing sortf, with the added flexibility (or inconvenience, depending on perspective :)) of it being per-notemodel.)
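For concreteness, the two fallback approaches for a missing Nth field listed above could look roughly like this. It is only a sketch: field_n_key and its fallback parameter are hypothetical and not part of CrowdAnki.

```python
def field_n_key(fields, n, fallback="last"):
    """Return the content of the n-th field (1-based), with a fallback if it is missing.

    fallback="last" uses the last existing field (approach 1);
    any other value falls back to an empty string (approach 2).
    """
    if 1 <= n <= len(fields):
        return fields[n - 1]
    if fallback == "last" and fields:
        return fields[-1]
    return ""


two_field_note = ["front", "back"]
print(field_n_key(two_field_note, 3))                    # back
print(repr(field_n_key(two_field_note, 3, "empty")))     # ''
print(field_n_key(["front", "back", "0042"], 3))         # 0042
```

With approach 2, all notes lacking the field share the same key and keep their relative order, since Python's sort is stable.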

I'll think some more about how one could best implement this, though.

👏 bloody good stuff, mate. As always your PRV is excellent! (Passion Results Verbosity! 😁)

You capture the exact essence of why the sort methods are not the perfect "by field X" that they could be. However, one thought that may help you get there: maybe you are looking in the wrong place 🤔 (why does the Github thinking emoji have a frown which gives a connotation of dissatisfaction? 😠 I just want a thinking face! 😅)

(This is not intended to be a criticism of the current methods — AFAICT they were mainly created to ensure a stable export with no spurious diffs and to better inter-operate with tools like BrainBrew, which they do admirably.)

Sort methods in the CrowdAnki config work fine to achieve this default stable setup, just as you say 👍, but adding methods there such as "Filter by the field names 'X'" is impossible, as we do not know what is to be exported. In my mind these options should be presented as default options only, and the user can be given more comprehensive sort options in the Export Config Window 😁. At that point CrowdAnki knows exactly which cards are to be exported, and thus can calculate which fields are shared by all of them and offer those options to the user. If there is a more functional sort method setup which can be given a lambda, then I think your dream of a perfect sort option can be achieved 👍

(Something I intended to get back to in the Export Config Window, but time is a cruel mistress 😅)

Anyways, I hope my thoughts are helpful in any way. Thank you for the great work and thoughts, as always 👏

Very good work and in-depth discussion. I'm partial to a sort by note id since it seems intuitive, though I understand the drawbacks. The only thing that bothers me about a field sort method (whether field1, 2, last, or sort) is if you have a CrowdAnki deck covering a large document such as the Anki manual and you then want to add cards for new users in the middle of the deck. Using a padded-zero approach you may have cards with numbers 0001 through 1000. If you wanted to add new cards in the area of 0200, you'd have to use another system such as 0200a or 0200.1 to avoid having to manually change the existing cards 0201 through 1000. Does anyone know of an advanced regex or an external tool that would be able to handle such a repetitive task?

As described above there's the issue of "non-sorting tags" potentially interfering with sorting tags. None of the above proposals deal with this. I can't think of any way of solving it that either doesn't involve hardcoding keywords (such as section or chapter), hence losing flexibility, or considerably overhauling the entire system (having per-deck sorting, with configurable keywords).

What about assuming every tag is a non-sorting tag and allowing users to whitelist (via regex?) strings they wish to use as a sorting tag? For example, if they use the tag_numeric export option, they can also use an option such as sort_tag=chapter::* or sort_tag=section::*

One can work around this by further namespacing the sorting and non-sorting tags to enforce correct alphabetical order (having, say, deck_name::index::sectionX.Y and deck_name::topic::verb), but it's a bit ugly (and non-obvious in advance), so it's still not a perfect solution.

This gave me an idea unrelated to CrowdAnki, so feel free to ignore all of the following text. If you use tags to organize and you pair the document name with the section name (e.g. python3.9_tutorial::section4.1) you will end up with a new tag for every section of every document you make cards from. This seems like linear (linearithmic?) growth in the number of tags and eventually the tag list in the card browser would start to feel crowded. If we make tags atomic using tags such as python3.9_tutorial and section4.1, we retain the ability to find cards from section 4.1 of the python tutorial, with probably far fewer tags. The python3.9_tutorial tag will apply to every note based on the tutorial, while tags of the form sectionX[.Y][.Z] would apply to any document that uses such a numbering system. I've found such numbering systems are very common in textbooks on any subject (Chapter 1.0, 1.1, 1.2, ...) and technical docs (Section 4.1, 4.2, 4.2.1, ...). Using this system instead, you'd have 1 new tag for every document. If you already have notes that use sectionX[.Y][.Z] tags, many of them will be reused. </blogpost>

👏 bloody good stuff, mate. As always your PRV is excellent! (Passion Results Verbosity! 😁)

I often get too verbose (but writing things out helps me clear out my own thoughts)...

Sort methods in the CrowdAnki config work fine to achieve this default stable setup, just as you say 👍, but adding methods there such as "Filter by the field names 'X'" is impossible, as we do not know what is to be exported. In my mind these options should be presented as default options only, and the user can be given more comprehensive sort options in the Export Config Window 😁. At that point CrowdAnki knows exactly which cards are to be exported, and thus can calculate which fields are shared by all of them and offer those options to the user. If there is a more functional sort method setup which can be given a lambda, then I think your dream of a perfect sort option can be achieved 👍

Yeah, that'd definitely be extremely useful! It might not work too well with snapshots (particularly automated ones), though, so it'd probably be convenient to be able to also set this non-interactively (i.e. not during the actual export), per-deck. However, setting it in advance would obviously have the disadvantage that CrowdAnki couldn't guarantee that you wouldn't add a note model that didn't have a given field, in the future. :)
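A loose sketch of what such an export-time setup might involve: computing the field names shared by every note model in the export, then sorting with a caller-supplied key function. All names and data here (note_models, shared_field_names, the "Index" field) are hypothetical and do not describe CrowdAnki's internals.

```python
# Hypothetical note models, given as lists of field names.
note_models = {
    "Basic": ["Front", "Back", "Index"],
    "Cloze": ["Text", "Extra", "Index"],
}

notes = [
    {"model": "Cloze", "fields": {"Text": "{{c1::x}}", "Extra": "", "Index": "10"}},
    {"model": "Basic", "fields": {"Front": "q2", "Back": "a2", "Index": "2"}},
    {"model": "Basic", "fields": {"Front": "q1", "Back": "a1", "Index": "1"}},
]


def shared_field_names(notes, note_models):
    """Field names present in every note model used by the notes being exported."""
    field_sets = [set(note_models[model]) for model in {note["model"] for note in notes}]
    return set.intersection(*field_sets) if field_sets else set()


shared = shared_field_names(notes, note_models)
print(shared)  # {'Index'}

# The sort itself is just a key function, so it could be a user-chosen lambda.
sort_by = lambda note: int(note["fields"]["Index"])
exported = sorted(notes, key=sort_by)
print([note["fields"]["Index"] for note in exported])  # ['1', '2', '10']
```

A non-interactive, per-deck setting would have to re-run the shared-field check at export time, since a note model lacking the chosen field could be added later.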

Anyways, I hope my thoughts are helpful in any way. Thank you for the great work and thoughts, as always 👏

They are! Thanks, as always for the feedback and the words of encouragement! :)

Very good work and in-depth discussion. I'm partial to a sort by note id since it seems intuitive, though I understand the drawbacks. The only thing that bothers me about a field sort method (whether field1, 2, last, or sort) is if you have a CrowdAnki deck covering a large document such as the Anki manual and you then want to add cards for new users in the middle of the deck. Using a padded-zero approach you may have cards with numbers 0001 through 1000. If you wanted to add new cards in the area of 0200, you'd have to use another system such as 0200a or 0200.1 to avoid having to manually change the existing cards 0201 through 1000. Does anyone know of an advanced regex or an external tool that would be able to handle such a repetitive task?

@ohare93's amazing BrainBrew could provide a solution. It (among other things) allows converting a CrowdAnki deck.json into a CSV (or set of CSVs) and back again.

You could export the deck with CrowdAnki, convert the deck.json into a CSV (with BrainBrew), open the CSV in a spreadsheet, shift all the indices by one (e.g. by moving the relevant cells etc.), convert back to deck.json and re-import. It's a bit involved but certainly faster than incrementing the indices manually...

Alternatively, making a (re-)indexing Anki add-on should be straightforward. I'll add it to my "to try" list.
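As a rough idea of the re-indexing step, here is a sketch that shifts zero-padded indices in CSV rows to make room for a new card. The column names and CSV layout are invented for the example and are not BrainBrew's actual output format; make_room is a hypothetical helper.

```python
import csv
import io


def make_room(rows, insert_at, column="index", width=4):
    """Increment every index >= insert_at by one, keeping the zero padding."""
    for row in rows:
        value = int(row[column])
        if value >= insert_at:
            row[column] = str(value + 1).zfill(width)
    return rows


# In practice this text would come from the CSV file produced from deck.json.
csv_text = "index,front\n0199,a\n0200,b\n0201,c\n0202,d\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

shifted = make_room(rows, insert_at=201)
print([row["index"] for row in shifted])  # ['0199', '0200', '0202', '0203']
```

After the shift, index 0201 is free for the new card, and the rows can be written back with csv.DictWriter before converting to deck.json again.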

What about assuming every tag is a non-sorting tag and allowing users to whitelist (via regex?) strings they wish to use as a sorting tag? For example, if they use the tag_numeric export option, they can also use an option such as sort_tag=chapter::* or sort_tag=section::*

Yeah, that'd work! It'd require allowing the sort methods to take an argument, but as discussed above, that'd be useful in other ways as well (a general fieldN sort and a neater approach for the "numerical" sort type).
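A sketch of how a whitelisted-tag sort could behave, treating the suggested sort_tag value as a shell-style pattern and combining it with a natural sort. The sort_tag option does not exist yet; sorting_tag_key and natural_key are hypothetical helpers.

```python
import re
from fnmatch import fnmatchcase


def natural_key(text):
    """Alternating non-digit strings and integers, so that 2 < 10."""
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def sorting_tag_key(tags, pattern):
    """Build a sort key from only those tags that match the whitelist pattern."""
    matching = [tag for tag in tags if fnmatchcase(tag, pattern)]
    return sorted(natural_key(tag) for tag in matching)


notes = {
    "n1": ["hard", "section::2.1"],
    "n2": ["europe", "section::1.10"],
    "n3": ["verb", "section::1.2"],
}

ordered = sorted(notes, key=lambda name: sorting_tag_key(notes[name], "section::*"))
print(ordered)  # ['n3', 'n2', 'n1']
```

Non-matching tags such as hard or europe never enter the key, so they cannot affect the order. Notes with no matching tag get an empty key and sort first.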

This gave me an idea unrelated to CrowdAnki, so feel free to ignore all of the following text. If you use tags to organize and you pair the document name with the section name (e.g. python3.9_tutorial::section4.1) you will end up with a new tag for every section of every document you make cards from. This seems like linear (linearithmic?) growth in the number of tags and eventually the tag list in the card browser would start to feel crowded. If we make tags atomic using tags such as python3.9_tutorial and section4.1, we retain the ability to find cards from section 4.1 of the python tutorial, with probably far fewer tags. The python3.9_tutorial tag will apply to every note based on the tutorial, while tags of the form sectionX[.Y][.Z] would apply to any document that uses such a numbering system. I've found such numbering systems are very common in textbooks on any subject (Chapter 1.0, 1.1, 1.2, ...) and technical docs (Section 4.1, 4.2, 4.2.1, ...). Using this system instead, you'd have 1 new tag for every document. If you already have notes that use sectionX[.Y][.Z] tags, many of them will be reused.

That's really interesting and a good point! It's swayed me considerably away from my previous full-hearted support for tag namespacing, since you're right that it leads to a considerable growth in number of tags (which Anki is regrettably not great at handling, at least not without addons)!
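To put rough numbers on the difference, the following sketch counts tags under both schemes for a made-up set of documents that happen to share the same section labels. The document and section names are purely illustrative.

```python
documents = ["python3.9_tutorial", "anki_manual", "algebra_textbook"]
sections = ["section1.1", "section1.2", "section4.1"]

# Namespaced scheme: one tag per (document, section) pair.
namespaced_tags = {f"{doc}::{sec}" for doc in documents for sec in sections}

# Atomic scheme: one tag per document plus one per distinct section label.
atomic_tags = set(documents) | set(sections)

print(len(namespaced_tags))  # 9
print(len(atomic_tags))      # 6
```

Under the atomic scheme, a card from section 4.1 of the tutorial is found by requiring both the python3.9_tutorial and section4.1 tags.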

Stvad / CrowdAnki