Open aplaice opened 3 years ago
Thanks for doing this. One note: if sorting by the first and last field are options, wouldn't allowing the user to select any field be better? (more general, futureproof...)
Yes, in many ways. I avoided it because:
It'd involve a slight overhaul of the sorting and/or config-parsing logic, to allow for arbitrary fieldN, which would result in some added complexity.
OTOH having a more "programmatic" approach might also be better for the "numerical sorting" methods — sorting methods could take arguments, which could be, say, field number or sorting method ("purely string" or "natural/numerical").
Doing this "properly" might be tricky from a UI perspective: do we want to stick with the current approach of the config UI parsing a comma-separated list of methods (methodA,methodB), but with optional "suffixes" for each method (in effect implementing a DSL-lite)? Or would it be better to have a form with drop-down options (with editing meta.json directly remaining an option for "power-users")?
Almost all note types are guaranteed to have at least two fields, so the existing field1 and field2 methods didn't have to worry about what to do if a note doesn't have the Nth field. Similarly, all note types will have a last field. Since the sort methods are currently global (for all decks and all note types) we can't really rely on users not choosing an invalid field number.
I can see two approaches to the issue of missing fields:
Fall back to the last existing field.
Fall back to an empty string.
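A minimal sketch of the two fallbacks, purely for illustration. The helper name nth_field and its fallback parameter are hypothetical and not part of CrowdAnki; a note's fields are assumed to be a plain list of strings.

```python
def nth_field(fields, n, fallback="last"):
    """Return the n-th field (1-based) of a note, or a fallback value.

    `fields` is assumed to be a list of strings; `fallback` is a
    hypothetical option: "last" falls back to the last existing field,
    anything else falls back to an empty string.
    """
    if 1 <= n <= len(fields):
        return fields[n - 1]
    if fallback == "last" and fields:
        return fields[-1]
    return ""


# A two-field note asked for its fifth field:
print(nth_field(["front", "back"], 5))                    # falls back to the last field
print(repr(nth_field(["front", "back"], 5, "empty")))     # falls back to ""
```

Falling back to the last field keeps such notes grouped with whatever their final field sorts as, while the empty string pushes them all to the front of the export.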
It didn't seem that valuable: I'd expect an index or other interesting sort field to be either the first or the last, and not Nth for arbitrary N, especially since the value has to be set globally (for all note types). (If somebody really wanted to sort by the Nth field, then the proposed browser_sort_field would allow them to, by changing sortf, with the added flexibility (or inconvenience, depending on perspective :)) of it being per-notemodel.)
I'll think about how one could best implement this, some more, though.
👏 bloody good stuff, mate. As always your PRV is excellent! (Passion Results Verbosity! 😁)
You capture the exact essence of why the sort methods are not the perfect "by field X" that they could be. However, one thought that may help you get there: maybe you are looking in the wrong place 🤔 (why does the Github thinking emoji have a frown which gives a connotation of dissatisfaction? 😠 I just want a thinking face! 😅)
(This is not intended to be a criticism of the current methods — AFAICT they were mainly created to ensure a stable export with no spurious diffs and to better inter-operate with tools like BrainBrew, which they do admirably.)
Sort methods in the config of CrowdAnki works fine to achieve this default stable setup, just as you say 👍 and adding methods to here such as "Filter by the field names 'X'" is impossible, as we do not know what is to be exported. In my mind these options should be presented as default options only, and the user can be given more comprehensive sort options in the Export Config Window 😁 at that point CrowdAnki knows exactly what cards are to be exported, and thus can calculate which fields are shared on all, and give the options to the user. If there is a more functional sort method setup which can be given a lambda, then I think your dream of a perfect sort option can be achieved 👍
(Something I intended to get back to in the Export Config Window, but time is a cruel mistress 😅)
Anyways, I hope my thoughts are helpful in any way. Thank you for the great work and thoughts, as always 👏
Very good work and in-depth discussion. I'm partial to a sort by note id since it seems intuitive, though I understand the drawbacks. The only thing that bothers me about a field sort method (whether field1, 2, last, or sort) is if you have a CrowdAnki deck covering a large document such as the Anki manual and you then want to add cards for new users in the middle of the deck. Using a padded-zero approach you may have cards with numbers 0001 through 1000. If you wanted to add new cards in the area of 0200, you'd have to use another system such as 0200a or 0200.1 to avoid having to manually change the existing cards 0201 through 1000. Does anyone know of an advanced regex or an external tool that would be able to handle such a repetitive task?
As described above there's the issue of "non-sorting tags" potentially interfering with sorting tags. None of the above proposals deal with this. I can't think of any way of solving it that either doesn't involve hardcoding keywords (such as section or chapter), hence losing flexibility, or considerably overhauling the entire system (having per-deck sorting, with configurable keywords).
What about assuming every tag is a non-sorting tag and allowing users to whitelist (via regex?) strings they wish to use as a sorting tag? For example, if they use the tag_numeric export option, they can also use an option such as sort_tag=chapter::* or sort_tag=section::*
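As a rough sketch of how such a whitelist might behave (not an existing CrowdAnki option): the function sorting_tags below is hypothetical, and it treats the sort_tag values as shell-style glob patterns rather than regexes, which is only one possible reading of chapter::*.

```python
from fnmatch import fnmatchcase


def sorting_tags(tags, patterns):
    """Keep only the tags matching at least one whitelist pattern.

    `patterns` are glob-style strings such as "chapter::*" (an assumed
    interpretation of the proposed sort_tag option). Every tag that does
    not match is treated as a non-sorting tag and ignored.
    """
    return sorted(
        tag for tag in tags
        if any(fnmatchcase(tag, pattern) for pattern in patterns)
    )


note_tags = ["hard", "chapter::2", "verb", "section::1"]
print(sorting_tags(note_tags, ["chapter::*", "section::*"]))
```

Only the whitelisted tags would then be fed into the tag sort key, so adding or removing tags like hard or verb could not change the export order.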
One can work around this by further namespacing the sorting and non-sorting tags to enforce correct alphabetical order (having say deck_name::index::sectionX.Y and deck_name::topic::verb), but it's a bit ugly (and non-obvious, in advance), so it's still not a perfect solution.
This gave me an idea unrelated to CrowdAnki, so feel free to ignore all of the following text. If you use tags to organize and you pair the document name with the section name (e.g. python3.9_tutorial::section4.1) you will end up with a new tag for every section of every document you make cards from. This seems like linear (linearithmic?) growth in the number of tags and eventually the tag list in the card browser would start to feel crowded.
If we make tags atomic using tags such as python3.9_tutorial and section4.1, we retain the ability to find cards from section 4.1 of the python tutorial, with probably far fewer tags. The python3.9_tutorial tag will apply to every note based on the tutorial, while tags of the form sectionX[.Y][.Z] would apply to any document that uses such a numbering system. I've found such numbering systems are very common in textbooks on any subject (Chapter 1.0, 1.1, 1.2, ...) and technical docs (Section 4.1, 4.2, 4.2.1, ...). Using this system instead, you'd have 1 new tag for every document. If you already have notes that use sectionX[.Y][.Z] tags, many of them will be reused.
</blogpost>
👏 bloody good stuff, mate. As always your PRV is excellent! (Passion Results Verbosity! 😁)
I often get too verbose (but writing things out helps me clear out my own thoughts)...
Sort methods in the config of CrowdAnki works fine to achieve this default stable setup, just as you say 👍 and adding methods to here such as "Filter by the field names 'X'" is impossible, as we do not know what is to be exported. In my mind these options should be presented as default options only, and the user can be given more comprehensive sort options in the Export Config Window 😁 at that point CrowdAnki knows exactly what cards are to be exported, and thus can calculate which fields are shared on all, and give the options to the user. If there is a more functional sort method setup which can be given a lambda, then I think your dream of a perfect sort option can be achieved 👍
Yeah, that'd definitely be extremely useful! It might not work too well with snapshots (particularly automated ones), though, so it'd probably be convenient to be able to also set this non-interactively (i.e. not during the actual export), per-deck. However, setting it in advance would obviously have the disadvantage that CrowdAnki couldn't guarantee that you wouldn't add a note model that didn't have a given field, in the future. :)
Anyways, I hope my thoughts are helpful in any way. Thank you for the great work and thoughts, as always 👏
They are! Thanks, as always for the feedback and the words of encouragement! :)
Very good work and in-depth discussion. I'm partial to a sort by note id since it seems intuitive, though I understand the drawbacks. The only thing that bothers me about a field sort method (whether field1, 2, last, or sort) is if you have a CrowdAnki deck covering a large document such as the Anki manual and you then want to add cards for new users in the middle of the deck. Using a padded-zero approach you may have cards with numbers 0001 through 1000. If you wanted to add new cards in the area of 0200, you'd have to use another system such as 0200a or 0200.1 to avoid having to manually change the existing cards 0201 through 1000. Does anyone know of an advanced regex or an external tool that would be able to handle such a repetitive task?
@ohare93's amazing BrainBrew could provide a solution. It (among other things) allows converting a CrowdAnki deck.json into a CSV (or set of CSVs) and back again.
You could export the deck with CrowdAnki, convert the deck.json into a CSV (with BrainBrew), open the CSV in a spreadsheet, shift all the indices by one (e.g. by moving the relevant cells etc.), convert back to deck.json and re-import. It's a bit involved but certainly faster than incrementing the indices manually...
Alternatively, making a (re-)indexing Anki add-on should be straightforward. I'll add it to my "to try" list.
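The core of such a re-indexing step could look roughly like this. It's only a sketch operating on plain strings: shift_indices is a hypothetical helper, and reading the index field from (and writing it back to) Anki notes or a CSV is deliberately left out.

```python
def shift_indices(indices, start, by=1):
    """Shift every zero-padded index >= start by `by`, keeping its width.

    `indices` is assumed to be a list of purely numeric strings such as
    "0200". Indices below `start` are returned unchanged.
    """
    shifted = []
    for index in indices:
        value = int(index)
        if value >= start:
            value += by
        # zfill pads back to the original width (it never truncates).
        shifted.append(str(value).zfill(len(index)))
    return shifted


# Make room for one new card at position 0200:
print(shift_indices(["0199", "0200", "0201"], start=200))
```

After the shift, the new card can take the freed index (0200 here) without resorting to suffixes such as 0200a.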
What about assuming every tag is a non-sorting tag and allowing users to whitelist (via regex?) strings they wish to use as a sorting tag? For example, if they use the tag_numeric export option, they can also use an option such as sort_tag=chapter::* or sort_tag=section::*
Yeah, that'd work! It'd require allowing the method sort types to take an argument (but as discussed above, that'd be useful in other ways as well (general fieldN sort and a neater approach for the "numerical" sort type)).
This gave me an idea unrelated to CrowdAnki, so feel free to ignore all of the following text. If you use tags to organize and you pair the document name with the section name (e.g. python3.9_tutorial::section4.1) you will end up with a new tag for every section of every document you make cards from. This seems like linear (linearithmic?) growth in the number of tags and eventually the tag list in the card browser would start to feel crowded. If we make tags atomic using tags such as python3.9_tutorial and section4.1, we retain the ability to find cards from section 4.1 of the python tutorial, with probably far fewer tags. The python3.9_tutorial tag will apply to every note based on the tutorial, while tags of the form sectionX[.Y][.Z] would apply to any document that uses such a numbering system. I've found such numbering systems are very common in textbooks on any subject (Chapter 1.0, 1.1, 1.2, ...) and technical docs (Section 4.1, 4.2, 4.2.1, ...). Using this system instead, you'd have 1 new tag for every document. If you already have notes that use sectionX[.Y][.Z] tags, many of them will be reused.
That's really interesting and a good point! It's swayed me considerably away from my previous full-hearted support for tag namespacing, since you're right that it leads to a considerable growth in number of tags (which Anki is regrettably not great at handling, at least not without addons)!
This is a draft based on the discussion in #107 and #108. (Please feel free to totally tear this up.)
I've implemented several new potential sort methods:
1. Note id (note_id)
2. Last field (field_last)
3. "Natural/numeric" sort, such that 1 < 2 < 10, and chapter1.1 < chapter1.2 < chapter1.10 < chapter2 (field1_numeric, tag_numeric)
4. The browser's "sort field" (specified by sortf in the respective note model(s)) (browser_sort_field)
For slightly more extensive descriptions and analyses of drawbacks, see below.
They're all independent, though to some extent they could be combined (in particular 3 can be combined with 2 or 4). I'm implementing them in a single PR, because they're all motivated by the same goal of allowing the creators of a deck to control the order in which the learner sees new notes and to avoid an excessively fragmented discussion.
I don't think that they should necessarily all be introduced, so please feel completely free to shoot down any or all of them.
Background
As @sudomain had brought up in #107, it's sometimes crucial for new cards/notes to be seen in a specific order, because the "later" notes build on the "earlier" ones. This is both true when the Anki deck "accompanies" an external learning resource (e.g. a course or a book) and when it's used for standalone learning. In other cases forcing the order is not essential, but might still be valuable.
If the deck configuration uses the "Show new cards in order added" option (Deck Options > New cards > Order or, alternatively, specified in a deck.json in the relevant config in deck_configurations, by the order (0 means in random order, 1 in order added)), then any new cards will be shown in the order in which they're in the deck.json (since CrowdAnki adds the cards in order). Hence, by controlling the export order one can control the order in which learners see new cards.
(Note: I haven't checked how this interacts with sub-decks in the beta "Anki 2.1 scheduler", i.e. whether CrowdAnki imports sub-decks before or after the parent deck. In any case, though, the order within a subdeck will be as exported.)
Shortcomings of current sort methods
(This is not intended to be a criticism of the current methods. AFAICT they were mainly created to ensure a stable export with no spurious diffs and to better inter-operate with tools like BrainBrew, which they do admirably.)
String comparison
field1, field2 and tag compare their contents using a string comparison, such that, say, "2" > "10".
If we wanted to have an "index"/"sort" field to specify the desired order, then simply using consecutive, unpadded integers wouldn't work, since "10" would come before "2" etc. Padding the integers with leading zeros, to the same length (e.g. "002", "100") is a solution, but it's non-obvious, slightly ugly and very inconvenient (ensuring consistency when the order of magnitude of the number of cards changes would be particularly annoying). One could use an additional tool (potentially BrainBrew and a spreadsheet, or a dedicated Anki addon), but that's suboptimal.
The same holds for tags: it'd be nice to be able to just tag cards as, say, deck_name::sectionX.Y.Z, where X, Y and Z are integers, without having to pad the integers with the correct number of zeros, guessed in advance (renaming tags in Anki can be a pain...).
Position of fields
As @sudomain pointed out, it can be annoying for the index field to take the highly prominent first (or second) position — these should be reserved for the actual content of the note.
Non-sorting tags
For instance, if we have both "sorting" tags (say deck_name::chapterX.Y or deck_name::sectionX.Y) and "non-sorting" ones (say deck_name::hard, deck_name::extra_material, deck_name::verb, deck_name::noun, deck_name::europe etc.), we don't want the presence or absence of the non-sorting tags to influence the sort order.
If the "non-sorting" tags come after the "sorting" tags alphabetically, then they'll be placed later in the anki_object.tags concatenated string and hence have negligible influence, but ensuring that that's always the case might be annoying.
One can work around this by further namespacing the sorting and non-sorting tags to enforce correct alphabetical order (having say deck_name::index::sectionX.Y and deck_name::topic::verb), but it's a bit ugly (and non-obvious, in advance), so it's still not a perfect solution.
Ease of use
Having to tag or index all notes can be rather inconvenient.
Analysis of suggested sort methods
Note id
FWIW it seems that Anki's internal note sort order is by id (currently; AFAIK you shouldn't rely on any sort order, if it's not explicitly specified in the SQL query), even if you manually change the id of a note, so note_id has (currently) exactly the same effect as the none sort method.
Advantages:
- [...] note id, hence changing the order.
Disadvantages:
- CrowdAnki obviously does not export the note_id, meaning that different people will have different note_ids and most importantly, changes to the order of note_ids won't be shared between contributors, meaning that only one person would be able to change the note order... @sudomain has a semi-solution, but it's still non-ideal.
- Editing the note_ids is impossible without an add-on.
- Changing note_ids feels a bit brittle.
- Currently, it's equivalent to the "none" sort method.
However, since modifications to CrowdAnki's internals or Anki might change that, it might be worth having a sort order that is explicitly by note id.
Last field
The obvious advantage is that a potential "index" field will no longer be in a prominent position. Such an index field can be simply appended to all note types used in the deck.
Sorting by field_last is unlikely to have any applications other than for an index field, though.
"Natural/numeric sort"
As described above it allows sorting fields containing numbers "naturally", such that 1 < 2 < 10, and chapter1.1 < chapter1.2 < chapter1.10 < chapter2.
The idea is that the string is split into a list of alternating strings containing no numeric characters and non-negative integers. Python happily sorts lists (even lists of variable length, provided that all corresponding fields are comparable), so this allows us to sort in a way where integers are compared as integers, not as strings.
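A minimal sketch of that splitting idea, for illustration only. The name natural_key is made up here and this is not necessarily how the PR implements it; it only assumes ASCII digits.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit strings and integers.

    re.split with a capturing group keeps the digit runs, so even
    positions are always (possibly empty) strings and odd positions are
    always digit runs. Corresponding list elements therefore always have
    the same type and stay comparable.
    """
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


print(sorted(["10", "2", "1"]))                   # plain string sort
print(sorted(["10", "2", "1"], key=natural_key))  # natural sort
print(sorted(
    ["chapter2", "chapter1.10", "chapter1.2", "chapter1.1"],
    key=natural_key,
))
```

Because every key starts with a string (empty if the text begins with a digit), a string is never compared against an integer at the same position.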
(Note that since we're dealing purely with non-negative integers, not decimals, "1.5" < "1.10" etc., which in the context of strings using schemas like sectionX.Y makes perfect sense, but might be confusing under some other conditions.)
It's very similar to the "version" sort in GNU ls and sort (ls -v and sort -V), which use the gnulib filevercmp function.
Obviously, this type of sort can (and if decided to be generally a good idea, probably should) be combined with more of the existing and new methods (field2, field_last, sort_field).
I'm extremely unhappy about the names (field1_numeric, tag_numeric) and would welcome any alternative suggestions.
Sorting by the "sort field"
The advantage is that this allows sorting the export, by the field that appears in the Sort Field column of the browser, and hence the export order can be the same as the default sort order in the browser.
It feels neat, but a bit gimmicky, so I'm not sure if it's worth introducing.
I'm also rather unhappy about the name (browser_sort_field...).
Wishlist
Sorting by only a subset of tags.
As described above there's the issue of "non-sorting tags" potentially interfering with sorting tags. None of the above proposals deal with this. I can't think of any way of solving it that either doesn't involve hardcoding keywords (such as section or chapter), hence losing flexibility, or considerably overhauling the entire system (having per-deck sorting, with configurable keywords).
Doubts
Tests
Should the export sort tests be more discriminating? The currently existing tests don't really distinguish between field1, field2 and tag (so for instance if somebody changed one of the relevant indices in note_sorter.py, it wouldn't be caught). The newly added tests further don't distinguish between browser_sort_field, field_last and field1 etc.
However, the tests will catch many forms of breakage, so it might not be worth over-complicating the test lists. Also, the most likely source of breakage is Anki's internals changing, which wouldn't be caught here, anyway...
Flag
The current sort by flag feels rather weird. Notes can have a flag (it's present in the database in the notes table), but Anki's source states that they're "not currently exposed", CrowdAnki doesn't export or import them, and I don't think that they even influence the result of a "flag" sort...
(The commonly encountered per-card flags have no bearing on the note flags.)
Naming
As mentioned above, I'd greatly welcome alternative names for most of the proposed sort methods.