Open dae opened 3 years ago
I did work on some SpecialFields PRs, and knowing that you intended to refactor the export/import logic soon, I took some notes. While SpecialFields makes the importing algorithm much more powerful, I don't think the abstractions it provides are necessarily easy to understand. So the following are some ideas of how to better abstract the functionality SpecialFields provides:
convert from "MyNoteTypeV1" to "MyNoteTypeV3"
would have to be accumulated, so you don't have to make the card/field assignment per note.<br>
or possibly <hr>
, and then written to the single new field#
as prefix is the indicator for private tags:medicine::brain medicine::disease #exams::smith
, and imported note has medicine::brain::disease
, then the new note will have medicine::brain::disease #exams::smith
marked
and leech
.
mod
valuemod
value? Special Fields makes sure that it is set on every import however.Of the aforementioned improvements I think "Changing Note Type upon import" generally makes sense to implement.
But for the other changes, they particularly make sense for sharing decks from the "deck maintainer" to the "deck user". However the current Anki export/import behavior would still make sense when the sharing happens between "user" and "user", or if a user wants to back-up his deck/progress.
I think two export/import behaviors could be grouped as follows:
A public / shared export - import:
mod
is olderA private / backup export - import:
mod
is older (You can trust yourself, that the most recent change you made is the one you want)mod
value, probably on deck? I'm not sure that's the case right now.While it might still might make sense to allow for individual turning on-off of those features (?), I think abstracting them that way, would prevent overwhelming the user at first. (EDIT: Talking to Nick, he confirmed, that indeed all individual settings would be necessary)
Happy to answer any questions in regards to this. @hgiesel has a great outline here
Thanks Henrik, that looks really helpful. The scheduler work will likely come first (I've already done some initial work on it), but will come back to this.
Also for reference: https://forums.ankiweb.net/t/importing-rewrite-note-sort-order/13449
Leading/trailing whitespace causing issues: https://forums.ankiweb.net/t/import-export-causes-unwanted-duplicates/14060
Reminder to add checks for invalid card IDs: https://forums.ankiweb.net/t/issue-with-huge-card-ids/12165
Just a brief update on this:
Rumo has implemented .colpkg import/export, and is currently looking into apkg exporting.
The plan is to approach the update in two steps:
1) porting the existing built-in functionality, without trying to tackle all the points mentioned above 2) once we've got a solid base, incrementally adding the new functionality/features mentioned above
While we work on the two steps, the old Python code can be kept around, so that we don't break existing workflows using the SpecialFields add-on.
New apkg and csv import/export implementations are now in main, with functionality roughly equivalent to the old Python code. Now we can turn our attention to the new functionality suggested above.
@hgiesel gave us a great summary. Some thoughts/discussion starters on the various features:
Private fields:
Henrik's suggested approach sounds good here - we could use a checkbox like "[x] Exclude in shared decks" in the Fields dialog.
Private tags:
Private vs public export:
I quite like this idea - we already use the scheduling option to control whether flags should be stripped, and the two use cases are quite different. @AnKingMed, could you explain in more detail which of the lumped settings you want to be able to control independently, and why?
Updates when notetype has changed:
This is probably the hardest part. I think it gets even more complicated if we try allow the user to merge fields, or worry about preserving the content of fields that no longer exist.
Brainstorming here - how does the following sound?
Private fields:
Private tags:
Private vs public export:
Note that the special fields add-on is just for importing. Ideally, exporting would also be considered and users could choose to not export certain fields. We've also recently made an add-on for "export single tag for sharing" so that users can export just a single tag and strip all the other tags out. This is really useful when you tag cards by a lecture, export just that tag, and the person you share with can then import that file with "all fields are special" and "combine tagging" so that it essentially just imports that one tag without affecting any field content or any of the existing tags https://ankiweb.net/shared/info/960563361
Private tags:
I think you suggested to make that list of prefixes configurable, right? I'd argue against that. Tags are something that is used across decks and notetypes. Having configurable tags could make sharing more complicated than it needs to be.
Example:
User A exports a deck with notes that are tagged abc
. User B now imports this deck, but has abc
configured as private. Will the tag be imported or not? And if it is imported, is immediately recognized as private in B's collection, which would make passing it on to user C more complicated?
Instead, what do you think about not making it configurable, but use a prefix or infix of #
? Of course marked
and leech
would also be private for a legacy reason.
my::personal::tag::topic
and my::personal::tag::#::private
. Otherwise if users wanted to categorize their private tags more, they would have to split them far apart, like my::personal::tag
and private::my::personal::tag
. #::
.E.g. this subtree would show the tags medicine::tag
and medicine::#::todo
. So any tags within #::
are inlined into their parent. And the private tag would mark it explicitly for the user.
Any repeated #::
would then be treated normally again. The "magic behavior" would only pertain to the first use of a #
subtag.
Private vs public export:
@AnKingMed Why is a tag-only import necessary, if we allow for public/private tags?
Same for deck descriptions? Why would you not want to update deck descriptions?
Concerning notetype styling, I would also suggest to not cover this use case at this moment. Similar to tags I think an actual good solution would be to be allow for tagging for public and private parts, but we can't really do that right now.
There are times when a user has 100 tags and user B has the same 100 tags (using the same shared deck). User B chooses to alter some of those 100 tags, but also adds a new tag. They want to share that new tag with user A, but aren't ready to share the other changes yet. In this case, there needs to be a way to share just that singular tag without updating the other tags
For deck description, there are many instances in sharing updates with each other to compile things that I've needed to protect the deck description (i.e. I as the deck creator had updated it, but I was importing files from other people for the new update and they had the old description).
For 99% of people, they need just a couple functions, the other functions are for the deck creators and maintainers. Although it's significantly less people using those functions, they're absolutely necessary. I've used every combination of the settings I outlined above in creating and maintaining decks and have a few years of experience with it :)
User A exports a deck with notes that are tagged abc. User B now imports this deck, but has abc configured as private. Will the tag be imported or not? And if it is imported, is immediately recognized as private in B's collection, which would make passing it on to user C more complicated?
It's a good point, and having a hard-coded list of private tags would allow us to simplify the UI. But if deck authors want asymmetric handling of private tags (eg excluding some on import that they'd like included on export), I'm not sure such an approach would work (unless they renamed the tag prior to import/export, which is impractical). I presume that's what you are doing with the MissedQ tag @AnKingMed?
Private vs public export:
Note that the special fields add-on is just for importing.
I think we may have had our wires crossed here. I was asking whether a single "for public/for private" checkbox could work on deck export. All the options you're describing seem to be related to importing, which is a separate matter.
@dae the MissedQ tag is just my personal tag that I use when I miss a practice question and decide a flashcard is relevant to that.
Sorry I didn't understand this was just exporting, however it's probably still worth considering the whole picture when addressing exports and imports.
I think the most ideal way to handle exporting tags is to select exactly which tags you want to export. For example, a dialog that has "select all" or "select none" and then you can choose
I'd prefer to avoid a list of all tags if possible, as it adds a fair bit of extra implementation complexity.
I'm still trying to understand the different use cases that are covered here. Does the following sound correct? Are there other cases I've missed?
1) There are some tags you'll always, or almost always want excluded 2) Sometimes you want to share a single tag with others. By that, I presume you mean a) "only notes with that tag", and b) "preserve any other tags the notes happen to have when importing"?
Could 2b) perhaps be handled when importing, instead of when exporting? Eg if the importer allowed you to only import specific tags? You could accomplish 2a) by searching in the browse screen, then exporting from there.
2b is definitely an importing issue.
There are also instances, however, where I want to export multiple tags. The solution right now would be to export a single tag multiple times and then import each file into the new profile. I think the list of tags is ideal if at all possible if you're going for the best solution
Does the single-tag filtered need to be done on the exporting side though? Couldn't the filtering be done on the importing side, if the importing screen provided a way to include or exclude specific tags?
I think that's much more appropriate on the exporting side. I generally share tags with others and I only want to share certain ones. It would be an extra step to have to share everything and then tell them what to do
For 99% of people, they need just a couple functions, the other functions are for the deck creators and maintainers.
Is the goal really to support all the use cases mentioned here? Call me Bernie Sanders, but I think we should focus on the 99% who'd have a tough time figuring out the correct settings for their use case. Also, I think apkg is the wrong format to solve these problems. As it's simply a collection, it can't represent diffs and has to rely on external information to work as a a half-decent update. This places a burden not only on the exporter (e.g. deck author), but also on the importer to get a lot of settings right. Instead, I propose to implement these things with the new anki-json format. JSON is self-describing, so all the relevant settings could be set on the exporter side, requiring the end user only to click import. It truly can represent a deck update, because missing data is filled in with Anki defaults or existing data. No need to ship a whole new deck just to update a single tag. And lastly, it's plaintext, meaning you can version-control it.
When importing .apkg files, the import dialog provides a dropdown "On schema change", with the options ["Preserve", "Update (full sync)"].
I think you're right that embedding a change-notetype feature is not the right way, @dae. It would be quite powerful, but easy to mess up and would still not solve all the problems.
When we encounter a note that we'd like to update where the notetype has a remapped id, we assign the new notetype id to the note, and update its fields.
This would mean that subsequent updates would continue to cause conflicts. Let's say you have a notetype with id 1 and schema A and the import contains one with also id 1, but schema B. Then a notetype with schema B and id 2 is added to your collection. Then you do another import with the same notetype, but there's still no notetype with matching id and schema, so B3 is added.
What about merging the notetypes? The new notetype would contain the union of fields and card types. The user may decide if card types with the same name, but different templates would be updated or preserved. In case the imported notetype only contains a subset of fields and card types with the same templates (which should be the case on subsequent updates), this would not need to be treated as a conflict and it could be trivially mapped into the existing notetype.
@hgiesel's proposals sound good to me.
But if deck authors want asymmetric handling of private tags (eg excluding some on import that they'd like included on export)
If there's only one toggle for importing/exporting private fields or not, it seems reasonable to me to treat private tags the same.
Is the goal really to support all the use cases mentioned here? Call me Bernie Sanders, but I think we should focus on the 99% who'd have a tough time figuring out the correct settings for their use case.
I disagree with this. The 99% strongly rely on the 1% making high quality decks in these instances and focusing energy on high quality deck making utilities is important.
You both have valid points. Deck authors do bring a lot of value to the ecosystem, so we can't ignore their needs. At the same time, there is a higher bar for entry into the core code than in add-ons, and our goal here is to identify the important features and try to add them in a way that is maintainable, and fits in with the rest of the UI (the current special fields add-on UI is not particularly understandable).
@AnKingMed could you give us some "user stories" about why you're exporting individual tags, so we better understand your workflow? What sort of information do these individual tags represent, and why do you need to share them with others? Would an "only add, don't replace tags" option on import work in some/all of such cases?
@RumovZ re using JSON for this: that is a good point, and JSON may well be more appropriate here. With colpkg/apkg/csv importing mostly done at this point, and fallback to the old importers with the new setting enabled, I feel like we're in a good place now. So long as we continue to support the legacy codepaths, there's no need to rush to add these features to our apkg importer, and perhaps it's worth pursuing the JSON importer further first.
While features focusing on shuffling data between collaborating deck authors may make more sense for the JSON exporter, some of the other proposals like private fields and tags probably still make sense for the apkg exporter, as they would be useful in the more common case of deck author distributing changes to end users. But presumably in that use case, we don't need to worry about things like single-tag imports.
This would mean that subsequent updates would continue to cause conflicts. Let's say you have a notetype with id 1 and schema A and the import contains one with also id 1, but schema B. Then a notetype with schema B and id 2 is added to your collection. Then you do another import with the same notetype, but there's still no notetype with matching id and schema, so B3 is added.
Yep, good point. The way the original import code addressed this case is to add 1 to the old notetype id when adding a new one, and keep doing so until it found a free id or found a notetype with a matching schema.
What about merging the notetypes? The new notetype would contain the union of fields and card types.
That feels more complex than the approach above - WDYT?
Re: private tags, a few thoughts:
I'm not saying we should ignore deck creators, just address their needs in some way that doesn't aggravate the experience of the average user. E.g. by putting highly specialised features in an add-on or a part of Anki, that's only for advanced usage anyway (JSON export).
That feels more complex than the approach above - WDYT?
Sure, but it's also much more useful. Say you add a private field to a notetype from a shared deck. Then you receive an update for its notes. With Preserve you'd miss out on the updates, with Update you'd lose your private field data. Merge would offer the best of the two worlds. The user would only need to tell Anki whether to update the card templates.
A separate node would better fit the hereditary nature of marking tags as private, but it would introduce the concept of meta tags. No idea what would be less complex to implement in the end. I'd prefer adding a label to each private tag, too, but to avoid clutter, maybe an alternative icon for private tags would be more appropriate. As for sorting, my understanding was that @hgiesel wanted to strip the prefix/parent beforehand.
And with a prefix match, either '#foo' or '#::foo' would be valid, so users could nest them if they wanted to.
Hmm, then #
as a tag would get special-cased? That would seem to add the complexity of a separate node as well.
As I understand, the idea is to map prefixed tags to flagged tags in the GUI (#foo
→ foo (private)
). This might get more cumbersome if there were two different ways to map foo (private)
: #foo
or #::foo
.
Now that I think about it I see some more issues with both a prefix and a parent tag:
#foo
and foo
in your collection. What happens if you import a deck with foo
? Is this tag private or not?
Disallowing to have both tags doesn't seem realistically enforcable. And of course we have to consider collections already using the prefix character.#foo
and foo
would be treated as distinct tags. So if you open your collection with a client not supporting private tags (like AnkiDroid for some time presumably), you'd need to deal with the prefix manually. But how would users even know about that?Maybe adding a dedicated flag to the tag config would be better after all?
@AnKingMed could you give us some "user stories" about why you're exporting individual tags, so we better understand your workflow? What sort of information do these individual tags represent, and why do you need to share them with others? Would an "only add, don't replace tags" option on import work in some/all of such cases?
With these big premade decks, often one student will go through a lecture and tag all relevant notes. They can then share that individual tag and the other user can import that individual tag. User #2 may, however, have changed some of the premade deck tags and importing all of User #1's tags into their database would either make it really messy or overwrite all of that
Sure, but it's also much more useful. Say you add a private field to a notetype from a shared deck. Then you receive an update for its notes. With Preserve you'd miss out on the updates, with Update you'd lose your private field data. Merge would offer the best of the two worlds. The user would only need to tell Anki whether to update the card templates.
Good point about the private field case. Would this work in other cases like renames though? For example, a deck author publishes a deck with "field 1" and "field 2", which the user imports. Then they rename the fields to "field a" and "field b", and reshare the deck with a modified template. If we were to take a union of the notetypes, you'd end up with two copies of all the field content, and two templates, one showing field 1/2 and one showing field a/b. I suspect this is the most common workflow, where users consume shared decks and want to fetch new versions from the deck author, without maintaining any local changes.
You can basically have the same tag twice in your collection. Say you have both #foo and foo in your collection. What happens if you import a deck with foo? Is this tag private or not?
My thinking is that #foo and foo are two distinct tags, just as foo and Foo are two distinct symbols/variables in most programming languages. The Go language uses case to denote public/private methods, so a class can have a private "foo()" and a public "Foo()" at the same time if it wants. All our code would have to do is check if a tag started with a # (or optionally, its ancestor) to decide if it's private. The browser could optionally add a (private) label to such tags to make the meaning of the symbol more clear, but I do not think it should hide the # symbol, as that would be confusing if there also happened to be an unprefixed tag of the same name.
Toggling privateness of a tag would need to modify all notes having that tag.
True, but I'm not sure that's a big deal for tags that the user would permanently want private. It also might play better with mtime-based update checks - consider the case where a note with mtime 123 and tag foo is exported, and imported into a user's collection. If it had been marked as private separately, and the user toggled it to public and re-exported, the importing side would not see any mtime change.
For ephemeral exclusions it would be awkward, but perhaps we could handle such cases a different way?
Would this work in other cases like renames though?
No, it's meant as an alternative beside Preserve and Update, or possibly as a replacement for Preserve.
but I do not think it should hide the # symbol, as that would be confusing if there also happened to be an unprefixed tag of the same name.
Yes, the suggestion to have the GUI strip the prefix seemed reasonable to me at first, but I've realised the problems you describe. However, wasn't the idea to not only skip private tags when exporting, but also when importing? This at least seems impractical with the prefix approach for the reason I've mentioned above.
If it had been marked as private separately, and the user toggled it to public and re-exported, the importing side would not see any mtime change.
I gather the problem is that the tags table lacks an mtime column? Sorry for diverging, but as tags have continued to become more stand-alone and powerful, have you ever considered giving them ids (and mtimes)? For one thing, that would make renaming and reordering them a much smoother experience.
wasn't the idea to not only skip private tags when exporting, but also when importing? This at least seems impractical with the prefix approach for the reason I've mentioned above.
Sorry, I couldn't locate this - could you quote or restate the issue here? Wouldn't it be as simple as the following?
existing_note.tags = remove_all_except_private(existing_note.tags)
for tag in incoming_note.tags:
if not has_private_char(tag):
existing_note.tags.add(tag)
I gather the problem is that the tags table lacks an mtime column?
maybe_update_note() compares note modification times. If privateness were a property stored in the tags table, toggling it would not update the mtime on notes that use the tag, and thus the change in available tags on import would not be known to the importer, as the mtime would still match.
Sorry for diverging, but as tags have continued to become more stand-alone and powerful, have you ever considered giving them ids (and mtimes)?
Very old Anki versions stored the tags in a separate table, instead of embedding them in a note. I seem to recall it not performing well at the time, as the tags table could grow into the millions of entries. That may have been poor table design though; I do not recall the details off the top of my head.
Recently I've been working a bit on the AnkiDroid code to see if I can give them a bit of a push towards moving to the new backend. Until they've fully bought into it and can update to new versions promptly, we're somewhat limited in the breaking changes we can make, as I don't want clients to have to go through expensive upgrade/downgrade steps just to talk to each other.
Sorry, I couldn't locate this - could you quote or restate the issue here?
My bad! I was referring to the first answer here, but upon reading it again, I realise it doesn't say what I thought it did. I'm still not sure I understand the problem with making privateness a flag. Do you mean the following scenario?
foo
and mtime 123.foo
private.Wouldn't we just update the tags separately from the notes, like we currently do with most other objects?
Until they've fully bought into it and can update to new versions promptly, we're somewhat limited in the breaking changes we can make, as I don't want clients to have to go through expensive upgrade/downgrade steps just to talk to each other.
I understand, but do you think this may happen in the long term? I wonder if I should think of tags more like full Anki objects or rather plain strings with some additional related metadata.
I understand, but do you think this may happen in the long term?
It will require rewriting the majority of our tag handling code, so the gains we get from doing so would need to be fairly large to justify the effort. Operations that modify the text of tags are the hierarchy would become faster, but accessing the tags of a note would require looking them up in the DB and potentially caching the results. The syncing protocol would need adjusting, and importing would need to deal with the tags as strings, as the tag ids in the source and target decks may not match.
I wonder whether it's worth the effort?
Thanks for explaining, I finally understand. This would indeed make updating a bit more cumbersome. Don't we have the same problem with private fields? There the mtime bump happens in the notetypes table.
Currently, an exported note is an exact clone of its source note, so it rightfully has the same mtime. With privateness, we create a divergent ad-hoc note, so the mtime may or may not make sense. Maybe we need to invariably bump the mtime on partially exported notes?
You're right, it would be a problem there too.
Bumping the mtime each time could cause newer changes to be overwritten with older ones. One alternative I can think of is to check for source.mtime >= target.mtime instead of source.mtime > target.mtime, ie updating in the matching mtime case. Ideally we'd only add them to the list of updated notes if the tag or field content had actually changed, so we'd need to compare the two in the identical case before applying/counting the changes.
Right. I think we should check for identity in the source.mtime > target.mtime
case as well, because the newer mtime may result from changes to private fields alone. In which case we also might overwrite newer changes with older ones, by the way.
I think this once more shows the inadequacy of using apkgs for updates. Would be great if deck authors could generate a JSON file only containing the diff between their originally shipped deck and current collection.
I suspect JSON imports will improve on some issues and introduce others, and diff-based updates rely on the target collection having the same or similar base point as the collection that was used to generate the diff.
In any case, it sounds like you're more enthusiastic about the JSON side of things at the moment, and we don't currently have a fully-fleshed-out implementation to compare to the existing functionality of that add-on. So maybe it would be worth investing some more time on that side of things first, and then circling back to the apkg side of things later?
Sounds good!
and diff-based updates rely on the target collection having the same or similar base point as the collection that was used to generate the diff.
I imagine deck authors would share an apkg initially and then offer JSON-based updates. These would be much smaller in size and less likely to conflict with any changes endusers have made. If you tried to import an update without the base deck, probably nothing would happen, as all the referenced ids would resolve to nothing. For this workflow to work, we'd need a way to include media, probably reusing the archive tooling for apkgs. This reminds me, we should also add some meta info with a version number we can bump when making breaking changes.
Does this sound like a direction in which you'd like to go? (Of course, I'd try to find some low-hanging fruit first and leave media and so on for a later iteration.)
Agreed on dealing with the media later; one possibility would be to allow a .ankijson file inside an .apkg container instead of an .anki2 (we'd need to decide whether that would be better than a separate extension or not)
Deck authors tend to both add new cards and modify existing ones, as well as sometimes modifying the notetype structure or changing notes to a different notetype, so ideally we will handle all of those cases
Does this sound like a direction in which you'd like to go?
Sounding good. Apologies I haven't been as involved recently; I've been diving into the AnkiDroid code lately, and have also had other non-Anki related things going on.
Repurposing apkgs is an interesting idea. I myself was taking a bit of a break and didn't perceive any lack of involvement. And I'm very excited to see AnkiDroid's backend integration coming along!
I'd like to refactor the import/export code soon, moving some or most of it to Rust. This will probably come after the scheduling rework.
When working on this: