refactor import/export code

dae commented 3 years ago

I'd like to refactor the import/export code soon, moving some or most of it to Rust. This will probably come after the scheduling rework.

When working on this:

[ ] we'll need to bear in mind use cases that the Special Fields add-on currently enables. https://github.com/ankitects/anki/pull/507
[ ] we'll need to provide some upgrade path for third-party imports like Mnemosyne and SuperMemo

hgiesel commented 3 years ago

I did work on some SpecialFields PRs, and knowing that you intended to refactor the export/import logic soon, I took some notes. While SpecialFields makes the importing algorithm much more powerful, I don't think the abstractions it provides are necessarily easy to understand. So the following are some ideas of how to better abstract the functionality SpecialFields provides:

Changing Note Type upon import

Currently, if the note type of a note changed, the note cannot be imported anymore. It would be nice, if the user could convert the note to the new note type similar to how he does it in the browser.
- In fact, SpecialFields uses that exact method.
- For that, first the note types of the imported deck would need to be imported, then all notes which follow the same note change pattern, like convert from "MyNoteTypeV1" to "MyNoteTypeV3" would have to be accumulated, so you don't have to make the card/field assignment per note.
- For making this easier on the user, there could be also some improvements to the "Change Note Type" dialog:
- Make the default assignment in that dialog based on equality of the field or card type names. I think it's currently based solely on the position of the field / card type.
- Make it possible to assign multiple old fields to the same new field: The old fields could first be joined with <br> or possibly <hr>, and then written to the single new field
- Disable changing the card types/fields by scrolling on them. When having a few more fields, you want to scroll down the overview and then accidentally change all the assignments.

Private Fields:

There could be some way to mark a field of a note type as private, probably in the Fields dialog.
When exporting a private field -->:
- field is left empty
When importing a private field <--:
- field content of imported note is discarded in favor of the content that the field has in the local collection

Private Tags:

There could be some way to mark tags as private. This could be done via a prefix, or suffix. Or maybe an icon in the Browser sidebar, that can be activated. Or an item in the context menu of the sidebar, where after activating the tag has some visual indication to mark it private.
Marking a hierarchical tag private, could also mark all of its descendants private (?) It's hard to say what would be the best behavior here.
When exporting a private tag -->:
- Tag is not exported
When importing tags <--:
- Local private tags are merged with new incoming tags
- Let's say a # as prefix is the indicator for private tags:
- local note has medicine::brain medicine::disease #exams::smith, and imported note has medicine::brain::disease, then the new note will have medicine::brain::disease #exams::smith
Prime example of what would constitute default private tags are marked and leech.
- Also flags (#1017) and buried/suspended could be included in this kind of "private note state"

Update even if note type or note has older `mod` value

The deck maintainer might have made changes before the deck user has made some private changes (possibly in private fields/tags?), which currently will prevent the note from being imported.
The same is true for note types. A deck maintainer might have made some styling changes, which will be prevented from being imported, because a user might have made some private styling changes.
- According to Nick, it's fine if the private changes are overwritten, as he will communicate to his users, that they should back that up before importing the update. This could be alleviated by implementing some "Private styling marker", but that might be a thought for another time.
If you import "PublicDeckV3", and you only have "PublicDeckV2" imported, there is no doubt that all notes and note types should be updated.

Update deck description

Not quite sure how it works currently? Maybe the deck description is only set on the first import, or does it also has its own mod value? Special Fields makes sure that it is set on every import however.

Private vs Public export

Of the aforementioned improvements I think "Changing Note Type upon import" generally makes sense to implement.

But for the other changes, they particularly make sense for sharing decks from the "deck maintainer" to the "deck user". However the current Anki export/import behavior would still make sense when the sharing happens between "user" and "user", or if a user wants to back-up his deck/progress.

I think two export/import behaviors could be grouped as follows:

A public / shared export - import:
- Don't export scheduling
- Don't export private fields like described above
- Don't export private tags like described above
- Force update even if the mod is older
- Update deck description (even if older) (?)
A private / backup export - import:
- Export scheduling
- Export all fields, including private
- Export all tags, including private
- Don't import if mod is older (You can trust yourself, that the most recent change you made is the one you want)
- Update deck description only if newer (?) For that it would also need a mod value, probably on deck? I'm not sure that's the case right now.

While it might still might make sense to allow for individual turning on-off of those features (?), I think abstracting them that way, would prevent overwhelming the user at first. (EDIT: Talking to Nick, he confirmed, that indeed all individual settings would be necessary)

AnKingMed commented 3 years ago

Happy to answer any questions in regards to this. @hgiesel has a great outline here

dae commented 3 years ago

Thanks Henrik, that looks really helpful. The scheduler work will likely come first (I've already done some initial work on it), but will come back to this.

dae commented 3 years ago

For reference: https://forums.ankiweb.net/t/import-problems-suggestions/5856

dae commented 3 years ago

Also for reference: https://forums.ankiweb.net/t/importing-rewrite-note-sort-order/13449

dae commented 3 years ago

Leading/trailing whitespace causing issues: https://forums.ankiweb.net/t/import-export-causes-unwanted-duplicates/14060

dae commented 2 years ago

Another one: https://forums.ankiweb.net/t/stats-ruined-when-importing-decks-from-others/15901/4

dae commented 2 years ago

Specifying deck: https://forums.ankiweb.net/t/deck-field-option-for-file-import-csv/18275/3

abdnh commented 2 years ago

Reminder to add checks for invalid card IDs: https://forums.ankiweb.net/t/issue-with-huge-card-ids/12165

dae commented 2 years ago

Just a brief update on this:

Rumo has implemented .colpkg import/export, and is currently looking into apkg exporting.

The plan is to approach the update in two steps:

1) porting the existing built-in functionality, without trying to tackle all the points mentioned above 2) once we've got a solid base, incrementally adding the new functionality/features mentioned above

While we work on the two steps, the old Python code can be kept around, so that we don't break existing workflows using the SpecialFields add-on.

dae commented 2 years ago

New apkg and csv import/export implementations are now in main, with functionality roughly equivalent to the old Python code. Now we can turn our attention to the new functionality suggested above.

@hgiesel gave us a great summary. Some thoughts/discussion starters on the various features:

Private fields:

Henrik's suggested approach sounds good here - we could use a checkbox like "[x] Exclude in shared decks" in the Fields dialog.

Private tags:

Maybe a list of prefixes is simplest?
We could perhaps use ['private::', 'marked', 'leech'] as the default?
If we didn't require an exact match, it would allow users to specify prefixes like '#' to match tags like '#foo', and not require them to have to type '#::foo'. It would be a bit simpler to implement.
A non-exact prefix match couldn't be configured via the sidebar though - we'd presumably need to show the list in both the importing and exporting screens, remembering the set value.

Private vs public export:

I quite like this idea - we already use the scheduling option to control whether flags should be stripped, and the two use cases are quite different. @AnKingMed, could you explain in more detail which of the lumped settings you want to be able to control independently, and why?

Updates when notetype has changed:

This is probably the hardest part. I think it gets even more complicated if we try allow the user to merge fields, or worry about preserving the content of fields that no longer exist.

Brainstorming here - how does the following sound?

When importing .apkg files, the import dialog provides a dropdown "On schema change", with the options ["Preserve", "Update (full sync)"]. The former ignores updated notes like we do currently; the latter automatically changes the notetype of existing notes as part of the import.
When we encounter a note that we'd like to update where the notetype has a remapped id, we assign the new notetype id to the note, and update its fields. We also keep track of notes updated this way.
Normally when changing notetypes, there is a) field rewriting and b) card ordinal changes. We should be able to skip a) in this case as we overwrote the fields, but still need to handle b) before we call import_cards(). Perhaps we could repurpose the existing notetypechange.rs code by feeding it a field map with no changes, so only the ordinals get adjusted?
At the end of the import, we might want to warn the user that a one-way sync is now required.

AnKingMed commented 2 years ago

Private fields:

I would recommend explaining that its excluded in export, but also protected on import or something to that effect

Private tags:

private:: makes sense.
In the special fields add-on we just have a field users can edit text and any tags containing the text will be protected. i.e. "foo" would protect tags "foo", "foo::bar" and "foobar".

Private vs public export:

The special fields add-on allows users to choose which of the following they want when importing and I would recommend having all of these features individually configurable as I've used every possible configuration of this on multiple occasions
- which fields are "protected" on import
- an option to protect all fields on import (for when you want to import just tags)
- an option to combine tagging (or when toggled off, update tagging and strip old tags)
- an option to update the deck description or not
- an option to update the note styling or not (note that when this is toggled on it will change note types to the current note type, when toggled off it will not update if the note types are different)
- The "update only if newer" option isn't actually needed. Basically that just forces Anki to update cards regardless of the modification time stamp on the cards.
- Tags that are protected if the "combine tagging" is NOT toggled on (this will update all tags, but leave the few that are in this box)

Note that the special fields add-on is just for importing. Ideally, exporting would also be considered and users could choose to not export certain fields. We've also recently made an add-on for "export single tag for sharing" so that users can export just a single tag and strip all the other tags out. This is really useful when you tag cards by a lecture, export just that tag, and the person you share with can then import that file with "all fields are special" and "combine tagging" so that it essentially just imports that one tag without affecting any field content or any of the existing tags https://ankiweb.net/shared/info/960563361

hgiesel commented 2 years ago

Private tags:

I think you suggested to make that list of prefixes configurable, right? I'd argue against that. Tags are something that is used across decks and notetypes. Having configurable tags could make sharing more complicated than it needs to be.

Example: User A exports a deck with notes that are tagged abc. User B now imports this deck, but has abc configured as private. Will the tag be imported or not? And if it is imported, is immediately recognized as private in B's collection, which would make passing it on to user C more complicated?

Instead, what do you think about not making it configurable, but use a prefix or infix of #? Of course marked and leech would also be private for a legacy reason.

this would allow for public and private tags to coexist more easily, e.g. my::personal::tag::topic and my::personal::tag::#::private. Otherwise if users wanted to categorize their private tags more, they would have to split them far apart, like my::personal::tag and private::my::personal::tag.
This works nicely with the hierarchical view in the sidebar. The user could right-click a tag at any hierarchy point, and have an option "Set private", which would internally just prefix them with #::.

E.g. this subtree would show the tags medicine::tag and medicine::#::todo. So any tags within #:: are inlined into their parent. And the private tag would mark it explicitly for the user.

Any repeated #:: would then be treated normally again. The "magic behavior" would only pertain to the first use of a # subtag.

Private vs public export:

@AnKingMed Why is a tag-only import necessary, if we allow for public/private tags?
Same for deck descriptions? Why would you not want to update deck descriptions?
Concerning notetype styling, I would also suggest to not cover this use case at this moment. Similar to tags I think an actual good solution would be to be allow for tagging for public and private parts, but we can't really do that right now.

AnKingMed commented 2 years ago

There are times when a user has 100 tags and user B has the same 100 tags (using the same shared deck). User B chooses to alter some of those 100 tags, but also adds a new tag. They want to share that new tag with user A, but aren't ready to share the other changes yet. In this case, there needs to be a way to share just that singular tag without updating the other tags

For deck description, there are many instances in sharing updates with each other to compile things that I've needed to protect the deck description (i.e. I as the deck creator had updated it, but I was importing files from other people for the new update and they had the old description).

For 99% of people, they need just a couple functions, the other functions are for the deck creators and maintainers. Although it's significantly less people using those functions, they're absolutely necessary. I've used every combination of the settings I outlined above in creating and maintaining decks and have a few years of experience with it :)

dae commented 2 years ago

User A exports a deck with notes that are tagged abc. User B now imports this deck, but has abc configured as private. Will the tag be imported or not? And if it is imported, is immediately recognized as private in B's collection, which would make passing it on to user C more complicated?

It's a good point, and having a hard-coded list of private tags would allow us to simplify the UI. But if deck authors want asymmetric handling of private tags (eg excluding some on import that they'd like included on export), I'm not sure such an approach would work (unless they renamed the tag prior to import/export, which is impractical). I presume that's what you are doing with the MissedQ tag @AnKingMed?

Private vs public export:

Note that the special fields add-on is just for importing.

I think we may have had our wires crossed here. I was asking whether a single "for public/for private" checkbox could work on deck export. All the options you're describing seem to be related to importing, which is a separate matter.

AnKingMed commented 2 years ago

@dae the MissedQ tag is just my personal tag that I use when I miss a practice question and decide a flashcard is relevant to that.

Sorry I didn't understand this was just exporting, however it's probably still worth considering the whole picture when addressing exports and imports.

I think the most ideal way to handle exporting tags is to select exactly which tags you want to export. For example, a dialog that has "select all" or "select none" and then you can choose

[x] tag 1
[ ] tag 2
[x] tag 3
- [ ] subtag 1
- [x] subtag 2

dae commented 2 years ago

I'd prefer to avoid a list of all tags if possible, as it adds a fair bit of extra implementation complexity.

I'm still trying to understand the different use cases that are covered here. Does the following sound correct? Are there other cases I've missed?

1) There are some tags you'll always, or almost always want excluded 2) Sometimes you want to share a single tag with others. By that, I presume you mean a) "only notes with that tag", and b) "preserve any other tags the notes happen to have when importing"?

Could 2b) perhaps be handled when importing, instead of when exporting? Eg if the importer allowed you to only import specific tags? You could accomplish 2a) by searching in the browse screen, then exporting from there.

AnKingMed commented 2 years ago

true
I mean a) only notes with that tag, but I only want to export the notes with that tag and I want all other tags to be stripped (note that if I choose a head tag, I want it to include all subtags and notes with subtags as well)

2b is definitely an importing issue.

There are also instances, however, where I want to export multiple tags. The solution right now would be to export a single tag multiple times and then import each file into the new profile. I think the list of tags is ideal if at all possible if you're going for the best solution

dae commented 2 years ago

Does the single-tag filtered need to be done on the exporting side though? Couldn't the filtering be done on the importing side, if the importing screen provided a way to include or exclude specific tags?

AnKingMed commented 2 years ago

I think that's much more appropriate on the exporting side. I generally share tags with others and I only want to share certain ones. It would be an extra step to have to share everything and then tell them what to do

RumovZ commented 2 years ago

For 99% of people, they need just a couple functions, the other functions are for the deck creators and maintainers.

Is the goal really to support all the use cases mentioned here? Call me Bernie Sanders, but I think we should focus on the 99% who'd have a tough time figuring out the correct settings for their use case. Also, I think apkg is the wrong format to solve these problems. As it's simply a collection, it can't represent diffs and has to rely on external information to work as a a half-decent update. This places a burden not only on the exporter (e.g. deck author), but also on the importer to get a lot of settings right. Instead, I propose to implement these things with the new anki-json format. JSON is self-describing, so all the relevant settings could be set on the exporter side, requiring the end user only to click import. It truly can represent a deck update, because missing data is filled in with Anki defaults or existing data. No need to ship a whole new deck just to update a single tag. And lastly, it's plaintext, meaning you can version-control it.

Changing Notetypes

When importing .apkg files, the import dialog provides a dropdown "On schema change", with the options ["Preserve", "Update (full sync)"].

I think you're right that embedding a change-notetype feature is not the right way, @dae. It would be quite powerful, but easy to mess up and would still not solve all the problems.

When we encounter a note that we'd like to update where the notetype has a remapped id, we assign the new notetype id to the note, and update its fields.

This would mean that subsequent updates would continue to cause conflicts. Let's say you have a notetype with id 1 and schema A and the import contains one with also id 1, but schema B. Then a notetype with schema B and id 2 is added to your collection. Then you do another import with the same notetype, but there's still no notetype with matching id and schema, so B3 is added.

What about merging the notetypes? The new notetype would contain the union of fields and card types. The user may decide if card types with the same name, but different templates would be updated or preserved. In case the imported notetype only contains a subset of fields and card types with the same templates (which should be the case on subsequent updates), this would not need to be treated as a conflict and it could be trivially mapped into the existing notetype.

Private vs. Public

@hgiesel's proposals sound good to me.

But if deck authors want asymmetric handling of private tags (eg excluding some on import that they'd like included on export)

If there's only one toggle for importing/exporting private fields or not, it seems reasonable to me to treat private tags the same.

AnKingMed commented 2 years ago

Is the goal really to support all the use cases mentioned here? Call me Bernie Sanders, but I think we should focus on the 99% who'd have a tough time figuring out the correct settings for their use case.

I disagree with this. The 99% strongly rely on the 1% making high quality decks in these instances and focusing energy on high quality deck making utilities is important.

dae commented 2 years ago

You both have valid points. Deck authors do bring a lot of value to the ecosystem, so we can't ignore their needs. At the same time, there is a higher bar for entry into the core code than in add-ons, and our goal here is to identify the important features and try to add them in a way that is maintainable, and fits in with the rest of the UI (the current special fields add-on UI is not particularly understandable).

@AnKingMed could you give us some "user stories" about why you're exporting individual tags, so we better understand your workflow? What sort of information do these individual tags represent, and why do you need to share them with others? Would an "only add, don't replace tags" option on import work in some/all of such cases?

@RumovZ re using JSON for this: that is a good point, and JSON may well be more appropriate here. With colpkg/apkg/csv importing mostly done at this point, and fallback to the old importers with the new setting enabled, I feel like we're in a good place now. So long as we continue to support the legacy codepaths, there's no need to rush to add these features to our apkg importer, and perhaps it's worth pursuing the JSON importer further first.

While features focusing on shuffling data between collaborating deck authors may make more sense for the JSON exporter, some of the other proposals like private fields and tags probably still make sense for the apkg exporter, as they would be useful in the more common case of deck author distributing changes to end users. But presumably in that use case, we don't need to worry about things like single-tag imports.

This would mean that subsequent updates would continue to cause conflicts. Let's say you have a notetype with id 1 and schema A and the import contains one with also id 1, but schema B. Then a notetype with schema B and id 2 is added to your collection. Then you do another import with the same notetype, but there's still no notetype with matching id and schema, so B3 is added.

Yep, good point. The way the original import code addressed this case is to add 1 to the old notetype id when adding a new one, and keep doing so until it found a free id or found a notetype with a matching schema.

What about merging the notetypes? The new notetype would contain the union of fields and card types.

That feels more complex than the approach above - WDYT?

Re: private tags, a few thoughts:

If we push the single tag/asymmetric handling to JSON, then that seems to make the argument for a predefined character in apkg handling even stronger
How about a prefix instead of a separate parent node? Eg "#foo" instead of "#::foo". If we want to show a private label, it could be on each tag instead of just the parent, which might be clearer? Presumably users won't have that many private tags? And with a prefix match, either '#foo' or '#::foo' would be valid, so users could nest them if they wanted to.
When choosing a symbol, we might want to consider whether the symbol sorts above or below ASCII text.

RumovZ commented 2 years ago

I'm not saying we should ignore deck creators, just address their needs in some way that doesn't aggravate the experience of the average user. E.g. by putting highly specialised features in an add-on or a part of Anki, that's only for advanced usage anyway (JSON export).

Changing Notetypes

That feels more complex than the approach above - WDYT?

Sure, but it's also much more useful. Say you add a private field to a notetype from a shared deck. Then you receive an update for its notes. With Preserve you'd miss out on the updates, with Update you'd lose your private field data. Merge would offer the best of the two worlds. The user would only need to tell Anki whether to update the card templates.

Private Tags

A separate node would better fit the hereditary nature of marking tags as private, but it would introduce the concept of meta tags. No idea what would be less complex to implement in the end. I'd prefer adding a label to each private tag, too, but to avoid clutter, maybe an alternative icon for private tags would be more appropriate. As for sorting, my understanding was that @hgiesel wanted to strip the prefix/parent beforehand.

And with a prefix match, either '#foo' or '#::foo' would be valid, so users could nest them if they wanted to.

Hmm, then # as a tag would get special-cased? That would seem to add the complexity of a separate node as well. As I understand, the idea is to map prefixed tags to flagged tags in the GUI (#foo → foo (private)). This might get more cumbersome if there were two different ways to map foo (private): #foo or #::foo.

Now that I think about it I see some more issues with both a prefix and a parent tag:

You can basically have the same tag twice in your collection. Say you have both #foo and foo in your collection. What happens if you import a deck with foo? Is this tag private or not? Disallowing to have both tags doesn't seem realistically enforcable. And of course we have to consider collections already using the prefix character.
It's not backward compatible, meaning #foo and foo would be treated as distinct tags. So if you open your collection with a client not supporting private tags (like AnkiDroid for some time presumably), you'd need to deal with the prefix manually. But how would users even know about that?
Toggling privateness of a tag would need to modify all notes having that tag.

Maybe adding a dedicated flag to the tag config would be better after all?

AnKingMed commented 2 years ago

@AnKingMed could you give us some "user stories" about why you're exporting individual tags, so we better understand your workflow? What sort of information do these individual tags represent, and why do you need to share them with others? Would an "only add, don't replace tags" option on import work in some/all of such cases?

With these big premade decks, often one student will go through a lecture and tag all relevant notes. They can then share that individual tag and the other user can import that individual tag. User #2 may, however, have changed some of the premade deck tags and importing all of User #1's tags into their database would either make it really messy or overwrite all of that

dae commented 2 years ago

Sure, but it's also much more useful. Say you add a private field to a notetype from a shared deck. Then you receive an update for its notes. With Preserve you'd miss out on the updates, with Update you'd lose your private field data. Merge would offer the best of the two worlds. The user would only need to tell Anki whether to update the card templates.

Good point about the private field case. Would this work in other cases like renames though? For example, a deck author publishes a deck with "field 1" and "field 2", which the user imports. Then they rename the fields to "field a" and "field b", and reshare the deck with a modified template. If we were to take a union of the notetypes, you'd end up with two copies of all the field content, and two templates, one showing field 1/2 and one showing field a/b. I suspect this is the most common workflow, where users consume shared decks and want to fetch new versions from the deck author, without maintaining any local changes.

You can basically have the same tag twice in your collection. Say you have both #foo and foo in your collection. What happens if you import a deck with foo? Is this tag private or not?

My thinking is that #foo and foo are two distinct tags, just as foo and Foo are two distinct symbols/variables in most programming languages. The Go language uses case to denote public/private methods, so a class can have a private "foo()" and a public "Foo()" at the same time if it wants. All our code would have to do is check if a tag started with a # (or optionally, its ancestor) to decide if it's private. The browser could optionally add a (private) label to such tags to make the meaning of the symbol more clear, but I do not think it should hide the # symbol, as that would be confusing if there also happened to be an unprefixed tag of the same name.

Toggling privateness of a tag would need to modify all notes having that tag.

True, but I'm not sure that's a big deal for tags that the user would permanently want private. It also might play better with mtime-based update checks - consider the case where a note with mtime 123 and tag foo is exported, and imported into a user's collection. If it had been marked as private separately, and the user toggled it to public and re-exported, the importing side would not see any mtime change.

For ephemeral exclusions it would be awkward, but perhaps we could handle such cases a different way?

RumovZ commented 2 years ago

Would this work in other cases like renames though?

No, it's meant as an alternative beside Preserve and Update, or possibly as a replacement for Preserve.

but I do not think it should hide the # symbol, as that would be confusing if there also happened to be an unprefixed tag of the same name.

Yes, the suggestion to have the GUI strip the prefix seemed reasonable to me at first, but I've realised the problems you describe. However, wasn't the idea to not only skip private tags when exporting, but also when importing? This at least seems impractical with the prefix approach for the reason I've mentioned above.

If it had been marked as private separately, and the user toggled it to public and re-exported, the importing side would not see any mtime change.

I gather the problem is that the tags table lacks an mtime column? Sorry for diverging, but as tags have continued to become more stand-alone and powerful, have you ever considered giving them ids (and mtimes)? For one thing, that would make renaming and reordering them a much smoother experience.

dae commented 2 years ago

wasn't the idea to not only skip private tags when exporting, but also when importing? This at least seems impractical with the prefix approach for the reason I've mentioned above.

Sorry, I couldn't locate this - could you quote or restate the issue here? Wouldn't it be as simple as the following?

existing_note.tags = remove_all_except_private(existing_note.tags)
for tag in incoming_note.tags:
  if not has_private_char(tag):
     existing_note.tags.add(tag)

I gather the problem is that the tags table lacks an mtime column?

maybe_update_note() compares note modification times. If privateness were a property stored in the tags table, toggling it would not update the mtime on notes that use the tag, and thus the change in available tags on import would not be known to the importer, as the mtime would still match.

Sorry for diverging, but as tags have continued to become more stand-alone and powerful, have you ever considered giving them ids (and mtimes)?

Very old Anki versions stored the tags in a separate table, instead of embedding them in a note. I seem to recall it not performing well at the time, as the tags table could grow into the millions of entries. That may have been poor table design though; I do not recall the details off the top of my head.

Recently I've been working a bit on the AnkiDroid code to see if I can give them a bit of a push towards moving to the new backend. Until they've fully bought into it and can update to new versions promptly, we're somewhat limited in the breaking changes we can make, as I don't want clients to have to go through expensive upgrade/downgrade steps just to talk to each other.

RumovZ commented 2 years ago

Sorry, I couldn't locate this - could you quote or restate the issue here?

My bad! I was referring to the first answer here, but upon reading it again, I realise it doesn't say what I thought it did. I'm still not sure I understand the problem with making privateness a flag. Do you mean the following scenario?

User exports note with tag foo and mtime 123.
User makes foo private.
User reimports note (privately).

Wouldn't we just update the tags separately from the notes, like we currently do with most other objects?

Until they've fully bought into it and can update to new versions promptly, we're somewhat limited in the breaking changes we can make, as I don't want clients to have to go through expensive upgrade/downgrade steps just to talk to each other.

I understand, but do you think this may happen in the long term? I wonder if I should think of tags more like full Anki objects or rather plain strings with some additional related metadata.

dae commented 2 years ago

User A has a note with tags foo and bar, and mtime 123. They have marked 2 as private in the tags table.
They export the note, and it ends up in the apkg file with only tag foo, and mtime 123. User B imports it.
User A sets the tag to public, and re-exports. Since only the tags table was changed, it ends up in the apkg with both tags, but the same mtime.
User B imports again, and the new tag is not added, because the existing note has an identical mtime.
Importing from the tags table doesn't really make sense, since the user doesn't want the tag itself, but wants it applied to the relevant notes.

I understand, but do you think this may happen in the long term?

It will require rewriting the majority of our tag handling code, so the gains we get from doing so would need to be fairly large to justify the effort. Operations that modify the text of tags are the hierarchy would become faster, but accessing the tags of a note would require looking them up in the DB and potentially caching the results. The syncing protocol would need adjusting, and importing would need to deal with the tags as strings, as the tag ids in the source and target decks may not match.

I wonder whether it's worth the effort?

RumovZ commented 2 years ago

Thanks for explaining, I finally understand. This would indeed make updating a bit more cumbersome. Don't we have the same problem with private fields? There the mtime bump happens in the notetypes table.

Currently, an exported note is an exact clone of its source note, so it rightfully has the same mtime. With privateness, we create a divergent ad-hoc note, so the mtime may or may not make sense. Maybe we need to invariably bump the mtime on partially exported notes?

dae commented 2 years ago

You're right, it would be a problem there too.

Bumping the mtime each time could cause newer changes to be overwritten with older ones. One alternative I can think of is to check for source.mtime >= target.mtime instead of source.mtime > target.mtime, ie updating in the matching mtime case. Ideally we'd only add them to the list of updated notes if the tag or field content had actually changed, so we'd need to compare the two in the identical case before applying/counting the changes.

RumovZ commented 2 years ago

Right. I think we should check for identity in the source.mtime > target.mtime case as well, because the newer mtime may result from changes to private fields alone. In which case we also might overwrite newer changes with older ones, by the way.

I think this once more shows the inadequacy of using apkgs for updates. Would be great if deck authors could generate a JSON file only containing the diff between their originally shipped deck and current collection.

dae commented 2 years ago

I suspect JSON imports will improve on some issues and introduce others, and diff-based updates rely on the target collection having the same or similar base point as the collection that was used to generate the diff.

In any case, it sounds like you're more enthusiastic about the JSON side of things at the moment, and we don't currently have a fully-fleshed-out implementation to compare to the existing functionality of that add-on. So maybe it would be worth investing some more time on that side of things first, and then circling back to the apkg side of things later?

RumovZ commented 2 years ago

Sounds good!

and diff-based updates rely on the target collection having the same or similar base point as the collection that was used to generate the diff.

I imagine deck authors would share an apkg initially and then offer JSON-based updates. These would be much smaller in size and less likely to conflict with any changes endusers have made. If you tried to import an update without the base deck, probably nothing would happen, as all the referenced ids would resolve to nothing. For this workflow to work, we'd need a way to include media, probably reusing the archive tooling for apkgs. This reminds me, we should also add some meta info with a version number we can bump when making breaking changes.

Does this sound like a direction in which you'd like to go? (Of course, I'd try to find some low-hanging fruit first and leave media and so on for a later iteration.)

dae commented 2 years ago

Agreed on dealing with the media later; one possibility would be to allow a .ankijson file inside an .apkg container instead of an .anki2 (we'd need to decide whether that would be better than a separate extension or not)

Deck authors tend to both add new cards and modify existing ones, as well as sometimes modifying the notetype structure or changing notes to a different notetype, so ideally we will handle all of those cases

Does this sound like a direction in which you'd like to go?

Sounding good. Apologies I haven't been as involved recently; I've been diving into the AnkiDroid code lately, and have also had other non-Anki related things going on.

RumovZ commented 2 years ago

Repurposing apkgs is an interesting idea. I myself was taking a bit of a break and didn't perceive any lack of involvement. And I'm very excited to see AnkiDroid's backend integration coming along!

ankitects / anki