Stvad / CrowdAnki

Plugin for Anki SRS designed to facilitate cooperation on creation of notes and decks.
MIT License
536 stars 43 forks

Generate a unique ``crowdanki_uuid`` for each deck and subdeck during every export #164

Open Raagaception opened 2 years ago

Raagaception commented 2 years ago

This is not a feature request, but is rather a niche use case I need to understand for publishing updates for a massive deck repository I manage; my apologies for the long read.

Context (why I can't use tags instead of subdecks)

The Problem I need to solve

The very niche solution I'm trying to find

aplaice commented 2 years ago

Specific solutions/hacks

So the question really is, is there an easy way to modify the code of CrowdAnki at my end, locally, to force it to generate a new, unique subdeck uuid for each and every export?

I think it'd be sufficient for the deck uuids not to be stored in the database (or immediately removed after being added), which should be straightforward to do.

You can manually remove the uuids using the debug console (Ctrl+Shift+;):

Debug console script

```python
for deck in mw.col.decks.all():
    if deck["name"].startswith("NAME_OF_TOP_LEVEL_DECK"):
        if "crowdanki_uuid" in deck:
            print(f'Deleting UUID {deck["crowdanki_uuid"]} for deck {deck["name"]}')
            del deck["crowdanki_uuid"]
            mw.col.decks.save(deck)
```

or automatically before every export with this patch:

CrowdAnki patch

```diff
--- a/crowd_anki/export/anki_exporter_wrapper.py
+++ b/crowd_anki/export/anki_exporter_wrapper.py
@@ -50,6 +50,7 @@ def exportInto(self, directory_path):
         # https://github.com/Stvad/CrowdAnki/wiki/Workarounds-%E2%80%94-Duplicate-note-model-uuids.
         disambiguate_note_model_uuids(self.collection)
 
+        remove_deck_uuids(self.collection)
         # .parent because we receive name with random numbers at the end (hacking around internals of Anki) :(
         export_path = Path(directory_path).parent
         self.anki_json_exporter.export_to_directory(deck, export_path, self.includeMedia,
@@ -66,3 +67,13 @@ def exporters_hook(exporters_list):
     exporter_id = get_exporter_id(AnkiJsonExporterWrapper)
     if exporter_id not in exporters_list:
         exporters_list.append(exporter_id)
+
+
+def remove_deck_uuids(collection):
+    for deck in collection.decks.all():
+        # I assume you want this to happen only for a certain deck (and all its subdecks)
+        if deck["name"].startswith("NAME_OF_RELEVANT_TOP_LEVEL_DECK"):
+            if "crowdanki_uuid" in deck:
+                print(f'Deleting UUID {deck["crowdanki_uuid"]} for deck {deck["name"]}')
+                del deck["crowdanki_uuid"]
+                collection.decks.save(deck)
```

The advantage of deleting the UUIDs just before exporting (rather than, say, after exporting) is that, in the time between exports, the deck UUIDs in your collection stay the same as in the most recently exported deck. In particular, if somebody sends you an update to the deck, you'll still be able to import it without the notes changing deck etc.
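
One caveat about the prefix match in the two snippets above: `startswith("NAME")` also matches an unrelated deck whose name merely begins with the same characters (for example `NAME2`). A minimal sketch of a stricter check follows. It works on plain dicts shaped like the deck dicts above; the helper names `is_in_deck_tree` and `strip_deck_uuids` are made up for illustration and are not part of CrowdAnki or Anki.

```python
def is_in_deck_tree(deck_name: str, top_level_name: str) -> bool:
    """Return True for the top-level deck itself or any of its subdecks."""
    return deck_name == top_level_name or deck_name.startswith(top_level_name + "::")


def strip_deck_uuids(decks, top_level_name):
    """Remove "crowdanki_uuid" from matching deck dicts and return the changed ones."""
    changed = []
    for deck in decks:
        if is_in_deck_tree(deck["name"], top_level_name) and "crowdanki_uuid" in deck:
            del deck["crowdanki_uuid"]
            changed.append(deck)
    return changed
```

Inside Anki you would presumably pass in `mw.col.decks.all()` and call `mw.col.decks.save(deck)` for each returned deck, as the debug console script does.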

I could, of course, make a Python script to randomize and replace uuids in the exported .json to achieve the same effect, but then are there any naming convention issues or compatibility issues I might have to contend with?

There aren't really any compatibility issues. Any generated random string should work... If you want to follow convention (ideally please do! :)) then we're using uuids generated with uuid1 from the uuid module. (from uuid import uuid1)
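
For reference, generating a uuid in that convention is a one-liner with the standard library. The sketch below also includes a small validity check; the function names `new_crowdanki_uuid` and `looks_like_uuid` are hypothetical helpers, not CrowdAnki functions.

```python
from uuid import UUID, uuid1


def new_crowdanki_uuid() -> str:
    """Return a fresh uuid1-based string, following the convention described above."""
    return str(uuid1())


def looks_like_uuid(value: str) -> bool:
    """Return True if value parses as a UUID string."""
    try:
        UUID(value)
    except ValueError:
        return False
    return True
```

A hand-written string such as "0001" would fail `looks_like_uuid`, even though (as noted above) CrowdAnki itself should accept any string.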

The slight issue is that if you, yourself, import such a deck (with randomised uuids), then all your notes will be moved to newly created decks and sub-decks (just like what would happen for all other users of your deck :)).

Alternatives

You could have a "deleted" sub-deck and when you want to delete a note, instead move it into that sub-deck. This is obviously annoying, though...


I hope the above helps! :)


General solutions

Obviously, the issue with notes being deleted from a deck is a general one.

Long-term it'd be nice for CrowdAnki to have an option to deal with upstream-deleted notes. For instance, we could, on import:

  1. automatically delete such notes
  2. "flag" them with an appropriate tag (say crowdanki_deleted?).
  3. move them into a special deck — either a top-level one (say crowdanki_deleted?) or a sub-deck of the relevant deck (RELEVANT_DECK::crowdanki_deleted).

(I personally strongly prefer 2.)

Finding notes in a deck which aren't in the currently imported CrowdAnki deck is pretty straightforward. The tricky part is distinguishing notes that were deleted "upstream" from "personal" notes that were added by the given user.
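
The "straightforward" part can be sketched as a set difference. This is only an illustration on plain dicts: it assumes the deck JSON stores notes under a "notes" key with a "guid" field per note, plus the "children" list used for subdecks; the function names are made up.

```python
def collect_note_guids(deck_dict):
    """Recursively gather note guids from a CrowdAnki-style deck dict.

    Assumes "notes" (each with a "guid") and "children" keys.
    """
    guids = {note["guid"] for note in deck_dict.get("notes", [])}
    for child in deck_dict.get("children", []):
        guids |= collect_note_guids(child)
    return guids


def find_missing_guids(local_guids, imported_deck):
    """Return guids present locally but absent from the imported deck."""
    return set(local_guids) - collect_note_guids(imported_deck)
```

The result still mixes upstream-deleted notes with the user's own notes, which is exactly the tricky part described above.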

AFAICT there are two main options to resolve this:

  1. Have "upstream" explicitly generate a list of note uuids that have been deleted.

    This would work even if the user moves all their notes into another deck (and wants to keep them there).

    The disadvantage is that it makes export more tricky (we first have to read the existing CrowdAnki deck JSON file, to check which note uuids are present (so that we can get the diff with respect to the uuids in Anki) and to get the list of uuids already marked as deleted (so that we can append to that list)).

    It can also get confusing in a more decentralised workflow (which is the ideal goal of CrowdAnki). What happens when person A deletes some subset of notes, person B deletes another subset of notes and person C undeletes some notes?

    Further, if somebody exports without having access to the existing deck JSON file, then the list of deleted uuids will get lost.

  2. Have the user tag notes as "personal" (crowdanki_personal?) — i.e. should not be touched by CrowdAnki either during import or export.

    This doesn't work if the user moves their notes into another deck.

    It has the advantage that it simultaneously solves the issue of notes that should not be exported when contributing back "upstream".

    It also feels more elegant in terms of being close to the relevant git approach. Thinking in git metaphors, tagging a note as "personal" is equivalent to adding a file to .git/info/exclude (local .gitignore).

    To make it easy for users, we should add an "is_personal" toggle (that would tag notes as personal) to the Anki note creation UI.

    * Unfortunately, modifying the editing UI is particularly prone to breakage.
    
    * I'm not sure if the toggle should by default be on or off?

    What happens if a user marks an "upstream" note as personal? Prevent it from being updated?

    What happens if we modify the note type, that a personal note belongs to? (This obviously is still an issue even if the note isn't marked as personal.)

I personally lean towards 2, but the approaches are complementary (one focuses on the upstream-deleted notes, the other focuses on the personal ones), so we could, in principle, use both — have an optional deleted_note_uuids list in the CrowdAnki JSON and also encourage users to mark personal notes as such.
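
To make the combination concrete, here is a rough sketch of both halves on plain Python data. Nothing here exists in CrowdAnki: the function names are invented, the tag name is the one floated above, and treating a reappearing uuid as "undeleted" is just one possible answer to the person A/B/C question.

```python
PERSONAL_TAG = "crowdanki_personal"  # hypothetical tag name from the discussion


def update_deleted_list(previous_guids, previous_deleted, current_guids):
    """Export side: carry the deleted list forward and add newly removed guids.

    Assumption: a guid that reappears in the current export counts as undeleted.
    """
    previous_guids, current_guids = set(previous_guids), set(current_guids)
    newly_deleted = previous_guids - current_guids
    return sorted((set(previous_deleted) | newly_deleted) - current_guids)


def classify_local_note(guid, tags, imported_guids, deleted_guids):
    """Import side: decide what to do with one local note."""
    if PERSONAL_TAG in tags:
        return "skip"         # never touched by import or export
    if guid in imported_guids:
        return "update"       # normal CrowdAnki import
    if guid in deleted_guids:
        return "tag_deleted"  # e.g. add a crowdanki_deleted tag
    return "keep"             # unknown note, probably user-created
```

Note that `update_deleted_list` needs the previous export's uuids and deleted list as inputs, which is the "export without access to the existing JSON file" weakness mentioned under option 1.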

ohare93 commented 2 years ago

Alternatives

You could have a "deleted" sub-deck and when you want to delete a note, instead move it into that sub-deck. This is obviously annoying, though...

Another option is a "Hard sync" checkbox on import, which would move any cards not in the imported deck(s), but that exist in your local decks, into this "deleted" / "unknown notes" deck, so that the user may delete or move back at their own choosing.

Would be easy to implement too, just:

  1. Parse all the decks that are being imported into
  2. Move all cards from all those decks into this "unknown notes" deck
  3. Import. Notes in the import will be moved to the relevant decks. (Note: not compatible with "Do not move cards" import option)
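
The three steps above can be simulated on a simple mapping from note guid to deck name. This is a toy model, not CrowdAnki code: it tracks notes rather than individual cards, and the deck name and function name are placeholders.

```python
UNKNOWN_DECK = "unknown notes"  # hypothetical deck name


def hard_sync(local_note_decks, imported_note_decks):
    """Return a new guid -> deck mapping after a "hard sync" import.

    local_note_decks: guid -> deck name currently in the collection
    imported_note_decks: guid -> deck name in the deck being imported
    """
    # Step 1: work out which decks are being imported into.
    target_decks = set(imported_note_decks.values())
    # Step 2: move everything in those decks into the holding deck.
    result = {
        guid: (UNKNOWN_DECK if deck in target_decks else deck)
        for guid, deck in local_note_decks.items()
    }
    # Step 3: import, which moves imported notes to their proper decks.
    result.update(imported_note_decks)
    return result
```

Whatever is still in the holding deck afterwards is either deleted upstream or was created by the user.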

Just a thought about how to accomplish this :+1:

Edit: Oh, I see you said much the same thing later on. My bad :sweat_smile:

ohare93 commented 2 years ago

Tagging is surely the way to go. It is clear and obvious, especially for many users all syncing their cards together (rather than a user syncing to one shared deck).

Raagaception commented 2 years ago

The slight issue is that if you, yourself, import such a deck (with randomised uuids), then all your notes will be moved to newly created decks and sub-decks (just like what would happen for all other users of your deck :)).

@aplaice case in point, that's my intended end goal! I planned to structure my public deck hierarchy as <main deck name>::<version name>::<subdeck-hierarchy>. Suppose someone updates from v1.0.0 to v1.2.0, where v1.2.0 has 1000 cards in total, 10 cards have been deleted, and a lot of the subdeck arrangement has changed. All the notes in v1.0.0 which are also in v1.2.0 get shifted to <main deck name>::v1.2.0::<new-subdeck-hierarchy>. The 10 deleted cards stay in their old subdeck positions under <main deck name>::v1.0.0::<old-subdeck-hierarchy>, alongside the user's personally created cards, and the now-empty outdated subdecks still exist. The user can review and delete whatever remains under v1.0.0 at their own discretion.

There aren't really any compatibility issues. Any generated random string should work... If you want to follow convention (ideally please do! :)) then we're using uuids generated with uuid1 from the uuid module. (from uuid import uuid1)

I don't know much about how Anki modules work unfortunately, just some basic Python to get me by here and there (it is on my bucket list of things to learn though). But if I just replace the uuids of all the 100+ subdecks in the exported .json file after export with Regex search and replace, say, with random simple strings like "crowdanki_uuid" : "0001", they should just work and create new decks but still move the existing notes into it, right? I'll try it out tomorrow and close this issue if it works. Thanks a ton for your reply!

Tagging is surely the way to go. It is clear and obvious, especially for many users all syncing their cards together (rather than a user syncing to one shared deck).

@ohare93 Truly, it is in retrospect. Before making my big release, I had a single 4000 card deck with extensive tagging and zero subdecks. Problem is, most users who find my deck are usually Anki noobs, and I constantly got DMs along the lines of "How do I select only specific topics to study from your deck!?". Many didn't even know that the Anki Browse pane existed. So, I figured I would just convert all the tagging and nested tags to nested decks instead, which would mean beginners could just scroll the sub decks and just click and review the topics they needed, and pros could make filtered decks as usual because they of course know how Anki works. I'm painfully aware that Anki isn't very sub deck friendly, because in all fairness tags were the intended way - but this is a compromise I made to hopefully make using my deck easier and faster for end users. :D

Raagaception commented 2 years ago

Long-term it'd be nice for CrowdAnki to have an option to deal with upstream-deleted notes

YES PLEASE! I personally prefer method 2) too; CrowdAnki_deleted tags enabled on import by default would definitely be a godsend. Any user on import can already specify their tags and do a reverse-search to delete non-existent ones (in the CrowdAnki import dialogue), but this seems much smoother and quicker.

aplaice commented 2 years ago

I don't know much about how Anki modules work unfortunately, just some basic Python to get me by here and there (it is on my bucket list of things to learn though).

That's a general Python module, part of the standard library (not Anki-specific).

But if I just replace the uuids of all the 100+ subdecks in the exported .json file after export with Regex search and replace, say, with random simple strings like "crowdanki_uuid" : "0001", they should just work and create new decks but still move the existing notes into it, right?

Yes, that will work. However, I don't recommend very simple strings like "0001". In the long term it'd be easy for you to accidentally end up re-using the same string, and if somebody else decides to use the same approach then uuid collisions are likely, which would be annoying and hard for end-users to debug.

The following quickly/sloppily written python script should carry out the uuid change, with uuid1-generated uuids:

Python script (updated as below)

```python
import json
from uuid import uuid1


def regenerate_uuid_in_deck(deck_dict):
    deck_dict["crowdanki_uuid"] = str(uuid1())


def regenerate_uuid_in_subdecks(deck_dict):
    children = deck_dict["children"]
    for child in children:
        regenerate_uuid_in_deck(child)
        regenerate_uuid_in_subdecks(child)


with open("deck.json") as f:
    deck_dict = json.load(f)

regenerate_uuid_in_subdecks(deck_dict)

with open("deck_modified.json", "w") as f:
    json.dump(deck_dict, f, indent=4, sort_keys=True, ensure_ascii=False)
```

It doesn't change the uuid of the top-level deck, since AFAIU you don't want the top-level deck to be moved, just all its subdecks. Please feel free to modify it as you see fit or complain if it doesn't work. :)


It seems that we all agree that tags are the way to go! (On second thought, unless and until we properly set up handling/tagging of "personal" notes, maybe a better tag name might be crowdanki_deleted_or_missing?)

ohare93 commented 2 years ago

It seems that we all agree that tags are the way to go! (On second thought, unless and until we properly set up handling/tagging of "personal" notes, maybe a better tag name might be crowdanki_deleted_or_missing?)

I was actually thinking that this would be a good problem for Brain Brew to solve: just use tags, but have an export option that moves the tags into subdecks. So the tag "Science::Physics::Forces::Gravity" would just be put in a subdeck 4 levels down, but "Science::Physics" would only be two levels down. Then you can "have your cake and eat it too" in that one can offer a deck based on tags or subdecks, all without actually touching subdecks yourself 😁
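
Purely as an illustration of the idea (this is not an existing Brain Brew or CrowdAnki feature), the tag-to-subdeck mapping could be as simple as reusing the tag's `::` hierarchy under a chosen root deck. The function names and the `root_deck` parameter are hypothetical.

```python
def tag_to_deck_name(tag: str, root_deck: str) -> str:
    """Map a hierarchical tag onto a subdeck path under root_deck."""
    return f"{root_deck}::{tag}"


def deck_depth(tag: str) -> int:
    """Return how many levels below the root deck the tag's subdeck sits."""
    return len(tag.split("::"))
```

With this mapping, "Science::Physics::Forces::Gravity" lands four levels down and "Science::Physics" two levels down, as described above.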

Of course having this feature in CrowdAnki would be a solution too! But I doubt that would happen.

aplaice commented 2 years ago

It seems that we all agree that tags are the way to go!

I had meant tags for "marking" deleted/missing notes. In thinking about how to allow sub-decks for people who prefer sub-decks to tags, you're far ahead of me! :)

but have an export option that moves the tags into subdecks.

How would you deal with notes with multiple tags, though? Have a list of special, mutually-exclusive tags?

In any case, it's a very interesting idea! :)

ohare93 commented 2 years ago

I had meant tags for "marking" deleted/missing notes.

Ah right, of course 😁

How would you deal with notes with multiple tags, though? Have a list of special, mutually-exclusive tags?

First come, first served? Throw an error? 😆
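
Both suggestions fit in one small helper. This is a hypothetical sketch only: `deck_tags` stands for the list of "special" tags that map to subdecks, and "first" means first in the note's own tag order.

```python
def pick_deck_tag(note_tags, deck_tags, on_conflict="first"):
    """Choose the single tag that decides a note's subdeck.

    on_conflict: "first" keeps the first match in note_tags order,
    "error" raises when a note carries more than one deck tag.
    Returns None when the note has no deck tag at all.
    """
    if on_conflict not in ("first", "error"):
        raise ValueError(f"Unknown conflict policy: {on_conflict}")
    matches = [tag for tag in note_tags if tag in deck_tags]
    if not matches:
        return None
    if len(matches) > 1 and on_conflict == "error":
        raise ValueError(f"Note has several deck tags: {matches}")
    return matches[0]
```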

In any case, it's a very interesting idea! :)

👍

Raagaception commented 2 years ago

The following quickly/sloppily written python script should carry out the uuid change, with uuid1-generated uuids:

@aplaice That would solve my use case elegantly, but seems like it removes all line breaks in the output .json file. I of course had to tell it to use UTF-8 to get rid of an error, did that mess it up or something? The modified line in question : with open("deck.json", "r", encoding='utf-8') as f:

Here's the before and after (with the extension changed to .txt since GitHub doesn't support uploading .json for some reason)

aplaice commented 2 years ago

I of course had to tell it to use UTF-8 to get rid of an error, did that mess it up or something?

No. The script was just quickly/sloppily written. :D

You can use json.dump(deck_dict, f, indent=4, sort_keys=True, ensure_ascii=False) instead of the previous json.dump.

```python
import json
from uuid import uuid1


def regenerate_uuid_in_deck(deck_dict):
    deck_dict["crowdanki_uuid"] = str(uuid1())


def regenerate_uuid_in_subdecks(deck_dict):
    children = deck_dict["children"]
    for child in children:
        regenerate_uuid_in_deck(child)
        regenerate_uuid_in_subdecks(child)


with open("deck.json") as f:
    deck_dict = json.load(f)

regenerate_uuid_in_subdecks(deck_dict)

with open("deck_modified.json", "w") as f:
    json.dump(deck_dict, f, indent=4, sort_keys=True, ensure_ascii=False)
```
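
A slightly more defensive variant of the same script, offered as a sketch rather than a tested replacement: it opens both files as UTF-8 explicitly (the fix reported above), tolerates decks without a "children" key, and adds an optional flag for also regenerating the top-level uuid. The function names and the `include_top_level` flag are additions for illustration.

```python
import json
from uuid import uuid1


def regenerate_deck_uuids(deck_dict, include_top_level=False):
    """Assign fresh uuid1 values to all subdecks, and optionally the top-level deck."""
    if include_top_level:
        deck_dict["crowdanki_uuid"] = str(uuid1())
    for child in deck_dict.get("children", []):
        # Every deck below the top level always gets a new uuid.
        regenerate_deck_uuids(child, include_top_level=True)
    return deck_dict


def rewrite_deck_file(source_path, target_path, include_top_level=False):
    """Read a deck JSON file, regenerate uuids, and write the result."""
    with open(source_path, encoding="utf-8") as f:
        deck_dict = json.load(f)
    regenerate_deck_uuids(deck_dict, include_top_level)
    with open(target_path, "w", encoding="utf-8") as f:
        json.dump(deck_dict, f, indent=4, sort_keys=True, ensure_ascii=False)
```

Usage would be something like `rewrite_deck_file("deck.json", "deck_modified.json")`, matching the file names in the script above.
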
Raagaception commented 2 years ago

@aplaice YES, the script helped me get the perfect end result!

So, here's how an update looks on the user's end:

1) Older, existing version
(image)
2) After import
(image)
(So now the sub decks under v0.15.0 contain any deleted cards or user created cards. So basically none of the cards left in v0.15.0 exist in update v0.16.0, which the user recently imported.)
3) The user may choose to delete the now outdated version, after which it looks like this.
(image)

Perfect and elegant. I'm grateful for your help and support, just an amazing plugin all around! Feel free to close this issue.