anki-geo / ultimate-geography

Geography flashcard deck for Anki
https://ankiweb.net/shared/info/2109889812
Other
828 stars 82 forks source link

Notes in translated decks have new GUIDs #391

Closed axelboc closed 3 years ago

axelboc commented 3 years ago

See https://github.com/axelboc/anki-ultimate-geography/discussions/390#discussioncomment-346750 for context, and https://github.com/Stvad/CrowdAnki/issues/6 for root cause.

This issue is to "consider whether we should recommend using the Do Not Move Existing Cards option and whether we should add guidance/reassurance about the creation of a "duplicate" deck" on first CrowdAnki import.


EDIT the duplicated deck issue is more severe than initially thought -- see https://github.com/axelboc/anki-ultimate-geography/issues/391#issuecomment-777310776 below.

KobaYounes commented 3 years ago

I have the same issue with the extented french version. I tried with and without Do Not Move Existing Cards but I have the same behaviour. The duplicate Deck is like a new deck with 1508 new cards and the old one is unchanged.

aplaice commented 3 years ago

Irrespective of whether you use Do Not Move Existing Cards or not, a new deck will be created — it just affects which of the decks contains the bulk of the cards.

Are you sure that the old deck is unchanged? (Just to be certain, to avoid unnecessary debugging.)

If the old deck is indeed unchanged (i.e. that you have two sets of very similar cards/notes — two France → Paris cards etc.) and if you have time, would it be possible for you to export the old deck, with CrowdAnki? (File > Export > Export format: CrowdAnki JSON representation (*directory), Include: Ultimate Geography > Export...)

Ideally, if you're confident that you didn't add anything private to the deck (accidentally added private notes to the deck etc. — none of the learning or scheduling information is exported by CrowdAnki) could you post the exported deck.json? Otherwise, would you be willing to check some of the guids? (For instance, is the guid e+/O]%*qfk present in the deck.json?)

KobaYounes commented 3 years ago

Yes the old deck is unchanged. Here is the deck.json.

deck.json.txt

KobaYounes commented 3 years ago

I dont know much about CrowdAnki but the guids are not the identical for a given card.

aplaice commented 3 years ago

Thanks very, very much for this! It's really helpful!


The issue is caused by the fact that the guids have all changed in all the translated versions of the decks (which we had not realised...).

I believe that this is because of the difference in how anki-dm and brain-brew work with guids.

anki-dm would take the guid from src/data.csv, apply some transformation to it (don't remember the details) and use that in the generated deck.json.

brain-brew just takes the guid in src/data/main.csv without any modification.

Hence, when we switched from anki-dm to brain-brew we replaced the guids in src/data/main.csv with the generated values for the English deck.

What I didn't realise until now is that anki-dm's guid transformation was language-specific. (This was presumably so that people could use multiple translated decks in parallel, which is pretty convenient!)

(In my partial defence, the transformation is the same for the "normal" and "extended" decks and I didn't realise that this was different for the translated decks. (Logically, though, there's no reason to use both the "normal" and "extended" decks in parallel, while there is reason to use translated decks in parallel.)

This means that in the translated decks, none of the new guids match the old ones...

KobaYounes commented 3 years ago

Happy to help.

axelboc commented 3 years ago

Woops. I've added a warning to the release. Let's discuss the issue further in here, then.

I agree that using translated decks in parallel might be interesting to some users, so we should perhaps consider matching the GUID generation of Anki-DM for translations: https://github.com/OnkelTem/anki-dm/blob/master/src/Builder.php#L39-L42 and https://github.com/OnkelTem/anki-dm/blob/master/src/Builder.php#L132

$deck_build['deck']['uuid'] = Util::uuidEncode($deck_build['deck']['uuid'], $lang);
$deck_build['config']['uuid'] = Util::uuidEncode($deck_build['config']['uuid'], $lang);
$deck_build['model']['uuid'] = Util::uuidEncode($deck_build['model']['uuid'], $lang);
// ...
$deck_fields_data[$i]['guid'] = Util::guidDecode($cell, $deck_build['model']['uuid']);

Looks like the algorithm is here: https://github.com/OnkelTem/anki-dm/blob/bdf90599964b2101bd1f0c04ab68c7466d8ee270/src/Util.php#L268

Doesn't look trivial... 😂

axelboc commented 3 years ago

Here are a few choices I can think of if we do want to have separate GUIDs in translated decks:

  1. Reproduce Anki-DM's algorithm in Brain Brew.
  2. Hard-code the old GUIDs somewhere in the CSVs and tell Brain Brew to use them somehow.
  3. Implement a new algorithm in Brain Brew and force users to lose progress.

Option 2 has the advantage of making it clear that the GUIDs are different in translated decks. @ohare93 will probably have insight into whether any of these options are too difficult to implement.

ohare93 commented 3 years ago

What I didn't realise until now is that anki-dm's guid transformation was language-specific.

This means that in the translated decks, none of the new guids match the old ones...

💯 👀 😱

Looks like the algorithm is here: https://github.com/OnkelTem/anki-dm/blob/bdf90599964b2101bd1f0c04ab68c7466d8ee270/src/Util.php#L268

Doesn't look trivial... 😂

I think you're overcomplicating it, we can just do what I did originally to fix this: Take the guids from the Anki-DM generated deck, and copy paste them into the new csv. It will just take a little moving around to place the current guids into the English csv, and then have an empty guid column be in the other language csvs ready for a copy paste. Actually I have a Python script that can copy them from old decks and put it into the csvs correctly, I so I can fix this no problem 👍 Any new entries can just be left blank, for each language, and a new guid will be generated on writing each of them.

Bad that we did not catch this beforehand, but it is an easy fix in the end 👍


Edit: Ah wait, we have them as Country/Capital/etc. I guess we can have a Guids csv with one column for each language? Hmm, then in the future one would have to add each entry there, or an error would occur 🤔 I am unsure if that will cause further issues, or if it would be better to reorganise the csvs to be language specific instead (can be automated as a one off Recipe).

axelboc commented 3 years ago

We can move the English GUIDs out of main.csv and into a guid.csv with the following columns: country, guid, guid:de, etc. Is this what you have in mind? I think this is fine, but yeah, not sure what it means for adding new translations or new notes.

ohare93 commented 3 years ago

We can move the English GUIDs out of main.csv and into a guid.csv with the following columns: country, guid, guid:de, etc. Is this what you have in mind? I think this is fine, but yeah, not sure what it means for adding new translations or new notes.

That is indeed what I mean. It would fix this issue very easily.

Let's fix this first, then make it better later 😉 I will have time to do this on Saturday, not before. But if others wish to get something done beforehand I can help with small efforts here and there 👍

aplaice commented 3 years ago

Hard-coding the guids is probably the way to go, definitely in the short/medium run. (In the long run, it might be nice to have the option in brain-brew to automatically generate multiple GUIDs somehow, from a single stored GUID, since it feels ugly to store multiple GUIDs for a single note, but it's more of an aesthetic preference, and we certainly shouldn't worry about it now.)

We can move the English GUIDs out of main.csv and into a guid.csv with the following columns: country, guid, guid:de, etc. Is this what you have in mind? I think this is fine,

That's what I understood and I agree that it'd be fine.

but yeah, not sure what it means for adding new translations or new notes.

Ideally brain-brew recipes/source_to_anki.yaml would automatically generate the relevant GUID rows or columns, but that's for the medium-term.


Do we want the GUIDs for the newly added translations to be what they "would have been" if we were still using anki-dm, just in case, or not necessarily (i.e. just random GUIDs)? (Edit: On second thought, probably not worth worrying about — i.e. just random GUIDs for SV,CZ,RU and NL, I think.)

ohare93 commented 3 years ago

Ideally brain-brew recipes/source_to_anki.yaml would automatically generate the relevant GUID rows or columns, but that's for the medium-term.

It already does 👍 Only for each guid entry that is empty, that is. My one off migration suggestion is so that we can fill the other language guids with the specific guid that was already used 👍

aplaice commented 3 years ago

Yeah, I know that it generates the guid — I just wasn't sure if it would work out-of-the-box (without some tweaking) for multiple guid columns (guid:de etc.).

ohare93 commented 3 years ago

  - notes_from_csvs:
      part_id: english
      # save_to_file: src/notes/english.yaml

      note_model_mappings:
        - note_models: &note_models_in_mapping
            - Ultimate Geography
          columns_to_fields:
            <<: &default_columns_to_fields
              guid: guid       <<<<------------------------------------------------------ 1
              tags: tags

              Flag: Flag
              Map: Map

            Country          : Country
            Country info     : Country info
            Capital          : Capital
            Capital info     : Capital info
            Capital hint     : Capital hint
            Flag similarity  : Flag similarity

          personal_fields: [ ]

      file_mappings: &file_mappings
        - file: src/data/main.csv
          note_model: Ultimate Geography
          derivatives:
            - file: src/data/country.csv
            - file: src/data/country_info.csv
            - file: src/data/capital.csv
            - file: src/data/capital_info.csv
            - file: src/data/capital_hint.csv
            - file: src/data/flag_similarity.csv

      generate_guids: true   <<<<------------------------------------------------------ 2

The only change required is for the guid mapping to be changed to guid:en and for all for all of the notes_from_csvs to also have generate_guids: true set (currently only English does, just to do it once at the start) 👍 Could be some type of race condition based on when the files save and read, but if so then it will be easy to solve.

ohare93 commented 3 years ago

See #396 for a (partial?) fix. Turns out French and Spanish always have shared the same guids... :scream:

axelboc commented 3 years ago

I've re-uploaded the ZIPs of the translated decks with the GUID fix to the v4.1 release and removed the upgrade warning. I've also tagged master with v4.1.1.