Philipp91 / picasa2digikam

Script to migrate Picasa metadata to digiKam
GNU General Public License v3.0

[bug] AssertionError during non-dry run #17

Closed: UtopianElectronics closed this issue 1 month ago.

UtopianElectronics commented 2 years ago

Here's the command:

python main.py ^
    --photos_dir="D:\gallery" ^
    --digikam_db="D:\digiKam_library\digikam4.db" ^
    --contacts="%LocalAppData%\Google\Picasa2\contacts\contacts.xml" ^
    -vv

And the error:

DEBUG: self_contact_to_tag={'c180c0983bdb4c7c': 968, '2aef23ae02c3be': 969 [AND SO ON]}
Traceback (most recent call last):
  File "C:\Users\USERNAME\picasa2digikam\main.py", line 66, in <module>
    main()
  File "C:\Users\USERNAME\picasa2digikam\main.py", line 56, in main
    migrator.migrate_directories_under(input_root_dir=args.photos_dir, db=db,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERNAME\picasa2digikam\migrator.py", line 107, in migrate_directories_under
    contact_tags_per_dir[dir] = migrate_directory(dir, files, db,
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERNAME\picasa2digikam\migrator.py", line 154, in migrate_directory
    assert contact_to_tag[contact_id] == tag_id
AssertionError

Philipp91 commented 2 years ago

Pull request https://github.com/Philipp91/picasa2digikam/pull/18 will improve the error message for this: it will tell you which directory is affected. If you open the .picasa.ini file in that directory, you should find the contact ID and the tag ID that the error message complains about. If you then navigate up the directory hierarchy, you should find further .picasa.ini files, and in one of them the same contact ID would be assigned to a different tag ID. That would mean the Picasa data is inconsistent.
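The ancestor check described here can be sketched in Python. This is a hypothetical illustration, not the project's `migrator.py`: the function name `check_against_ancestors`, its input (a directory path mapped to a `{contact_id: tag_id}` dict), and the use of POSIX-style paths so that it runs on any OS are all assumptions. The sample data reuses the IDs from this issue.

```python
from pathlib import PurePosixPath


def check_against_ancestors(tags_per_dir: dict[str, dict[str, int]]) -> None:
    """Hypothetical check: a contact ID must map to the same tag ID in a
    directory as in every ancestor directory that also mentions it."""
    for dir_name, mapping in tags_per_dir.items():
        for ancestor in PurePosixPath(dir_name).parents:
            inherited = tags_per_dir.get(str(ancestor), {})
            for contact_id, tag_id in mapping.items():
                if contact_id in inherited and inherited[contact_id] != tag_id:
                    raise ValueError(
                        f"{contact_id} maps to {tag_id} in {dir_name} "
                        f"but to {inherited[contact_id]} in an ancestor dir")


# Illustrative data shaped like the situation reported in this issue.
tags = {
    "gallery": {"1a70911985fb6277": 784},
    "gallery/2019": {"1a70911985fb6277": 966},
}
try:
    check_against_ancestors(tags)
except ValueError as err:
    print(err)  # 1a70911985fb6277 maps to 966 in gallery/2019 but to 784 in an ancestor dir
```

Raising `ValueError` instead of relying on `assert` keeps the check active even when Python runs with `-O`, which strips assertions.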

Does this really not happen in dry-run mode? I wonder whether this -1 in dry-run mode somehow hides discrepancies, but I can't really imagine how that would work. Perhaps we'll understand this better once you've figured out which IDs are affected and how they differ between those directories.

UtopianElectronics commented 2 years ago

Traceback (most recent call last):
  File "C:\Users\USERNAME\picasa2digikam\main.py", line 66, in <module>
    main()
  File "C:\Users\USERNAME\picasa2digikam\main.py", line 56, in main
    migrator.migrate_directories_under(input_root_dir=args.photos_dir, db=db,
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERNAME\picasa2digikam\migrator.py", line 107, in migrate_directories_under
    contact_tags_per_dir[dir] = migrate_directory(dir, files, db,
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USERNAME\picasa2digikam\migrator.py", line 154, in migrate_directory
    assert contact_to_tag[contact_id] == tag_id, (
AssertionError: 1a70911985fb6277 maps to 966 in D:\gallery\2019 but to 784 in an ancestor dir

So 1a70911985fb6277 is the contact ID, and 966 and 784 are the tag IDs? What's a tag ID? I can't find tag IDs in the .picasa.ini files. Where are they?

Does this really not happen in dry-run mode?

No. In the dry-run there's no such error.

Philipp91 commented 2 years ago

I'll rethink this logic tomorrow or on the weekend. It might be an outright logic error on my part: the script may be inserting a new digiKam tag into the database for the old Picasa contact 1a70911985fb6277 twice, on different directory levels, thereby ending up with two different tag IDs and then noticing that the IDs differ.

Philipp91 commented 2 years ago

Could you perhaps post the lines from D:\gallery\2019\.picasa.ini and D:\gallery\.picasa.ini that contain 1a70911985fb6277? I'm trying to see whether (1) they both exist and (2) they both map to the same name/value.

If the latter is not the case, i.e. they map to different names, then I think the script is right to fail (and I will change it to fail in such cases during dry runs, too), because those IDs should be unique per person and should not refer to different people in different places.
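To pull the relevant lines out of a .picasa.ini file programmatically, a small parser is enough. This sketch assumes that contacts are listed in a `[Contacts2]` section as `contact_id=Name;;`; that layout, the helper name `parse_contacts`, and the sample text are assumptions for illustration, not taken from picasa2digikam.

```python
def parse_contacts(ini_text: str) -> dict[str, str]:
    """Return {contact_id: name} from the (assumed) [Contacts2] section."""
    contacts: dict[str, str] = {}
    in_contacts = False
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            # A new section header; only [Contacts2] is of interest here.
            in_contacts = line == "[Contacts2]"
        elif in_contacts and "=" in line:
            contact_id, _, value = line.partition("=")
            # Keep only the name, i.e. the text before the first ';'.
            contacts[contact_id] = value.split(";", 1)[0]
    return contacts


# Assumed sample content; a real file may contain more sections and keys.
sample = """[Contacts2]
1a70911985fb6277=Lorem ipsum;;
[IMG_0001.jpg]
star=yes
"""
print(parse_contacts(sample))  # {'1a70911985fb6277': 'Lorem ipsum'}
```

A hand-written loop is used instead of `configparser` because Picasa's files are not guaranteed to follow the stricter rules `configparser` enforces (for example, it rejects duplicate keys by default).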

UtopianElectronics commented 2 years ago

They both exist. However, I must have edited the person's name in Picasa at some point: the contact ID 1a70911985fb6277 in D:\gallery\.picasa.ini refers to, for example, "Lorem ipsum", but in D:\gallery\2019\.picasa.ini it refers to "Lorem ipsum dolor".
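A renamed contact suggests one way two different tag IDs could arise, and why a dry run would not notice. The model below is a hypothetical sketch, not the script's actual code. It assumes that tags are looked up by name and created when missing, and that a dry run returns the placeholder `-1` (the value mentioned earlier in this thread) instead of inserting. The `tags` table is a simplified stand-in, not digiKam's real schema.

```python
import sqlite3


def tag_id_for(conn: sqlite3.Connection, name: str, dry_run: bool) -> int:
    """Hypothetical get-or-create: look the tag up by name, insert if missing."""
    row = conn.execute("SELECT id FROM tags WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    if dry_run:
        return -1  # placeholder: nothing is written, so every new tag is -1
    return conn.execute("INSERT INTO tags (name) VALUES (?)", (name,)).lastrowid


def ids_for_renamed_contact(dry_run: bool) -> tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT)")
    # Same contact ID, but a different name in the ancestor and child .picasa.ini.
    ancestor_id = tag_id_for(conn, "Lorem ipsum", dry_run)
    child_id = tag_id_for(conn, "Lorem ipsum dolor", dry_run)
    return ancestor_id, child_id


print(ids_for_renamed_contact(dry_run=True))   # (-1, -1): an equality check passes
print(ids_for_renamed_contact(dry_run=False))  # (1, 2): an equality check fails
```

Under these assumptions the two names produce two distinct IDs in a real run, while the dry run compares `-1` with `-1` and sees no problem.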

There's also a .picasa (2).ini file in D:\gallery, but I don't think it's relevant to this issue.

UtopianElectronics commented 2 years ago

So should I manually edit the .picasa.ini files and make the tag names identical, or is there going to be a fix/patch?

Philipp91 commented 2 years ago

I will now change the code to detect this situation even in dry-run mode. That should let you estimate the extent of this problem in your case before you start fixing every single file. It also lets you start fixing them without running lots of migrations that fail halfway through.

If it turns out that many cases are affected and you don't care which name is picked (i.e. they are slight renames of the same person and not two different people), I could also add a flag that handles such cases by picking one of the two names at random.
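Detecting the situation up front amounts to collecting every name seen for each contact ID across all directories and reporting the IDs with more than one distinct name, instead of stopping at the first failure. A minimal sketch: `find_name_conflicts` and its input shape (directory mapped to `{contact_id: name}`) are hypothetical, and the sample names are illustrative.

```python
from collections import defaultdict


def find_name_conflicts(
    names_per_dir: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Return {contact_id: {name: [dirs using that name]}} for every contact ID
    that is given more than one distinct name."""
    seen: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for dir_name, contacts in names_per_dir.items():
        for contact_id, name in contacts.items():
            seen[contact_id][name].append(dir_name)
    # Keep only contacts with at least two different names.
    return {cid: dict(by_name) for cid, by_name in seen.items() if len(by_name) > 1}


names = {
    "gallery": {"1a70911985fb6277": "Lorem ipsum", "2aef23ae02c3be": "Dolor"},
    "gallery/2019": {"1a70911985fb6277": "Lorem ipsum dolor", "2aef23ae02c3be": "Dolor"},
}
for contact_id, by_name in find_name_conflicts(names).items():
    print(contact_id, by_name)
# 1a70911985fb6277 {'Lorem ipsum': ['gallery'], 'Lorem ipsum dolor': ['gallery/2019']}
```

Because nothing is written to the database, a report like this fits a dry run and shows the full extent of the problem in one pass.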

Philipp91 commented 2 years ago

Commit https://github.com/Philipp91/picasa2digikam/commit/9cceede2a00d18c3d99ed30f4e825203814de1c2 improves the error message, and https://github.com/Philipp91/picasa2digikam/commit/008aedce4d84879408ec485c4f6b73a953bbb674 makes the errors appear during dry runs, too.

I've pushed both commits to the main branch directly, so you can test it from there. I don't have my dev environment on this machine, so I wasn't able to test this. Hopefully it "compiles" and works as intended.

UtopianElectronics commented 2 years ago

Finally, I was able to migrate tags for one directory successfully, although there were 7 tag name mismatch errors, which I fixed manually by editing the .picasa.ini files. I noticed that contacts.xml has the latest version of the tag names that had this issue; however, some tags are not found in contacts.xml. Would you please explain tag IDs? How are they numbered?

Philipp91 commented 2 years ago

Tag IDs come from the digiKam database and are presumably minted in increasing order (42, 43, 44, 45, ...) as new tags are added during a picasa2digikam run. 41 would thereby be the highest ID of a pre-existing tag in digiKam.
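This numbering matches how SQLite assigns `INTEGER PRIMARY KEY` values when no ID is given: a new row normally gets one more than the largest ID already in the table. The sketch uses an in-memory stand-in, since digikam4.db is an SQLite file. The table name `Tags` with `id` and `name` columns is a simplification and an assumption about digiKam's schema; verify it (ideally on a copy of the database) before querying a real library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for a tags table; not digiKam's full schema.
conn.execute("CREATE TABLE Tags (id INTEGER PRIMARY KEY, name TEXT)")
# Pretend 41 tags already exist in the library (IDs 1..41, as in the example).
conn.executemany("INSERT INTO Tags (id, name) VALUES (?, ?)",
                 [(i, f"existing {i}") for i in range(1, 42)])

highest = conn.execute("SELECT MAX(id) FROM Tags").fetchone()[0]
# New tags inserted without an explicit ID get the next free numbers.
new_ids = [conn.execute("INSERT INTO Tags (name) VALUES (?)", (name,)).lastrowid
           for name in ("Lorem ipsum", "Lorem ipsum dolor")]
print(highest, new_ids)  # 41 [42, 43]
```

If the real schema matches this assumption, a query such as `SELECT id, name FROM Tags` would show which name each tag ID belongs to.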

7 tag name mismatch errors which I fixed them manually by editing the .picasa.ini files.

Is that a lot? How long would it take you to fix all the mismatches in all the directories you plan to migrate?

And is the outcome of your manual fix much better than if a random one of the two names had been chosen instead?

I noticed that contacts.xml has the latest version of the tag names that had this issue, however, some tags are not found in contacts.xml.

So it would be best to prefer the contacts.xml name if it exists and otherwise (a) let the user correct the ini file or (b) pick a random one?

Sadly, the way that contacts.xml was integrated doesn't allow merging these two name sources easily, because they're mapped to tag IDs at quite different places in the code. So it would require a bigger refactoring first.

UtopianElectronics commented 2 years ago

9cceede Improves the error message and 008aedc makes the errors appear during dry-runs too.

I haven't tried it yet. I'll try it if this issue happens again for other directories.

I've pushed both commits to the main branch directly, so you can test it from there.

Thank you. So git fetch is all I have to run?

41 would thereby be the highest ID of a pre-existing tag in digiKam.

As an example? Or always true? If so, why?

Is that a lot? How long would it take you to fix all the mismatches in all the directories you plan to migrate?

Not a lot. It didn't take long. I just opened the files in Notepad++ and made the changes.

And is the outcome of your manual fix much better than if a random one of the two names had been chosen instead?

No, not necessarily. I think it may be better for picasa2digikam to choose the longer of the two names, but in one instance the mismatch was due to a typo (letters swapped within a word) and both names had the same length.

So it would be best to prefer the contacts.xml name if it exists [...]

Yes.

[...] and otherwise (a) let the user correct the ini file or (b) pick a random one?

Both are good. Personally, I'm fine with manually editing the .picasa.ini files. Ideally, though, the program would display a message with both names and ask the user to choose one of them or type in a new name.
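The preferences discussed here could be combined into one resolution rule: take the contacts.xml name if there is one, otherwise take the longer of the .picasa.ini names, and leave ties (such as a swapped-letter typo) to the user. This is a hypothetical sketch of that policy; `resolve_name` and its parameters are not part of picasa2digikam.

```python
from typing import Optional


def resolve_name(
    contact_id: str,
    ini_names: set[str],
    xml_names: dict[str, str],
) -> Optional[str]:
    """Pick one name for a contact, or return None if a human must decide."""
    if contact_id in xml_names:
        return xml_names[contact_id]  # per this thread, contacts.xml had the latest names
    if not ini_names:
        return None  # nothing to choose from
    if len(ini_names) == 1:
        return next(iter(ini_names))  # no conflict at all
    by_length = sorted(ini_names, key=len, reverse=True)
    if len(by_length[0]) == len(by_length[1]):
        return None  # same length (e.g. a typo): ask the user or edit the .ini
    return by_length[0]  # otherwise prefer the longer name


print(resolve_name("1a70911985fb6277", {"Lorem ipsum", "Lorem ipsum dolor"}, {}))  # Lorem ipsum dolor
print(resolve_name("x", {"Jonh Doe", "John Doe"}, {}))  # None
```

Returning `None` rather than prompting keeps the function usable in repeated dry runs; a caller can decide whether to ask interactively or just report the unresolved contacts.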

Philipp91 commented 2 years ago

So git fetch is all I have to run?

No. A fetch only makes remote changes known to the local mirror in the origin/... branches. To update your actual local branch, you need something like git rebase (or git merge) after the fetch, or git pull to do it all in one go.

As an example? Or always true? If so, why?

41 is just an example.

But ideally, it would be great if the program displays a message with both names and ask the user to choose one of them or else, type in a new name.

I'll file a feature request then and leave it unresolved for now. I'm not so sure about asking the user, because they would then have to re-enter the same information over and over again while going through iterative dry runs to resolve other, unrelated issues.