gilesknap / gphotos-sync

Google Photos and Albums backup with Google Photos Library API
Apache License 2.0
1.97k stars, 161 forks

Creating Duplicate Photos Once Finished #472

Closed (ae0200 closed this 5 months ago)

ae0200 commented 5 months ago

Hi, thank you for writing gphotos-sync. I have, however, run into a problem: after it finishes downloading all my images, the program goes back over certain images and downloads them again with the suffix (2). Is there a reason for this? image

gilesknap commented 5 months ago

Hi @ae0200. Back in the early days of digital photography (I started using digicams in 1996), cameras just used the same file names repeatedly. Even when they started using incremental filenames, the early ones reset to zero when the battery went flat.

I have gone to VERY great trouble to handle the fact that my library has hundreds of photos with the same file name from the early days. Putting (2) on the first duplicate name and then incrementing that number was the standard that Windows used, and I adopted it because it helped a little with old photos whose filenames had been mangled by Windows back in the 90s!!!
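The naming rule described above can be sketched roughly as follows. This is only an illustration, not the actual gphotos-sync code: the function name `unique_name`, the `taken` set, and the exact spacing of the suffix are assumptions.

```python
from pathlib import PurePath


def unique_name(filename: str, taken: set) -> str:
    """Return filename unchanged if unused, else 'stem (n).ext' with the lowest free n >= 2."""
    if filename not in taken:
        taken.add(filename)
        return filename
    path = PurePath(filename)
    n = 2  # the first duplicate gets (2), matching the Windows convention
    while True:
        candidate = f"{path.stem} ({n}){path.suffix}"
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        n += 1


# Three different photos that all arrive with the same camera filename.
taken = set()
names = [unique_name("IMG_2405.JPG", taken) for _ in range(3)]
print(names)
```

The point is that a (2) suffix only says the name was already taken; it says nothing about whether the contents match.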

So I'm pretty sure that what you are seeing is that Google Photos has two copies of a photo that are different, even if only very slightly. I say that because it will merge two uploads of a file with identical contents into a single file.

If you go and look in your library on the Google Photos web UI and search for the name IMG_2405, for example, do you see two results?

Also - those IMG_xxx filenames sound just like the ones from the very early cameras that I'm talking about.

ae0200 commented 5 months ago

Thank you for the quick reply. Yes I see where you're coming from, I remember having problems with cameras re-using file names when doing other projects.

I don't believe this is the case though because I went to the folder where these images were saved and they are showing up as straight duplicates. I can't work out a pattern with them as to why only some have been duplicated so far, and why it started with photos from 2021. I'll attach some images below to show my point.

image image
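One way to confirm that two local files really are straight duplicates, rather than near-identical photos, is to compare their bytes. The sketch below is not part of gphotos-sync; the helper names `sha256_of` and `same_bytes` are hypothetical, and the demo uses throwaway temporary files instead of real photos.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos and videos are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(first: Path, second: Path) -> bool:
    """True only if both files have identical contents."""
    if first.stat().st_size != second.stat().st_size:
        return False  # different sizes can never be byte-identical
    return sha256_of(first) == sha256_of(second)


# Demo with stand-in files: one exact copy and one same-sized file that differs by a byte.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "IMG_2405.JPG"
    copy = Path(tmp) / "IMG_2405 (2).JPG"
    edited = Path(tmp) / "IMG_2405 (3).JPG"
    original.write_bytes(b"example image bytes")
    copy.write_bytes(b"example image bytes")
    edited.write_bytes(b"example image bytez")
    copy_matches = same_bytes(original, copy)
    edited_matches = same_bytes(original, edited)

print(copy_matches, edited_matches)
```

If a pair of real files compares as identical, the extra copy did not come from Google Photos holding two slightly different versions of the photo.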

gilesknap commented 5 months ago

Please can you check for duplicates by searching for one of the filenames in the Google Photos Web UI.

ae0200 commented 5 months ago

I have checked and there aren't any. HOWEVER, I have found a pattern with the images being 'duplicated'.

They are all images which have been sent to me through the sharing tab in Google Photos. I wonder if your program reads through the main section in Google Photos and saves them, and then reads through the sharing tab and saves everything in there regardless of whether it is also in the main tab?

gilesknap commented 5 months ago

That has some logic to it. However, I don't see this for photos shared with me, even after I have pressed the 'add to library' button. Please can you report the command line you are using to make these backups?

ae0200 commented 5 months ago

Yes certainly, I am completely aware that this could just be a user error on my part.

I installed this on a Windows machine (because of a large unused hard drive) using the Python method outlined here: https://gilesknap.github.io/gphotos-sync/main/how-to/windows.html

I then changed the security to allow symlinks.

I then ran the program with the command:

C:\Users\John\AppData\Local\Programs\Python\Python36\Scripts\gphotos-sync.exe D:\Alex\gphotos

I can provide the log via a direct message if you would like.

gilesknap commented 5 months ago

Sure, send me your log - gilesknap@gmail.com

gilesknap commented 5 months ago

Hi Alex,

Does it seem to do the same thing every day, as in re-downloading photos from 2021? They should already be in the index and get skipped.

One thing I notice is that it looks like there was a keyboard interrupt at the end. Did you hit Ctrl+C? There is likely to be a pause at the end while finalizing the database, and if that is cancelled I imagine your database is incomplete.

Maybe you did this because it hangs at the end? I think there is a good chance that your DB is corrupted.

Please try the --flush-index command line option.

You won't lose any files. It will just delete the index and re-scan your library, re-creating the DB index.

ae0200 commented 5 months ago

I think that this has fixed everything, thank you.

I ran --flush-index. The log shows that all photos were indexed and then all downloads were skipped because the files already exist. I then ran the program normally to see if it would be satisfied that all photos are already downloaded, and it was.

The only problem I had was that the command prompt threw up a few logging errors but it didn't stop the program from running.

Thank you so much for spending the time to help me out. It means a lot.

gilesknap commented 5 months ago

Glad that worked. It sounds like you had a corrupted DB somehow. The code is supposed to flush the DB at enough intervals to reduce this risk but you got unlucky.
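For illustration, flushing an index at intervals might look something like the sketch below. It assumes an SQLite index and uses a made-up `media` table, batch size, and function name; the real gphotos-sync schema and flush logic are not shown in this thread and may differ.

```python
import sqlite3

COMMIT_EVERY = 100  # hypothetical batch size, not a gphotos-sync setting


def index_items(conn, items, commit_every=COMMIT_EVERY):
    """Insert (remote_id, filename) rows, committing every commit_every rows.

    Returns the number of commits performed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS media (remote_id TEXT PRIMARY KEY, filename TEXT)"
    )
    commits = 0
    pending = 0
    try:
        for remote_id, filename in items:
            conn.execute(
                "INSERT OR REPLACE INTO media VALUES (?, ?)", (remote_id, filename)
            )
            pending += 1
            if pending >= commit_every:
                conn.commit()  # rows up to here survive a later interruption
                commits += 1
                pending = 0
    finally:
        # Also runs when the loop is interrupted (for example by Ctrl+C),
        # so the last partial batch is saved as well.
        if pending:
            conn.commit()
            commits += 1
    return commits


conn = sqlite3.connect(":memory:")
items = [(f"id-{i}", f"IMG_{i:04d}.JPG") for i in range(250)]
commit_count = index_items(conn, items)
row_count = conn.execute("SELECT COUNT(*) FROM media").fetchone()[0]
print(commit_count, row_count)
```

With a scheme like this, an interruption loses at most one batch of index rows, but a badly timed cancel can still leave the index out of step with the files on disk, which is where --flush-index helps.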