gilesknap / gphotos-sync

Google Photos and Albums backup with Google Photos Library API
Apache License 2.0

Incremental Indexing #445

Closed ido-ran closed 6 months ago

ido-ran commented 9 months ago

Thank you for this great project to overcome Google's deliberate attempt to keep data inside Google Photos. I love Google Photos and I'm paying for it and very happy with it but I do want to be prepared for the scenario I lose access to Google Photos and I want to know my photos are safe.

I'm trying to run gphotos-sync, but I have over 700 GB of photos. The way the script is currently built means it first has to index all of my photos. I'm pretty sure I'll hit the "All requests per day" quota, which is 10,000 requests/day, before the script manages to index all of my photos.

I was wondering if anyone else has run into the same issue, and if so, how did you resolve it?

I'm pretty sure that even if I wait until tomorrow and the script restarts, because it didn't finish the first indexing pass it will just start again from "searching for media start=None, end=None, videos=True", which will run into the same problem every day (am I wrong?)
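For a rough sense of scale: `mediaItems.search` returns at most 100 items per page, and each page costs one request against the daily quota, so the indexing pass alone is usually cheap; it is the per-item metadata and download requests that dominate. A back-of-envelope sketch (the library sizes below are illustrative, not taken from this thread):

```python
import math

QUOTA_PER_DAY = 10_000  # default "All requests per day" quota
PAGE_SIZE = 100         # maximum pageSize for mediaItems.search

def days_to_index(total_items: int, extra_requests: int = 0) -> int:
    """Estimate full days needed to page through a library of
    total_items, allowing for extra_requests of other API traffic."""
    pages = math.ceil(total_items / PAGE_SIZE)
    return math.ceil((pages + extra_requests) / QUOTA_PER_DAY)

# 200,000 photos -> 2,000 index pages: comfortably inside one day's quota
print(days_to_index(200_000))    # → 1
# 2,000,000 photos -> 20,000 pages: two days of quota just to index
print(days_to_index(2_000_000))  # → 2
```

This suggests the index itself rarely exhausts the quota; the subsequent downloads are the expensive part.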

gilesknap commented 9 months ago

Hi @ido-ran.

You may be right. I see your workarounds as follows:

(use --help to see how to set those options)

ido-ran commented 9 months ago

Hi, thank you for the quick response. I was quick to assume I'd run out of quota, but in the end I used 5.18k out of the 10k.

I would like to know what you think about changing the script to store the last page_token, so it can pick up from where it left off.
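The idea could be sketched as checkpointing the `nextPageToken` after every page. This is a minimal illustration, not gphotos-sync's internals: the file path is hypothetical, and `search_page` stands in for whatever wrapper the script uses around `POST https://photoslibrary.googleapis.com/v1/mediaItems:search`.

```python
import json
from pathlib import Path

# Hypothetical checkpoint location, not a real gphotos-sync file
TOKEN_FILE = Path("~/.gphotos/last_page_token.json").expanduser()

def load_page_token():
    """Return the last saved pageToken, or None for a fresh scan."""
    if TOKEN_FILE.exists():
        return json.loads(TOKEN_FILE.read_text()).get("pageToken")
    return None

def save_page_token(token):
    TOKEN_FILE.parent.mkdir(parents=True, exist_ok=True)
    TOKEN_FILE.write_text(json.dumps({"pageToken": token}))

def index_pages(search_page):
    """Yield media items page by page, checkpointing after each page.
    search_page(token) -> (items, next_token) is an assumed wrapper
    around the mediaItems:search call."""
    token = load_page_token()
    while True:
        items, token = search_page(token)
        yield from items
        save_page_token(token)  # checkpoint so a restart resumes here
        if token is None:
            break
```

A killed or quota-limited run would then restart from the last saved token instead of from `start=None, end=None`.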

I agree that, for the most part, once the full scan is complete a search by date will produce much smaller results, but every once in a while I either upload photos without a date, which usually end up in 1970, or I upload old photos, which a search by date will not find.
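The gap comes from how a date-bounded incremental query is shaped. A sketch of an illustrative `mediaItems:search` request body (the helper name is mine; the `filters.dateFilter` structure is the Library API's) shows why items whose metadata date falls outside the window, such as photos defaulting to 1970, are simply never returned:

```python
def incremental_search_body(start, end, page_token=None):
    """Build an illustrative mediaItems:search body that only returns
    items whose creation date lies in [start, end]. Anything dated
    outside the range (e.g. a 1970 default timestamp) is skipped."""
    body = {
        "pageSize": 100,
        "filters": {
            "dateFilter": {
                "ranges": [{
                    "startDate": {"year": start.year, "month": start.month, "day": start.day},
                    "endDate": {"year": end.year, "month": end.month, "day": end.day},
                }]
            }
        },
    }
    if page_token:
        body["pageToken"] = page_token
    return body
```

Hence the occasional full rescan: only an unfiltered search sees items whose dates predate the incremental window.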

gilesknap commented 9 months ago

Yep - you need to use --flush-index occasionally if older photos have been uploaded. I should probably make that clear in the docs.

re: keeping the page_token. I would have thought that it has a reasonably short expiry but I could be wrong.

ido-ran commented 9 months ago

I have a photo frame project that also downloads photos from Google Photos, and I've noticed I can use a page_token even a day later. I'll give it a go and will let you know. I currently get to about 8.2k requests when the quota is 10k; I also run it close to midnight so I use some of today's quota and some of tomorrow's. Storing the page_token should also allow the script to pick up where it left off when indexing is stopped in the middle, which would also be a useful feature.
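Combining the two observations, a run could deliberately stop on quota exhaustion and resume the next day from the saved token. A minimal sketch, assuming the caller supplies the search wrapper and token storage (none of these names are gphotos-sync APIs):

```python
class QuotaExhausted(Exception):
    """Assumed to be raised when the API answers 429 RESOURCE_EXHAUSTED."""

def index_until_quota(search_page, load_token, save_token):
    """Page through the library until done or out of quota.
    Returns True when the full index is complete, False when the run
    stopped early and should be rerun after the quota resets."""
    token = load_token()
    while True:
        try:
            items, next_token = search_page(token)
        except QuotaExhausted:
            # Leave the last good token saved; rerun after the reset.
            return False
        # ... store items in the local index here ...
        token = next_token
        save_token(token)
        if token is None:
            return True
```

Run near midnight as described above, the first invocation returns False at the quota limit and the next one finishes the remaining pages.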

ido-ran commented 9 months ago

One last question: can I use --rescan instead of --flush-index? It seems like rescan starts the scan from scratch but just doesn't drop the database.

gilesknap commented 9 months ago

Yes, you are right, that is a better option!

gilesknap commented 6 months ago

I believe this issue is completed - closing