gilesknap / gphotos-sync

Google Photos and Albums backup with Google Photos Library API
Apache License 2.0
1.97k stars 161 forks

google colab request #388

Closed mikebilly closed 1 year ago

mikebilly commented 1 year ago

Hi! Thanks for your amazing repo. I'd like to suggest creating a Colab notebook so that I can easily copy photos and videos from Google Photos and save them in a folder in my Google Drive.

gilesknap commented 1 year ago

Hi, thanks for the suggestion. I'm not that keen on this because gphotos-sync was born out of Google deprecating its own photos-drive sync, so we moved on from this a long time ago.

What is your use case? If you already have photos in Google, why have them in two places in Google (and use up your Google disk quota)?

If I were to look at this I'd prefer to do it as a desktop app that talks to the Photos and Drive APIs. That is because notebooks don't make a great development environment due to the lack of version control (at least the last time I looked; this is my only effort in Colab notebooks: https://colab.research.google.com/drive/1zBHuFfGpqkv8I96epPfo45xxyKg2_FTh?usp=sharing)

mikebilly commented 1 year ago

Thanks @gilesknap for your quick reply. The reason is that I have a shared drive with an unusually large amount of space, while my own account has a 15 GB limit. I want to copy my photos and videos from Google Photos to that shared drive so I can use it to store media. I want to do this on Google Colab because the download and upload speeds are high. Hope you can help me with this.

gilesknap commented 1 year ago

I would not recommend this for 2 reasons:

If you still want this and are interested in doing it yourself, I'm happy to help with ideas and some code pointers.

gilesknap commented 1 year ago

Oh, and by the way, one of the API limitations is that you can't delete any photos unless they were uploaded by the same app. Therefore you would need to delete them manually to get your space back after the tool did a copy to Drive. This is obviously error prone and a risk to your photos.

The delete issue is one of many that have been outstanding for years and that Google is simply not going to fix. See:

mikebilly commented 1 year ago

Thanks @gilesknap for your reply. Well, I think I might need an API to get the list of filenames on Google Photos and make sure that every filename in that list is present in my shared drive. If a single filename is not present, I would copy that single file to my shared drive; can I do that? And after that, can I safely delete everything in my Google Photos?

gilesknap commented 1 year ago

This is one of the big issues I faced with gphotos-sync. The mapping between allowed names in the library and allowed filenames is a little fiddly; for example, the same filename may appear multiple times in the same album. Since Drive is your target, it would also allow the same filename multiple times. HOWEVER, you are left to work out which is which: if you have two PHOTO1 in Photos but only one in Drive, then which one do you copy over?

To solve this, gphotos-sync uses a database and keeps additional metadata on each photo, including what it got called on the local filesystem.
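A minimal sketch of that idea, keyed on the immutable media-item id (the table and column names here are invented for illustration; gphotos-sync's real schema differs):

```python
import sqlite3

# Map each Google Photos media-item id to the filename chosen locally.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE synced (media_id TEXT PRIMARY KEY, local_name TEXT NOT NULL)"
)

def record_sync(media_id: str, local_name: str) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO synced VALUES (?, ?)", (media_id, local_name)
    )

def local_name_for(media_id: str):
    row = conn.execute(
        "SELECT local_name FROM synced WHERE media_id = ?", (media_id,)
    ).fetchone()
    return row[0] if row else None

record_sync("id-abc", "IMG_0001.jpg")
record_sync("id-def", "IMG_0001(2).jpg")  # same source name, different item
```

Because the id never changes even when the filename collides, the mapping survives re-runs of the sync.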

To simply see how you list photos in the library you could look at https://github.com/gilesknap/gphotos-sync/blob/2f6bcbb537b18bf54d8f5426f1bac6727dbaf2d8/src/gphotos_sync/GooglePhotosIndex.py#L137-L227

The truth is this is non-trivial and it is why I made the project read-only.
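Under the hood, that indexing pages through the Photos Library REST endpoint `GET https://photoslibrary.googleapis.com/v1/mediaItems`, following `nextPageToken`. A minimal sketch of the loop, with the HTTP fetch left as a callable so the example stays self-contained (`fetch_page` and the session it would wrap are assumptions, not gphotos-sync code):

```python
from typing import Callable, Iterator

def iter_media_items(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every media item, following nextPageToken until exhausted.

    fetch_page(page_token) should call the Photos Library API's
    mediaItems list endpoint with an authorized session and return
    the decoded JSON dict.
    """
    token = ""
    while True:
        page = fetch_page(token)
        yield from page.get("mediaItems", [])
        token = page.get("nextPageToken", "")
        if not token:
            return

# With a real authorized requests session it might look like (sketch only):
# def fetch_page(token):
#     r = session.get(
#         "https://photoslibrary.googleapis.com/v1/mediaItems",
#         params={"pageSize": 100, "pageToken": token},
#     )
#     r.raise_for_status()
#     return r.json()
```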

gilesknap commented 1 year ago

I should be fair and say that modern cameras are pretty good at creating unique file names for their images. My collection goes back to 1996, and in those days cameras reset the image-name counter every time the memory card was wiped. I have hundreds of Image001.jpg in my collection!

So you could try ignoring this issue and assume that filenames are unique. But you would lose photos if this turned out not to be true.
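Before relying on that assumption, it is cheap to test it: index the library once and flag any filename that repeats. A minimal sketch:

```python
from collections import Counter

def duplicate_filenames(filenames):
    """Return the filenames that appear more than once, with their counts."""
    counts = Counter(filenames)
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical listing of library filenames, e.g. collected via the API.
library = ["IMG_0001.jpg", "IMG_0002.jpg", "IMG_0001.jpg", "clip.mp4"]
```

If the returned dict is empty, a plain name-for-name copy is safe; otherwise the duplicates need resolving first.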

mikebilly commented 1 year ago

Yeah, I think so; I probably don't have two files with the same name but different metadata. Google Photos automatically backs up the photos and videos on my phone, so that's why I'm using it. And I'd say the photos and videos on my phone are numbered distinctly, so there's no issue unless I reset my phone and it starts numbering from 0 again. I guess I might need a feature where, when it sees a file with the same filename, it checks the two files' metadata: if they're the same it skips, otherwise it renames either of the two files to something like xxx(2).png and then proceeds to copy.

gilesknap commented 1 year ago

Exactly. But you need a consistent way to get from metadata to files named (2) or (3) etc. You can't just do it in date order because you are going to be deleting files from the Photos source. This is why I keep a DB of what I named each file. I also try to be consistent if someone wants to flush the DB and start again, but I have not been successful in coping with all the corner cases. Rebuilding the DB after some files have been deleted from the source may not result in the same filenames the second time, and can cause the sync to overwrite already-copied files.
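One way to make the naming deterministic (a sketch, not gphotos-sync's actual algorithm) is to key the (2), (3) suffixes on a stable sort of the items' immutable media ids rather than on download or date order. Note it still shifts names if an item inside a duplicate group is deleted, which is exactly the corner case described above:

```python
def assign_unique_names(items):
    """items: list of (media_id, filename) pairs.
    Returns {media_id: unique_local_name}.

    Items sharing a filename are ordered by media_id, which never
    changes, so rebuilding the mapping gives the same names as long
    as no member of a duplicate group has been deleted.
    """
    by_name = {}
    for media_id, name in items:
        by_name.setdefault(name, []).append(media_id)
    result = {}
    for name, ids in by_name.items():
        stem, dot, ext = name.rpartition(".")
        for i, media_id in enumerate(sorted(ids)):
            if i == 0:
                result[media_id] = name
            elif dot:
                result[media_id] = f"{stem}({i + 1}).{ext}"
            else:
                result[media_id] = f"{name}({i + 1})"
    return result
```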

gilesknap commented 1 year ago

I think this would be a better starting point for you instead of reading my code.

And once you have that working the rest of the REST API is documented here:

Drive has a similar REST API. I've had a Google around and it looks like there are a few out-of-date Python libraries wrapping it; I'm not sure what to advise on this without more research.

mikebilly commented 1 year ago

Seems fairly complicated with filenames and duplicate checking, haha. Well, I guess as long as I use the same device and don't reset it, I won't encounter any problems with filenames and duplicates. So my plan to copy to Drive should work, right?

gilesknap commented 1 year ago

I think the principle of what you want to do is viable.

You just need to be sure you can verify that you have copies of the files before you delete them from Photos.

In my view that is quite difficult to guarantee, especially since you will need to do the deletes manually.
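A minimal sketch of that verification step, comparing name sets only (no checksums, so it inherits the duplicate-filename caveat discussed earlier):

```python
def safe_to_delete(photos_names, drive_names):
    """Return (ok, missing): ok is True only when every filename listed
    in Photos is present in the Drive copy. Names are compared as plain
    strings, so duplicate filenames must already have been resolved."""
    missing = sorted(set(photos_names) - set(drive_names))
    return (not missing, missing)
```

Only when `ok` is True (and ideally after spot-checking file sizes too) would manual deletion from Photos be reasonable.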


mikebilly commented 1 year ago

Thank you for your helpful information. I'll begin trying to make this work on Colab tomorrow.

gilesknap commented 1 year ago

Good luck. Closing this as it's not a gphotos-sync issue. I'll still respond here if you continue to post.

mikebilly commented 1 year ago
10-02 09:56:14 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 09:56:14.195939 
10-02 09:56:14 ERROR    Symbolic links not supported 
10-02 09:56:14 ERROR    Albums are not going to be synced - requires symlinks 
10-02 09:56:15 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 96, in authorize
    open_browser=False, bind_addr="0.0.0.0", port=self.port
  File "/usr/local/lib/python3.7/dist-packages/google_auth_oauthlib/flow.py", line 489, in run_local_server
    bind_addr or host, port, wsgi_app, handler_class=_WSGIRequestHandler
  File "/usr/lib/python3.7/wsgiref/simple_server.py", line 153, in make_server
    server = server_class((host, port), handler_class)
  File "/usr/lib/python3.7/socketserver.py", line 452, in __init__
    self.server_bind()
  File "/usr/lib/python3.7/wsgiref/simple_server.py", line 50, in server_bind
    HTTPServer.server_bind(self)
  File "/usr/lib/python3.7/http/server.py", line 137, in server_bind
    socketserver.TCPServer.server_bind(self)
  File "/usr/lib/python3.7/socketserver.py", line 466, in server_bind
    self.socket.bind(self.server_address)
OSError: [Errno 98] Address already in use
10-02 09:56:15 WARNING  Done. 

This is the first error that I got. In some other project, this type of flow worked for me:

SCOPES = ['https://www.googleapis.com/auth/youtube.upload']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

VALID_PRIVACY_STATUSES = ('public', 'private', 'unlisted')

# Authorize the request and store authorization credentials.
def get_authenticated_service(CLIENT_SECRETS_FILE):
  storage = Storage("/content/youtube-upload-credentials.json")
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

gilesknap commented 1 year ago

This is just saying that the default auth flow port 8080 is already in use. You can choose a different port on the command line with --port (see gphotos-sync --help).

mikebilly commented 1 year ago

Sorry if I say anything dumb, but can I make it use a flow similar to

    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

which opens a browser page for me to accept the API access and gives me the authorization code, without creating a localhost server? Edit: [screenshot]

gilesknap commented 1 year ago

I think you are using a function from the oauth2client library.

That is deprecated and you need to use https://github.com/googleapis/google-auth-library-python-oauthlib.

Now you might be able to do clever stuff because you are already logged in to Colab notebooks, but I don't have any knowledge of that.

Instead, I would hope that the authentication code from gphotos-sync should still work for you. You would need to create the token locally on your workstation first, as the authentication flow only works with a local browser. See setting up for headless gphotos-sync servers here: https://gilesknap.github.io/gphotos-sync/main/tutorials/installation.html#headless-gphotos-sync-servers

mikebilly commented 1 year ago

Yeah, that's what I'm facing. I want to do the authorization phase that gives me the token, credentials and everything by using my client_secret.json, without needing my local browser or my local workstation; the reason is that I want to run this Colab notebook on my phone too.

mikebilly commented 1 year ago

As I've mentioned above, this authorization function works for my other project

SCOPES = ['https://www.googleapis.com/auth/youtube.upload']
API_SERVICE_NAME = 'youtube'
API_VERSION = 'v3'

VALID_PRIVACY_STATUSES = ('public', 'private', 'unlisted')

# Authorize the request and store authorization credentials.
def get_authenticated_service(CLIENT_SECRETS_FILE):
  storage = Storage("/content/youtube-upload-credentials.json")
  credentials = storage.get()
  if credentials is None or credentials.invalid:
    flow = flow_from_clientsecrets(CLIENT_SECRETS_FILE, SCOPES)
    flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
    credentials = tools.run_flow(flow, storage, flags)

  return build(API_SERVICE_NAME, API_VERSION, credentials = credentials)

So I've modified your authorization function from:

    def authorize(self):
        """Initiates OAuth2 authentication and authorization flow"""
        token = self.load_token()

        if token:
            self.session = OAuth2Session(
                self.client_id,
                token=token,
                auto_refresh_url=self.token_uri,
                auto_refresh_kwargs=self.extra,
                token_updater=self.save_token,
            )
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                self.secrets_file, scopes=self.scope
            )
            # localhost and bind to 0.0.0.0 always works even in a container.
            flow.run_local_server(
                open_browser=False, bind_addr="0.0.0.0", port=self.port
            )

            self.session = flow.authorized_session()

            # Mapping for backward compatibility
            oauth2_token = {
                "access_token": flow.credentials.token,
                "refresh_token": flow.credentials.refresh_token,
                "token_type": "Bearer",
                "scope": flow.credentials.scopes,
                "expires_at": flow.credentials.expiry.timestamp(),
            }

            self.save_token(oauth2_token)

        # set up the retry behaviour for the authorized session
        retries = Retry(
            total=self.max_retries,
            backoff_factor=5,
            status_forcelist=[500, 502, 503, 504, 429],
            allowed_methods=frozenset(["GET", "POST"]),
            raise_on_status=False,
            respect_retry_after_header=True,
        )
        # apply the retry behaviour to our session by replacing the default HTTPAdapter
        self.session.mount("https://", HTTPAdapter(max_retries=retries))

to

    def authorize(self):
        """Initiates OAuth2 authentication and authorization flow"""
        token = self.load_token()

        if token:
            self.session = OAuth2Session(
                self.client_id,
                token=token,
                auto_refresh_url=self.token_uri,
                auto_refresh_kwargs=self.extra,
                token_updater=self.save_token,
            )
        else:
            storage = Storage("/content/gphotos-sync-credentials.json")
            credentials = storage.get()
            if credentials is None or credentials.invalid:
              flow = flow_from_clientsecrets(self.secrets_file, self.scope)
              flags = tools.argparser.parse_args(args=['--noauth_local_webserver'])
              credentials = tools.run_flow(flow, storage, flags)

            # Mapping for backward compatibility
            oauth2_token = {
                "access_token": credentials.access_token,
                "refresh_token": credentials.refresh_token,
                "token_type": "Bearer",
                "scope": credentials.scopes,
                "expires_at": credentials.token_expiry.timestamp(),
            }

            self.save_token(oauth2_token)

        # set up the retry behaviour for the authorized session
        retries = Retry(
            total=self.max_retries,
            backoff_factor=5,
            status_forcelist=[500, 502, 503, 504, 429],
            allowed_methods=frozenset(["GET", "POST"]),
            raise_on_status=False,
            respect_retry_after_header=True,
        )
        # apply the retry behaviour to our session by replacing the default HTTPAdapter
        self.session.mount("https://", HTTPAdapter(max_retries=retries))

(from else: to self.save_token(oauth2_token)) And this is the error that I got:

10-02 14:27:21 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 14:27:21.954127 
10-02 14:27:21 ERROR    Symbolic links not supported 
10-02 14:27:21 ERROR    Albums are not going to be synced - requires symlinks 
/usr/local/lib/python3.7/dist-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access /content/gphotos-sync-credentials.json: No such file or directory
  warnings.warn(_MISSING_FILE_MESSAGE.format(filename))

Go to the following link in your browser:

    https://accounts.google.com/o/oauth2/auth?client_id=771314848617-a78n0bc5osd04b3rabt2hqp8qdd88b9h.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fphotoslibrary.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fphotoslibrary.sharing&access_type=offline&response_type=code

Enter verification code: 4/1ARtbsJrkSVGbpu6z7dA-LQvcpS26V6kKDOPiI7J9YzZ309mFNSQcttSJ2vs
Authentication successful.
10-02 14:27:33 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 111, in authorize
    self.save_token(oauth2_token)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 79, in save_token
    dump(token, stream)
  File "/usr/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.7/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.7/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.7/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
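The root cause of that last TypeError is that `flow.credentials.scopes` is a Python set, and the standard json module refuses to serialize sets. A minimal illustration, with converting the set to a list as one possible fix:

```python
import json

# json can serialize lists but not sets, which is exactly what the
# scopes value is in the traceback above.
token = {"scope": {"photoslibrary.readonly", "photoslibrary.sharing"}}
try:
    json.dumps(token)
    serialized = True
except TypeError:
    serialized = False  # "Object of type set is not JSON serializable"

# Converting the set to a list (or using a custom JSONEncoder) fixes it.
token["scope"] = sorted(token["scope"])
fixed = json.dumps(token)
```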

mikebilly commented 1 year ago

For that error, I've added this code:

    import json

    class SetEncoder(json.JSONEncoder):
        def default(self, obj):
            if isinstance(obj, set):
                return list(obj)
            return json.JSONEncoder.default(self, obj)

and changed

    def save_token(self, token: str):
        with self.token_file.open("w") as stream:
            dump(token, stream)
        self.token_file.chmod(0o600)

to

    def save_token(self, token: str):
        with self.token_file.open("w") as stream:
            dump(token, stream, cls=SetEncoder)
        self.token_file.chmod(0o600)

But now I get this error:

10-02 15:23:43 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 15:23:43.480427 
10-02 15:23:43 ERROR    Symbolic links not supported 
10-02 15:23:43 ERROR    Albums are not going to be synced - requires symlinks 
10-02 15:23:43 ERROR    
Process failed. 
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 492, in main
    self.setup(args, db_path)
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/Main.py", line 341, in setup
    self.auth.authorize()
  File "/usr/local/lib/python3.7/dist-packages/gphotos_sync/authorize.py", line 131, in authorize
    respect_retry_after_header=True,
TypeError: __init__() got an unexpected keyword argument 'allowed_methods'
10-02 15:23:43 WARNING  Done. 
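For context, `allowed_methods` only exists in urllib3 1.26 and later; older releases call the same parameter `method_whitelist`, which is why upgrading urllib3 makes the error go away. A version-tolerant sketch:

```python
import inspect

from urllib3.util.retry import Retry

# urllib3 >= 1.26 renamed method_whitelist to allowed_methods; pass
# whichever keyword the installed version understands.
kwarg = (
    "allowed_methods"
    if "allowed_methods" in inspect.signature(Retry.__init__).parameters
    else "method_whitelist"
)
retries = Retry(total=5, backoff_factor=5, **{kwarg: frozenset(["GET", "POST"])})
```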

mikebilly commented 1 year ago

So I added these commands:

!pip install urllib3 --upgrade 
!pip install requests --upgrade 
!pip install spotipy --upgrade

and it works. After running this command:

!gphotos-sync --secret="/content/client secret.json" "/content/drive/Shareddrives/Family_photos/backup" --port 307

I get this message:

10-02 15:42:02 WARNING  gphotos-sync 3.5.dev10+g2f6bcbb 2022-10-02 15:42:02.428861 
10-02 15:42:02 ERROR    Symbolic links not supported 
10-02 15:42:02 ERROR    Albums are not going to be synced - requires symlinks 
10-02 15:42:02 WARNING  Indexing Google Photos Files ... 
10-02 15:42:03 WARNING  indexed 2 items 
10-02 15:42:03 WARNING  Downloading Photos ... 
10-02 15:42:15 WARNING  Downloaded 2 Items, Failed 0, Already Downloaded 0 
10-02 15:42:15 WARNING  Done. 

However, the copied photos and videos are not in original quality. I have 1 photo and 1 video. The original photo is 64 KB, while the copied version is 84 KB with a slight difference in quality. The original video is 3840×2160 at 863.3 MB (uploaded to Google Photos in original quality), while the copied video is 1920×1080 at 57.4 MB.

gilesknap commented 1 year ago

Yep. Remember I said the API was crippled. Again, this is one Google has been sitting on for years. I don't mind the images too much because I can't see any visual difference at my image resolutions. But the videos are awful. https://issuetracker.google.com/issues/112096115

ALSO: note that you lose the GPS info from your images.

gilesknap commented 1 year ago

Good work getting the auth going though!!

mikebilly commented 1 year ago

Thanks @gilesknap, I got your API working, but the only issue is that it doesn't download original-quality videos and it strips GPS info; for that reason I can't use the Google Photos API. I'm looking for some sort of Google Takeout API.

gilesknap commented 1 year ago

Yeah, sorry about that; I should have thought to mention those specific limitations for your use case.

(It's not my API, it's Google's!! :-)