MarshalX / atproto

The AT Protocol (🦋 Bluesky) SDK for Python 🐍
https://atproto.blue
MIT License
278 stars 30 forks source link

Auth token handling improvements #268

Closed grantmcconnaughey closed 7 months ago

grantmcconnaughey commented 7 months ago

I'm running into an issue with auth token handling that hopefully I can explain.

My understanding of Bluesky auth

  1. A user handle + app password creates a new session, which provides an access token and refresh token.
  2. The access token expires in 2 hours. To refresh it you use the refresh token.
  3. The refresh token expires in 2 months. To refresh it you use the handle + app password to create a new session again.

How atproto handles this

atproto handles scenario 1 and 2, but does not handle scenario 3. _should_refresh_session will check if the access token is nearly expired, and if so, will refresh the access token using the refresh token. However, atproto does not check if the refresh token is expired and handle refreshing it.

Right now Client.login() accepts either the handle/password or the session string, but not both. If the session string is passed then the handle/password are ignored.

My request

  1. Change auth to handle both access token refreshing and refresh token refreshing.
    1. That will mean checking that both tokens are not expired. If the access token is expired then it is refreshed with the refresh token. If the refresh token is expired then it is refreshed using the username/password via CreateSession (assuming that the handle/password are passed into Client.login).
  2. Add some sort of callback functionality when a token is refreshed by _refresh_and_set_session. This will allow me to know when the session string has changed so I can store it in the database and use it in the future when creating new Clients. Otherwise I have to call export_session_string after every atproto method because I don't know which one may have created a new session.
MarshalX commented 7 months ago

Hi! Awesome description!

  1. I have a plan to provide a callback on token refresh to be able to save it in any external storage because the current implementation is not obvious when you should export the session string. But now you documented my thoughts xd. Thank you!

  2. Updating the refresh token after few months by login and password is not what SDK want to provide. Because it will require saving a password in memory/somewhere. Not a secure way. And there is a global question. If your script didn't work with a token for more than 2 months should you still have access to the account or not? I mean refresh token will expire only if you don't use it. For each new token refreshes, the refresh token will be updated. Correct me if I'm wrong. I'm okay to provide some API that will help to catch this case from users end. So you should store the password somewhere in a somewhat encrypted format and SDK will request it on demand. Sounds like one more callback. Wdyt?

Upd. Maybe I'm so strict about saving password in a client instance and it's okay…

grantmcconnaughey commented 7 months ago

I think as long as clients are storing handles and app passwords in an encrypted manner then it is fine. For some apps, like Postpone, our only choice is to store this information, otherwise a user would have to log into their Bluesky account every 2 months or else our connection would break.

Updating the refresh token after few months by login and password is not what SDK want to provide. Because it will require saving a password in memory/somewhere. Not a secure way.

Saving the session string in memory is nearly as unsafe as storing passwords, because it provides unfettered access to an account for up to 2 months 3 months (I just double checked and Bluesky refresh tokens actually expire in 3 months). So both handle/password and session strings/tokens should be encrypted by clients if they are stored for long-term use.

I don't think the SDK needs to request it on demand, no. I think it's up to the user of the client to encrypt the handle, password, and session string and pass them to the Client every time.

In pseudocode, I think it would look something like this:

def login(self, login = None, password = None, session_string = None, token_refresh_handler = None):
    if session_string:
        if is_refresh_token_expired(session_string) and login and password:
            session = refresh_refresh_token(login, password)
            if callable(token_refresh_handler):
                token_refresh_handler(session)
        elif is_access_token_expired(session_string):
            session = refresh_access_token(session_string)
            if callable(token_refresh_handler):
                token_refresh_handler(session)
        else:
            raise ValueError('Cannot refresh session')
    elif login and password:
        session = create_session(login, password)
    else:
        raise ValueError('Either session_string or login and password should be provided.')

    self.me = self.get_profile()
    return self.me

That's totally just pseudocode, and I'm sure it would look different in reality.

MarshalX commented 7 months ago

Ah, sure. We can update the token on login(). I thought about the Client() instance that will work for a few months

MarshalX commented 7 months ago

Btw why instead of storing passwords you can't perform some “check” request 1 per month/two to update the refresh token? Or even few days before refresh token expiration

grantmcconnaughey commented 7 months ago

Yeah, that is definitely something we can do, especially if the API for SessionString is stable.

In that case we can store the session string and every time before creating the client we can check if the refresh token is about to expire. If so, use the stored handle/password instead of the session string to create a new session.

It would be nice if the client could handle this automatically, but as long as the API for session string is stable it should be fine for client users to handle it instead.

MarshalX commented 7 months ago

I mean don't store password at all. Or it's not possible to update expiration date of refresh token by requesting a new one?

grantmcconnaughey commented 7 months ago

Ah, gotcha! My understanding is that the only way to request a new refresh token is via handle and app password.

So refresh tokens generate new access tokens, and handle + password generates new refresh tokens.

Unless refreshing access tokens also provides a new refresh token with a new expiration, in which case this entire GitHub issue can be ignored. 😂 Sounds like I need to do some more exploring.

And yes, in that case we would not need to store handle/password at all; only tokens.

grantmcconnaughey commented 7 months ago

I have confirmed that refreshing a token via refreshSession does also provide a new refresh token with an extended expiration date. 🤦

So in that case, Postpone does not need to store app username/password, and instead we can only store the session string.

It still would be nice to have some sort of callback handler for when a new session string is generated so we can store that in the DB, however it won't be too tricky for us to handle that ourselves anyway.

Sorry for all the trouble!

MarshalX commented 7 months ago

Awesome! Callback will be provided pretty soon

grantmcconnaughey commented 7 months ago

Thank you! I really appreciate all you're doing for Bluesky Python development. 🙏