apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[DSIP][Security] Optimizing the OAuth2 login functionality to support users integrating with their own authentication centers. #16472

Open hdygxsj opened 4 weeks ago

hdygxsj commented 4 weeks ago


### Motivation

DS, as a scheduling platform, is typically deployed on the company's intranet and used by the company's developers. Therefore, users often need to integrate with their company’s internal authentication center. However, the OAuth protocols of these internal authentication centers may vary from company to company, and there might be unique methods for fetching user information. Thus, I believe that DS needs to provide a way for users to integrate with their company’s internal OAuth authentication center.

### Design Detail

Most users use OAuth2's authorization code mode, and DS currently implements only that mode. This refactoring therefore targets the authorization code mode only; it can be extended later if necessary.

Google's authorization code mode authentication process is as follows.

[Image: Google's OAuth2 authorization code flow]

Providers differ in how a token is obtained from the authorization code, and even more in the interface used to fetch user information. The main difference is how the request is packaged: some providers require the authorization code to be appended to the URL, while others require it in the request body. In addition, some providers require the POST method for fetching user information, while others use GET. Finally, the response body of the user information interface also varies widely.
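To make these differences concrete, here is a minimal sketch of how they could be expressed as per-provider settings. It is not DolphinScheduler code: `ProviderSettings`, `CodePlacement`, and `TokenRequests` are hypothetical names, the requests are only built (never sent), and `client_id`, `client_secret`, and `redirect_uri` are left out for brevity.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical: where a provider expects the authorization code.
enum CodePlacement { QUERY_STRING, FORM_BODY }

// Hypothetical per-provider settings; these names are assumptions for illustration.
final class ProviderSettings {
    final String tokenUrl;
    final CodePlacement codePlacement;
    final String userInfoUrl;
    final String userInfoMethod; // "GET" or "POST"

    ProviderSettings(String tokenUrl, CodePlacement codePlacement,
                     String userInfoUrl, String userInfoMethod) {
        this.tokenUrl = tokenUrl;
        this.codePlacement = codePlacement;
        this.userInfoUrl = userInfoUrl;
        this.userInfoMethod = userInfoMethod;
    }
}

final class TokenRequests {

    private TokenRequests() { }

    // Builds (but does not send) the request that exchanges a code for a token.
    // client_id, client_secret and redirect_uri are omitted for brevity.
    static HttpRequest tokenRequest(ProviderSettings settings, String code) {
        String params = "grant_type=authorization_code&code="
                + URLEncoder.encode(code, StandardCharsets.UTF_8);
        if (settings.codePlacement == CodePlacement.QUERY_STRING) {
            // Variant 1: the code is appended to the URL.
            return HttpRequest.newBuilder(URI.create(settings.tokenUrl + "?" + params))
                    .POST(HttpRequest.BodyPublishers.noBody())
                    .build();
        }
        // Variant 2: the code travels in a form-encoded request body.
        return HttpRequest.newBuilder(URI.create(settings.tokenUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(params))
                .build();
    }

    // Builds the user-info request with the HTTP method the provider expects.
    static HttpRequest userInfoRequest(ProviderSettings settings, String accessToken) {
        return HttpRequest.newBuilder(URI.create(settings.userInfoUrl))
                .header("Authorization", "Bearer " + accessToken)
                .method(settings.userInfoMethod, HttpRequest.BodyPublishers.noBody())
                .build();
    }
}
```

Parsing the user-info response is deliberately left out, since that is the part that varies most between providers and would likely need a provider-specific hook.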

```java
/**
 * Factory of {@link OAuth2AuthorizeCodeClient}
 */
public interface OAuth2ClientFactory {

    /**
     * OAuth2 provider name
     */
    String provider();

    /**
     * Create oauth client
     */
    OAuth2AuthorizeCodeClient createAuthorizeCodeClient(OAuth2ClientProperties oAuth2ClientProperties);

}
```


- [ ] Add default implementations for the OAuth plugin, such as GitHub, Gitee, Google, etc.
- [ ] Add an implementation of the plugin that is as generic as possible.
- [ ] Modify /redirect/login/oauth2 to load the corresponding provider plugin via SPI and complete the login process by obtaining the token and user information through the authorization code.
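As a rough illustration of the SPI loading step in the last item, the sketch below indexes factories by provider name. It makes several assumptions: `OAuth2ClientFactoryRegistry` and `fetchUsername` are hypothetical names, the two interfaces are simplified stand-ins for the types in the proposal, and a plain `Map` replaces `OAuth2ClientProperties`, whose shape is not given in this issue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Simplified stand-ins for the types named in the proposal; the real
// signatures may differ.
interface OAuth2AuthorizeCodeClient {
    String fetchUsername(String authorizationCode);
}

interface OAuth2ClientFactory {
    String provider();

    OAuth2AuthorizeCodeClient createAuthorizeCodeClient(Map<String, String> properties);
}

// Hypothetical registry that indexes the discovered factories by provider name.
final class OAuth2ClientFactoryRegistry {

    private final Map<String, OAuth2ClientFactory> factories = new HashMap<>();

    OAuth2ClientFactoryRegistry(Iterable<OAuth2ClientFactory> discovered) {
        for (OAuth2ClientFactory factory : discovered) {
            OAuth2ClientFactory previous = factories.putIfAbsent(factory.provider(), factory);
            if (previous != null) {
                throw new IllegalStateException("Duplicate OAuth2 provider: " + factory.provider());
            }
        }
    }

    // Discovers factories registered under META-INF/services on the classpath.
    static OAuth2ClientFactoryRegistry fromServiceLoader() {
        return new OAuth2ClientFactoryRegistry(ServiceLoader.load(OAuth2ClientFactory.class));
    }

    // Returns the factory for the given provider name, or fails loudly.
    OAuth2ClientFactory get(String provider) {
        OAuth2ClientFactory factory = factories.get(provider);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown OAuth2 provider: " + provider);
        }
        return factory;
    }
}
```

The login endpoint could then call something like `registry.get(providerName)` and use the returned client to exchange the authorization code. Failing on duplicate provider names is one possible design choice; letting a user-supplied plugin override a built-in one would be another.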

### Compatibility, Deprecation, and Migration Plan

Only the redirect/login/oauth2 interface needs to be modified. No other changes are required, and no compatibility issues are expected.

### Test Plan

Since OAuth2 needs to connect to third-party websites, I have no experience with adding integration tests (IT) for OAuth2 providers. If anyone has a good suggestion, I will implement it.

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

SbloodyS commented 3 weeks ago

DS is a scheduling platform, not an OAuth2 integration tool. For the feature you are requesting, is there any way we could keep the OAuth2 plugin code tested against each future release? No maintainer in this community is familiar with all of these providers or has free access to all of the required resources, so the community cannot realistically maintain the code.

So I think we should not adapt to so many OAuth2 providers at this stage, until we find a way to address these issues. We just need to support one common OAuth2 provider, for example Google OAuth2.

Waiting to see others' opinions~

hdygxsj commented 3 weeks ago

If DS is deployed on a company intranet, it does not make sense to connect to Google's OAuth2 authentication service. Instead, I think we don't even need to implement any OAuth2 provider; we only need to provide an API that lets users connect to their company's internal OAuth2 authentication center.

SbloodyS commented 3 weeks ago

> If DS is deployed on a company intranet, it does not make sense to connect to Google's OAuth2 authentication service. Instead, I think we don't even need to implement any OAuth2 provider; we only need to provide an API that lets users connect to their company's internal OAuth2 authentication center.

I think this is a company-internal customization requirement that should not be implemented in an open source project. You can easily implement it through secondary development in org.apache.dolphinscheduler.api.security.SecurityConfig#authenticator.

ruanwenjun commented 3 weeks ago

@hdygxsj Hi, thanks for your DSIP. In your design, OAuth2 is used in the api module, right?

Right now, every login in the api module must be bound to a user, since DS has its own permission model for all APIs. So once a third-party OAuth2 login succeeds and we get a token, we must bind the token to a DS user; otherwise we don't know which DS user the token belongs to.

So the usage logic might look like:

  1. We create a DS user in DS, and that user can be bound to a third-party account, e.g. by username or some other attribute.
  2. We log in through the third-party OAuth2 provider and get a token, then automatically log in as the DS user bound to that token.

So we might need a table that records the mapping between third-party users and DS users, e.g. GitHub user ruanwenjun maps to DS user admin.
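The mapping table suggested above could behave roughly like the following in-memory sketch. `ExternalUserMapping` and its methods are hypothetical names, and a real implementation would presumably be backed by a database table with columns along the lines of (provider, external user, DS user).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a mapping table with the columns
// (provider, external_user, ds_user).
final class ExternalUserMapping {

    // Key is the pair (provider, externalUser); value is the DS username.
    private final Map<List<String>, String> dsUserByExternalAccount = new ConcurrentHashMap<>();

    // Binds a third-party account to an existing DS user.
    void bind(String provider, String externalUser, String dsUser) {
        dsUserByExternalAccount.put(List.of(provider, externalUser), dsUser);
    }

    // Resolves the DS user for a third-party account, if one was bound.
    Optional<String> resolve(String provider, String externalUser) {
        return Optional.ofNullable(dsUserByExternalAccount.get(List.of(provider, externalUser)));
    }
}
```

With this shape, the example from the comment would be `bind("github", "ruanwenjun", "admin")`, after which `resolve("github", "ruanwenjun")` yields `admin`, while the same username at a different provider stays unbound.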

If I have misunderstood, please let me know.

hdygxsj commented 3 weeks ago

> > If DS is deployed on a company intranet, it does not make sense to connect to Google's OAuth2 authentication service. Instead, I think we don't even need to implement any OAuth2 provider; we only need to provide an API that lets users connect to their company's internal OAuth2 authentication center.
>
> I think this is a company-internal customization requirement that should not be implemented in an open source project. You can easily implement it through secondary development in org.apache.dolphinscheduler.api.security.SecurityConfig#authenticator.

The OAuth2 authorization code mode does not involve entering a username and password in DS. Therefore, AbstractAuthenticator does not apply to the OAuth2 authorization code flow, and users cannot extend it to connect to other OAuth2 providers.

hdygxsj commented 3 weeks ago

Hi @ruanwenjun, in the existing implementation, after OAuth2 authorization succeeds, DS obtains the username from the OAuth provider's user info API and checks whether a user with that username already exists in DS. If the user exists, DS creates a session and completes the login under that username, so that DolphinScheduler's internal permission management applies. If the user does not exist, DS creates a new user and then creates the session.

We can also change this so that a user logging in with OAuth2 must be bound to an existing DS user, in which case we need to maintain a mapping relationship. If necessary, we can discuss the details at the DS meeting.

ruanwenjun commented 3 weeks ago

> Hi @ruanwenjun, in the existing implementation, after OAuth2 authorization succeeds, DS obtains the username from the OAuth provider's user info API and checks whether a user with that username already exists in DS. If the user exists, DS creates a session and completes the login under that username, so that DolphinScheduler's internal permission management applies. If the user does not exist, DS creates a new user and then creates the session.
>
> We can also change this so that a user logging in with OAuth2 must be bound to an existing DS user, in which case we need to maintain a mapping relationship. If necessary, we can discuss the details at the DS meeting.

OK, if we can use the username directly as the mapping, that's fine with me. It's better not to auto-create users in DS, since only an admin can create users. The design should describe the whole login logic.
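The two login policies discussed in this thread (auto-creating a missing user versus rejecting the login) can be compared in a small sketch. Everything here is hypothetical: `OAuth2LoginService` is not a DolphinScheduler class, and the in-memory maps merely stand in for the user table and the session store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch of the login step that runs after the provider's
// user-info API has returned a username.
final class OAuth2LoginService {

    private final Map<String, Integer> userIdByName = new HashMap<>();    // stand-in for the user table
    private final Map<String, Integer> userIdBySession = new HashMap<>(); // stand-in for the session store
    private final boolean autoCreateUser;
    private int nextUserId = 1;

    OAuth2LoginService(boolean autoCreateUser) {
        this.autoCreateUser = autoCreateUser;
    }

    // Creates a user if absent (in the strict policy, only an admin would call this).
    int createUser(String username) {
        return userIdByName.computeIfAbsent(username, name -> nextUserId++);
    }

    // Returns a session id, or empty when the user is unknown and auto-creation is off.
    Optional<String> login(String oauthUsername) {
        Integer userId = userIdByName.get(oauthUsername);
        if (userId == null) {
            if (!autoCreateUser) {
                return Optional.empty();
            }
            userId = createUser(oauthUsername);
        }
        String sessionId = UUID.randomUUID().toString();
        userIdBySession.put(sessionId, userId);
        return Optional.of(sessionId);
    }

    // Looks up which user a session belongs to.
    Optional<Integer> userForSession(String sessionId) {
        return Optional.ofNullable(userIdBySession.get(sessionId));
    }
}
```

Constructing the service with `autoCreateUser = true` approximates the existing behavior described earlier in the thread, while `false` approximates the stricter policy in which an admin must create the user first.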

hdygxsj commented 3 weeks ago

> the whole login logic

OK, I'll refine the design later.

Gallardot commented 3 weeks ago

I think implementing a generic OAuth2 client is enough. If more customization is needed, a dedicated solution such as Keycloak or Dex (dexidp) is a more general and maintainable choice.