apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
61.69k stars 13.49k forks source link

Superset 3.0.2 Public Googlesheet can be read without service account credentials but throws error when you add creds #26452

Open rajivpatki opened 8 months ago

rajivpatki commented 8 months ago

A clear and concise description of what the bug is.

How to reproduce the bug

  1. Create a public google sheet
  2. Add the google sheet as database to your Superset 3.0.2. It works just fine and data shows up in SQL Lab
  3. Now delete this and this time when adding google sheet database select Public and Private Sheets and provide your service account credentials (by file or by text, doesn't matter)
  4. See error

Expected results

Actual results

On adding the credentials, you see on the interface The URL could not be identified. Please check for typos and make sure that ‘Type of Google Sheets allowed’ selection matches the input image

In superset logs you see:

superset  | 2024-01-10 15:10:23,177:WARNING:superset.views.base:SupersetErrorsException
superset  | Traceback (most recent call last):
superset  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
superset  |     rv = self.dispatch_request()
superset  |   File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
superset  |     return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset  |   File "/usr/local/lib/python3.9/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset  |     return f(self, *args, **kwargs)
superset  |   File "/app/superset/views/base_api.py", line 127, in wraps
superset  |     raise ex
superset  |   File "/app/superset/views/base_api.py", line 121, in wraps
superset  |     duration, response = time_function(f, self, *args, **kwargs)
superset  |   File "/app/superset/utils/core.py", line 1526, in time_function
superset  |     response = func(*args, **kwargs)
superset  |   File "/app/superset/utils/log.py", line 255, in wrapper
superset  |     value = f(*args, **kwargs)
superset  |   File "/app/superset/views/base_api.py", line 93, in wraps
superset  |     return f(self, *args, **kwargs)
superset  |   File "/app/superset/databases/api.py", line 1433, in validate_parameters
superset  |     command.run()
superset  |   File "/app/superset/databases/commands/validate.py", line 72, in run
superset  |     raise InvalidParametersError(errors)
superset  | superset.databases.commands.exceptions.InvalidParametersError: [
    SupersetError(
        message='The URL could not be identified. Please check for typos and make sure that ‘Type of Google Sheets allowed’ selection matches the input.',
        error_type=<SupersetErrorType.TABLE_DOES_NOT_EXIST_ERROR: 'TABLE_DOES_NOT_EXIST_ERROR'>,
        level=<ErrorLevel.WARNING: 'warning'>,
        extra={
            'catalog': {'idx': 0, 'url': True},
            'issue_codes': [
                {'code': 1003, 'message': 'Issue 1003 - There is a syntax error in the SQL query. Perhaps there was a misspelling or a typo.'},
                {'code': 1005, 'message': 'Issue 1005 - The table was deleted or renamed in the database.'}
            ]
        }
    )
]

Environment

browser type and version: Microsoft Edge Version 120.0.2210.91 (Official build) (64-bit) superset version: 3.0.2 from official docker image any feature flags active:

SECRET_KEY = "SUPERSET_SECRET_KEY"
WTF_CSRF_ENABLED = True

FEATURE_FLAGS = {
    "ENABLE_TEMPLATE_PROCESSING": True,
}

Checklist

Make sure to follow these steps before submitting your issue - thank you!

Additional context

rusackas commented 4 months ago

Are you still facing this? Tempted to close as stale, assuming you figured something out. I know we're using a ton of google sheets with credentials at Preset, so I suspect there's something in the credentials causing a problem (maybe a character like an @ or something), or this was an intermittent bug that came and went. As a side note, you should set a new secret key, too ;)

Let's see if our shiny new bot has more advice:

@dosu-bot

dosubot[bot] commented 4 months ago

It looks like the issue you're experiencing with Superset and Google Sheets integration, particularly after adding service account credentials, might be multifaceted. The error message indicating that the URL could not be identified, despite your detailed troubleshooting, suggests a deeper issue possibly related to how Superset is handling or parsing the service account credentials.

Here are some refined steps and considerations to help you troubleshoot and hopefully resolve the issue:

  1. Ensure Correct Service Account Credentials Format: Double-check that the JSON credentials for your service account are correctly formatted. This includes making sure that all necessary fields are present and that special characters, especially in the private_key, are properly escaped. Incorrect formatting or missing fields can lead to authentication issues.

  2. Verify Google Sheet Sharing Settings: Make sure the Google Sheet you're trying to access is shared with the client_email found in your service account credentials. This step is crucial for granting the necessary access permissions to your service account.

  3. Review Superset Configuration: Go through your superset_config.py file again to confirm that everything is set up correctly for Google Sheets integration. This includes checking the FEATURE_FLAGS and any other configurations that might affect how external databases are connected.

  4. Correct Use of Encrypted Extra: In the database connection UI, ensure that the Encrypted Extra field accurately contains your service account credentials as a JSON string. This often involves making sure the JSON is correctly stringified and that no formatting issues are introduced.

  5. Pay Attention to Error Messages and Logs: The error messages and logs from Superset can provide valuable insights into what might be going wrong. Since you've encountered specific errors related to the URL identification and potential issues with \n characters in your JSON, it's worth revisiting these messages for any clues they might offer.

  6. Check Superset Version Compatibility: Verify that your current version of Superset (3.0.2) fully supports the features and methods you're using for Google Sheets integration. If there have been updates or bug fixes in more recent versions that address your issue, upgrading might be necessary.

Given the steps you've already taken, including attempts to modify the gsheets.py script to handle \n characters, it's clear that the problem might not be straightforward. If these suggestions don't lead to a resolution, it could indicate a more specific issue with the gsheets.py script or a unique aspect of your setup. Ensuring that all configurations and credentials are correct and looking into any potential updates or fixes in newer versions of Superset could be key to resolving this.

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

rajivpatki commented 4 months ago

Are you still facing this? Tempted to close as stale, assuming you figured something out. I know we're using a ton of google sheets with credentials at Preset, so I suspect there's something in the credentials causing a problem (maybe a character like an @ or something), or this was an intermittent bug that came and went. As a side note, you should set a new secret key, too ;)

Let's see if our shiny new bot has more advice:

@dosu-bot

This issue still exists for me in Superset 3.1.2. Our "workaround" is not ideal, we have to schedule a script to regularly update another database with the googlesheet contents

If I don't provide any credentials.json for a pulicly shared sheet, the connection works fine. As soon as credentials are specified I get the error "The URL could not be identified. Please check for typos and make sure that ‘Type of Google Sheets allowed’ selection matches the input." which is the same error that I get for private sheets.

  1. These credentials work with a python script using gspread so there are no auth or access issues
  2. I have tried using the credentials.json file as is as downloaded when the key is created for the service account
  3. I have also tried to escape /n as //n but that does not work either. It works in the python script either way
dosubot[bot] commented 4 months ago

To further address the persistent issue you're encountering with connecting Google Sheets to Superset 3.1.2 using service account credentials, even after ensuring the basic setup and configurations are correct, let's delve into a more detailed approach:

  1. Ensure the Google Sheets URL is Correct: The format should strictly follow https://docs.google.com/spreadsheets/d/[SPREADSHEET_ID]/edit. Any deviation from this format could lead to connection issues.

  2. Verify Service Account Permissions: It's crucial that the Google Sheet is explicitly shared with the client_email associated with your service account. This step is often overlooked but is essential for granting access.

  3. Check the Format of Your Service Account Credentials: The JSON credentials must be correctly formatted. This includes ensuring that the private_key within the JSON is properly escaped. Given that you've confirmed these credentials work with a Python script, this might not be the issue, but it's worth re-verifying, especially after any modifications.

  4. Catalog Configuration: In Superset, ensure that there's a correct mapping between the table names you're using and the URLs of the Google Sheets. This might involve reviewing the configuration files or settings within Superset that handle external database connections.

  5. Match the Type of Google Sheets Allowed: The error message you're encountering suggests a mismatch in the settings regarding the type of Google Sheets allowed. Double-check that your Superset configuration aligns with the permissions you've set up in the Google Cloud Platform and within Superset's database connection settings.

  6. OAuth2 and Impersonation Settings: If your setup involves OAuth2 or impersonating a user account, ensure that all the necessary scopes and parameters are correctly configured. This includes verifying that the service account has the appropriate permissions to access the Google Sheets API.

If after following these steps, the issue remains unresolved, consider exploring these additional solutions:

Given the complexity of the issue and the steps you've already attempted, it's clear that a solution may require a combination of verifying configurations, permissions, and perhaps even looking into recent updates or changes in Superset's handling of Google Sheets connections. If these suggestions do not lead to a resolution, it could be beneficial to reach out to the Superset community or review open issues in the Superset GitHub repository for any similar cases and potential fixes.

#### Details **About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

rajivpatki commented 3 months ago

@dosu

I also tested using a public google sheet that was also shared with the service account email address. The connection works fine without the credentials. If I supply any credentials, the error that I mentioned earlier shows up.

dosubot[bot] commented 3 months ago

To address the issue with connecting Google Sheets to Superset 3.1.2 using service account credentials, consider the following targeted steps:

  1. Ensure Correct Google Sheets URL Format: The URL should strictly follow the format https://docs.google.com/spreadsheets/d/[SPREADSHEET_ID]/edit. Any deviation could lead to connection issues.

  2. Verify Service Account Permissions: Confirm that the Google Sheet is shared with the client_email from your service account credentials, granting at least viewer access.

  3. Check Service Account Credentials Format: Ensure the JSON credentials are correctly formatted, especially the private_key, and contain all required fields.

  4. Review 'Type of Google Sheets allowed' Selection: Double-check that your Superset configuration aligns with the permissions set up in Google Cloud Platform and Superset's database connection settings.

  5. Inspect Encrypted Credentials Handling: If using encrypted credentials, ensure they are correctly decrypted and valid.

  6. Enable Google Sheets API: Verify that the Google Sheets API is enabled for the project associated with the service account.

  7. Update Superset and Dependencies: Make sure your Superset installation and its dependencies are up to date.

  8. Debug Connection Attempt: Utilize logging to identify where the connection process might be failing.

By following these steps and ensuring all configurations and permissions are correctly set, you should be able to resolve the connection error. Additionally, consider changing the Google Sheet's locale settings or ensuring data is in text format as potential solutions.

#### Details **Sources** - [[Gsheets/shillelagh] Unable to add privately shared sheets](https://github.com/apache/superset/issues/21152) - [superset/db_engine_specs/gsheets.py](https://github.com/apache/superset/blob/master/superset/db_engine_specs/gsheets.py) - [tests/unit_tests/db_engine_specs/test_gsheets.py](https://github.com/apache/superset/blob/master/tests/unit_tests/db_engine_specs/test_gsheets.py) - [superset/utils/encrypt.py](https://github.com/apache/superset/blob/master/superset/utils/encrypt.py)
**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.