apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Broken ssh connections: Could not establish session to SSH gateway #28651

Open 1yuv opened 1 month ago

1yuv commented 1 month ago

Bug description

We're seeing frequent failures of SSH connections from our Superset instance to an RDBMS server.

How to reproduce the bug

While loading a dashboard with six or seven Big Number charts, the error appears sometimes in one chart and sometimes in another.

  1. Click to add a database connection to Postgres.
  2. Configure an SSH tunnel to the remote server.
  3. Connect to Postgres.
  4. Create a few simple charts using this database connection.
  5. Browse the dashboard; random charts will report a failure to connect to the SSH gateway.

Screenshots/recordings

[Screenshot taken 2024-05-23 at 9:42:27 AM; image not preserved]

Superset version

4.0.1

Python version

3.9

Node version

16

Browser

Chrome

Additional context

This issue has been seen on Superset 3.0.2, 4.0.0 and 4.0.1 instances hosted on EKS.

Log:

2024-05-23 18:50:30,316:ERROR:sshtunnel.SSHTunnelForwarder:Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:50:30,316| ERROR   | Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:50:30,313:ERROR:paramiko.transport:paramiko.ssh_exception.SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:50:30,313| ERROR   | paramiko.ssh_exception.SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:50:30,302:ERROR:paramiko.transport:Exception (client): Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:50:30,302| ERROR   | Exception (client): Error reading SSH protocol banner[Errno 104] Connection reset by peer
2024-05-23 18:44:06,827:ERROR:sshtunnel.SSHTunnelForwarder:Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner
2024-05-23 18:44:06,827| ERROR   | Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner
2024-05-23 18:44:06,825:ERROR:paramiko.transport:paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
2024-05-23 18:44:06,825| ERROR   | paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
2024-05-23 18:44:06,799:ERROR:paramiko.transport:Exception (client): Error reading SSH protocol banner
2024-05-23 18:44:06,799| ERROR   | Exception (client): Error reading SSH protocol banner
2024-05-23 18:44:03,668:ERROR:sshtunnel.SSHTunnelForwarder:Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner
2024-05-23 18:44:03,668| ERROR   | Could not connect to gateway sshserver:sshport : Error reading SSH protocol banner
2024-05-23 18:44:03,668:ERROR:paramiko.transport:paramiko.ssh_exception.SSHException: Error reading SSH protocol banner
2024-05-23 18:44:03,668| ERROR   | paramiko.ssh_exception.SSHException: Error reading SSH protocol banner

In the log above, sshserver:sshport stands in for the actual server and port used for the SSH tunnel.
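The `Error reading SSH protocol banner` messages indicate that the TCP connection opened but the gateway's identification line never arrived, or the connection was reset first. One way to check whether the gateway itself drops connections under parallel load, independent of Superset, is to read the banner directly over a plain socket several times at once. The following is a minimal sketch under stated assumptions: the function names `read_ssh_banner` and `probe_gateway` are hypothetical, port 22 in the usage section is a placeholder, and the choice of 7 parallel workers simply mirrors the 6 or 7 charts on the dashboard.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def read_ssh_banner(host: str, port: int, timeout: float = 5.0) -> str:
    """Open a TCP connection and return the first line the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        data = b""
        # An SSH server identifies itself with a line like "SSH-2.0-...\r\n".
        # This sketch only inspects the first line it receives.
        while b"\n" not in data and len(data) < 255:
            chunk = sock.recv(255 - len(data))
            if not chunk:
                break  # peer closed the connection before sending a full line
            data += chunk
    banner = data.split(b"\n", 1)[0].decode("ascii", errors="replace").strip()
    if not banner.startswith("SSH-"):
        raise ConnectionError(f"no SSH banner received (got {banner!r})")
    return banner


def probe_gateway(host: str, port: int, attempts: int = 20, workers: int = 7) -> dict:
    """Run `attempts` banner reads, `workers` at a time, and tally the outcomes."""

    def one(_: int) -> str:
        try:
            read_ssh_banner(host, port)
            return "ok"
        except OSError as exc:  # covers timeouts, resets and ConnectionError
            return type(exc).__name__

    tally: dict = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for outcome in pool.map(one, range(attempts)):
            tally[outcome] = tally.get(outcome, 0) + 1
    return tally


if __name__ == "__main__":
    # "sshserver" and 22 are placeholders; substitute the real gateway host and port.
    print(probe_gateway("sshserver", 22))
```

If a run from inside the Superset pod reports outcomes other than `ok` (for example `ConnectionResetError`), the resets are probably happening at the gateway or on the network path rather than in Superset; connection-rate limits on the gateway, such as OpenSSH's `MaxStartups`, would be one thing worth checking in that case.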


dosubot[bot] commented 1 month ago

Hey @1yuv! I'm Dosu, a bot here to help you squash bugs, answer questions, and guide you in becoming a contributor. I'm here to assist while a human maintainer gets to your issue. How can I help you today?

To resolve the frequent failures of SSH connections to an RDBMS server from a Superset instance, follow these steps:

1. Ensure SSH Tunneling is Enabled

Verify that the SSH tunneling feature is enabled in your Superset configuration.
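As a rough illustration of this step, SSH tunneling in Superset is gated by the `SSH_TUNNELING` feature flag. A minimal fragment for `superset_config.py` might look like the following; if the file already defines `FEATURE_FLAGS`, the key should be merged into the existing dictionary rather than replacing it.

```python
# superset_config.py (fragment)
# Enables the SSH tunnel options in the database connection form.
FEATURE_FLAGS = {
    "SSH_TUNNELING": True,
}
```

In a deployment where an SSH tunnel can already be configured in the UI, as in this report, the flag is presumably already on, so this step is unlikely to be the cause.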

2. Check SSH Tunnel Configuration

Ensure that the SSH tunnel properties are correctly set in the database configuration. The _handle_ssh_tunnel method in the code handles the creation, update, and deletion of SSH tunnels based on the provided properties.

3. Handle SSH Tunnel Errors

Make sure to catch and log SSH tunnel-related errors for debugging purposes. The superset/commands/database/ssh_tunnel/exceptions.py file defines various SSH tunnel-related errors that can help you identify specific issues.

4. Check SSH Tunnel Management

The SSHManager class in superset/extensions/ssh.py is responsible for creating and managing SSH tunnels. Ensure that the SSH tunnel parameters, including server address, username, and credentials (password or private key), are correctly set.

5. Test Database Connection

The TestConnectionDatabaseCommand class in superset/commands/database/test_connection.py is responsible for testing the connection to the database, including handling SSH tunnels. Ensure that the connection test is successful and that any SSH tunnel-related errors are properly handled.

6. Check Superset Configuration

Ensure that the SSH tunnel properties are correctly defined and that the SSH gateway is accessible. Adjusting the SSH tunnel settings in Superset's superset_config.py may improve the performance and stability of SSH tunnels.
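The settings the comment originally listed are not preserved here. As an assumption, the tunnel-related options recalled from `superset/config.py` are the ones below; the names should be verified against the installed Superset version, and the values are purely illustrative rather than recommended.

```python
# superset_config.py (fragment)
# Setting names are recalled from superset/config.py and should be verified
# against the installed version; values are illustrative only.
SSH_TUNNEL_LOCAL_BIND_ADDRESS = "127.0.0.1"
# Timeout in seconds for opening the tunnel channel.
SSH_TUNNEL_TIMEOUT_SEC = 30.0
# Timeout in seconds for the underlying transport socket.
SSH_TUNNEL_PACKET_TIMEOUT_SEC = 5.0
```

Longer timeouts may help when the gateway is slow to respond, but they would not be expected to help if the gateway actively resets the connection, as the `Connection reset by peer` log lines suggest.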

7. Secure SSH Tunnel Configuration

Use the SSHTunnelForm component in the Superset frontend to configure SSH tunnel properties. This form allows you to input the SSH host, port, username, and authentication method (password or private key).

By following these steps, you should be able to resolve the frequent SSH connection failures and ensure a stable connection to your RDBMS server from the Superset instance.

#### Details

**Sources**
- [superset/commands/database/update.py](https://github.com/apache/superset/blob/master/superset/commands/database/update.py)
- [superset/commands/database/ssh_tunnel/exceptions.py](https://github.com/apache/superset/blob/master/superset/commands/database/ssh_tunnel/exceptions.py)
- [superset/extensions/ssh.py](https://github.com/apache/superset/blob/master/superset/extensions/ssh.py)
- [superset/commands/database/test_connection.py](https://github.com/apache/superset/blob/master/superset/commands/database/test_connection.py)
- [superset-frontend/src/features/databases/DatabaseModal/SSHTunnelForm.tsx](https://github.com/apache/superset/blob/master/superset-frontend/src/features/databases/DatabaseModal/SSHTunnelForm.tsx)
- [superset/config.py](https://github.com/apache/superset/blob/master/superset/config.py)
- [superset/utils/ssh_tunnel.py](https://github.com/apache/superset/blob/master/superset/utils/ssh_tunnel.py)

**About Dosu** This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

To continue the conversation, mention @dosu.

1yuv commented 1 day ago

This is occurring more frequently, and it looks like Superset's SSH tunneling is not working as it should. Every time the dashboard is refreshed, one chart or another throws an error. Below is the log from the Superset pod.

Traceback (most recent call last):
  File "/app/superset/connectors/sqla/models.py", line 1793, in query
    df = self.database.get_df(sql, self.schema, mutator=assign_column_label)
  File "/app/superset/models/core.py", line 581, in get_df
    with self.get_raw_connection(schema=schema) as conn:
  File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/app/superset/models/core.py", line 516, in get_raw_connection
    with self.get_sqla_engine_with_context(
  File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/app/superset/models/core.py", line 404, in get_sqla_engine_with_context
    with engine_context as server_context:
  File "/usr/local/lib/python3.9/site-packages/sshtunnel.py", line 1608, in __enter__
    self.start()
  File "/usr/local/lib/python3.9/site-packages/sshtunnel.py", line 1331, in start
    self._raise(BaseSSHTunnelForwarderError,
  File "/usr/local/lib/python3.9/site-packages/sshtunnel.py", line 1174, in _raise
    raise exception(reason)
sshtunnel.BaseSSHTunnelForwarderError: Could not establish session to SSH gateway
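The traceback shows the failure surfacing from `sshtunnel`'s `start()` with no retry in between. As a possible stopgap while the root cause is investigated, a transient failure could be retried with a short backoff. This is a generic sketch and not how Superset behaves today: `retry_with_backoff` is a hypothetical helper, and wiring it into Superset would require a patch or a custom SSH manager class.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    exceptions: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(); on a listed exception, wait and retry with a doubling delay."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With `sshtunnel`, the call might look like `retry_with_backoff(tunnel.start, (sshtunnel.BaseSSHTunnelForwarderError,))`, where `tunnel` is an `SSHTunnelForwarder` instance. Whether a forwarder can safely be started again after a failed `start()` has not been verified here; creating a fresh forwarder per attempt may be the safer variant.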