apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.72k stars 13.85k forks source link

SQL Lab queries can't be stopped #17011

Open ValentinC-BR opened 3 years ago

ValentinC-BR commented 3 years ago

When a SQL query is running is SQL Lab, it can't be stopped.

How to reproduce the bug

  1. Go to SQL Lab
  2. Write a query and run it
  3. Click on STOP
  4. (Nothing happens)

Expected results

The query should be stopped

Actual results

The query is still running

Screenshots

(Imagine a query running in the SQL Lab...)

Environment

(please complete the following information):

Checklist

Make sure to follow these steps before submitting your issue - thank you!

Additional context

The queries could be stopped in Superset v1.1

eschutho commented 3 years ago

Hi @ValentinC-BR, when you say nothing happens when you hit the stop button, if you look in your network tab, is your browser making the call to the stop_query endpoint and if so, what is the response?

ValentinC-BR commented 3 years ago

I actually see it in the network tab but after 2 sec of "pending" status, I get a 500 error : image

eschutho commented 3 years ago

Ok, interesting. Do you have access to your logs to see what the error was? We were able to reproduce with Athena, and are looking more into this.

quenchua commented 2 years ago

I have noticed this as well. Basically, the query gets successfully stopped as far as Superset is concerned. But, if you check recent queries in the Athena UI, the query is still running.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.

ValentinC-BR commented 2 years ago

Was it fixed in the latest release ?

mdeshmu commented 2 years ago

@eschutho I have also observed same behavior with Athena. I was not able to stop one of my Athena queries. On clicking stop button i get error "Failed at stopping query." Timer doesn't stop in SQL Lab image I queried Superset meta database to find out query id. image (1) It stopped when i ran this SQL on Superset backend database. UPDATE query SET status = 'failed' WHERE id = 97;

Here are few logs: 2022-06-21 14:13:32,655:INFO:backoff:Backing off stop_query(...) for 0.3s (superset.exceptions.SupersetCancelQueryException: Could not cancel query) 2022-06-21 14:13:32,948:INFO:backoff:Backing off stop_query(...) for 0.1s (superset.exceptions.SupersetCancelQueryException: Could not cancel query) 2022-06-21 14:13:33,092:INFO:backoff:Backing off stop_query(...) for 0.6s (superset.exceptions.SupersetCancelQueryException: Could not cancel query) 2022-06-21 14:13:33,668:INFO:backoff:Backing off stop_query(...) for 1.0s (superset.exceptions.SupersetCancelQueryException: Could not cancel query) 2022-06-21 14:13:34,649:ERROR:backoff:Giving up stop_query(...) after 5 tries (superset.exceptions.SupersetCancelQueryException: Could not cancel query) x.x.x.x - - [21/Jun/2022:14:13:34 +0000] "POST /superset/stop_query/ HTTP/1.1" 422 35 "https://datainsights-dev.com/superset/sqllab/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36"

eschutho commented 2 years ago

Cancel query has only been implemented for mysql, postgres, and snowflake. Thus the error that you're seeing on Athena. It's a fairly simple integration needed where you would run that sql statement that you ran directly above. Here's an example for Mysql if someone is interested in contributing this fix: https://github.com/preset-io/superset/blob/02032ee8a478ab9578cf6e13a7645a8b4106be58/superset/db_engine_specs/mysql.py#L240

mdeshmu commented 2 years ago

@eschutho I ran query against Superset metadata databases's query table. I believe we will have to run the cancel query against Athena to stop the actual query execution. As per my understanding, Cancel query implementations for mysql, postgres, and snowflake run cancellation against respective database engines and not against Superset's metatdata database. Cancelling the Athena query is done via stop_query_execution which requires QueryExecutionId. PyAthena has a cancel method for this purpose. We will have to store QueryExecutionId somewhere in Superset's metadata database so that it can be fetched later via get_cancel_query_id. I have raised this on pyathena github project to check if there is an easy way to QueryExecutionId.

eschutho commented 2 years ago

@mdeshmu correct on all points. Ok, let's see if they have any answers for that lookup. Also I found this if it helps: https://github.com/laughingman7743/PyAthena/blob/master/pyathena/cursor.py#L65

mdeshmu commented 2 years ago

@eschutho In case of Athena, we will have to get QueryExecutionId after running sql statements. So this code won't help us since it is run before the query execution. Looking at this reply, I understand that the cursor.execute method blocks until the query execution is finished. So probably we need some kind of multi-threading to capture QueryExecutionId.

eschutho commented 2 years ago

Correct. The cancel query action is executed here so it will run in a separate thread. If you know how to get the QueryExecutionId, we'll need to add to the Athena db_engine_spec these two methods: get_cancel_query_id and cancel_query

eschutho commented 2 years ago

Here's an example of a recent PR to add the cancel functionality to Redshift for reference: https://github.com/apache/superset/pull/16326

mdeshmu commented 2 years ago

@eschutho What i meant is, To capture QueryExecutionId, we would need a separate thread utilizing the same cursor that is used for running sql statements from SQL Lab but current location of invocation of get_cancel_query_id wouldn't give us QueryExecutionId because its invoked before execution of SQL statements.

Even if we move this after execution of SQL statements then also its of no use as cursor would be blocked until the query execution is finished and get_cancel_query_id would be called after query execution has finished, hence of no use.

Please correct me if I am wrong.

eschutho commented 2 years ago

So this is the abbreviated code for the whole flow of running a query to stopping it. It should work with Athena as long as we build out the two required methods cancel_query and get_cancel_query_id for Athena in the db engine spec:

with closing(engine.raw_connection()) as conn:
        # get cursor
        cursor = conn.cursor() 

        # fetch cancel query id from  cursor
        cancel_query_id = db_engine_spec.get_cancel_query_id(cursor, query)

       # save cursor in metadata on query model
         if cancel_query_id is not None:
            query.set_extra_json_key(cancel_query_key, cancel_query_id)
            session.commit()

       ...
       # execute statement (thread is blocked)
                result_set = execute_sql_statement(
                    statement,
                    query,
                    session,
                    cursor,
                    log_params,
                    apply_ctas,
                )

(separate thread after clicking the "STOP" button)

    @expose("/stop_query/", methods=["POST"])
    ...
    # fetch cancel_query_key from query model metadata
    cancel_query_id = query.extra.get(cancel_query_key)

    .....
    # pass the cancel_query_id to the db_engine_spec and cancel the query
            return query.database.db_engine_spec.cancel_query(
                cursor, query, cancel_query_id
            )
mdeshmu commented 2 years ago

@eschutho In Athena, query id is not available immediately after initiation of cursor.

Athena

with closing(engine.raw_connection()) as conn:
        # get cursor
        cursor = conn.cursor() 

        # fetch cancel query id from  cursor
        cancel_query_id = db_engine_spec.get_cancel_query_id(cursor, query)  --> THIS WONT RETURN ANYTHING in case of Athena

       # save cursor in metadata on query model
         if cancel_query_id is not None:
            query.set_extra_json_key(cancel_query_key, cancel_query_id)
            session.commit()

       ...
       # execute statement (thread is blocked)
                result_set = execute_sql_statement(
                    statement,
                    query,
                    session,
                    cursor,
                    log_params,
                    apply_ctas,
                )                               --> WE WILL NEED A PARALLEL THREAD HERE TO call get_cancel_query_id here.
jplanckeel commented 2 years ago

In pyAthena in version 1.11 we can cancel a query in cursor. Actually superset use pyAthena 1.10

mdeshmu commented 2 years ago

@jplanckeel Superset works with latest PyAthena version. It seems you are referring to query cancellation on KeyboardInterrupt. Can you explain how it can be applied in this case?

eschutho commented 2 years ago

Thanks for starting a PR @jplanckeel. We'll review or give feedback when ready.

rusackas commented 9 months ago

Hey all... I know that some DBs made progress in this area, but I'm not sure about Athena. Since the PR is not moving forward, and this thread has been silent for about a year and a half, I wondered if y'all think we should keep this open, or let it go. In particular, since it's arguably a feature request as much as a bug ¯\_(ツ)_/¯

rumbin commented 9 months ago

I'm not directly affected any longer, as we moved to Snowflake. But I can confirm that this issue is still present in Superset 3.0.3.

I would definitely not call it a feature, since users expect the STOP button in SQLLab to work. If Superset does not provide this functionality, the STOP button should at least be invisible or grayed out.

ajunior commented 9 months ago

Hey folks! Any news about this issue? It's still happening here with Athena. I'm manually updating the query status to failed as a workaround, as mentioned by @mdeshmu almost two years ago.

rusackas commented 9 months ago

@ajunior if you (or anyone reading this) want to pick up the abandoned PR, you may be able to fix it. We'd appreciate the contribution!

vyasmanmohan commented 8 months ago

Possible to assign me? I will have a look. @rusackas

rusackas commented 8 months ago

@vyasmanmohan thanks for stepping up! I've assigned the issue to you, though I can't assign the PR. If you want me to assign that, too, you need to comment on that thread as well.

rusackas commented 4 months ago

Any word on this @vyasmanmohan ? If nobody on the thread is working on this, we might close it out as stale/abandoned to maintain a more actionable backlog.