apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0

Creating data sources on the HTTP API from the command line, using HTTPie #20546

Open amotl opened 2 years ago

amotl commented 2 years ago

Dear lovely people of Apache Superset,

First things first: thanks a stack for conceiving and maintaining Apache Superset. It is truly a gem.

Foreword

This is not meant to be an actual bug report. Maybe you can slap an info label on it, or just tuck it away into the "Discussions" section?

Introduction

I am trying to create a data source using the HTTP API of Apache Superset without setting WTF_CSRF_ENABLED = False. I believe I have taken into consideration all input from #2488, #4018, #8382, #10354, #16003, #17206, #19343, #19356, and the further information referenced below.

#16003 was the most helpful of all resources, outlining how to send both the Authorization and X-CSRFToken headers appropriately. However, people are still struggling to replicate this workflow from the command line, for example using curl.

In this post, I would like to demonstrate that, beyond properly sending the corresponding tokens, you also need to maintain a session between requests. I will use HTTPie for that purpose.

Walkthrough

This is meant to be exercised on a standard vanilla installation of Apache Superset, where the authentication credentials are still admin/admin and no other pieces have been modified. If you adjusted your installation, you will need to modify some bits accordingly.

You will need to install both HTTPie and jq, e.g. by typing {apt,brew,yum} install httpie jq.

# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)

# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)

# Create a data source item / database connection.
http --session=superset http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}" X-CSRFToken:"${CSRF_TOKEN}"
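
For readers who prefer curl, the same three steps can be sketched with a cookie jar standing in for HTTPie's --session option. This is a minimal sketch, not a verified recipe: the function names and the SUPERSET_URL and COOKIE_JAR variables are made up for illustration, and it assumes the default admin/admin credentials and that jq is installed.

```shell
#!/usr/bin/env bash
# Sketch only: SUPERSET_URL, COOKIE_JAR and the function names are illustrative assumptions.
SUPERSET_URL="${SUPERSET_URL:-http://localhost:8088}"
COOKIE_JAR="${COOKIE_JAR:-$(mktemp)}"

# Authenticate and print the JWT access token.
superset_login() {
  curl --silent --cookie-jar "$COOKIE_JAR" --cookie "$COOKIE_JAR" \
    --header 'Content-Type: application/json' \
    --data '{"username": "admin", "password": "admin", "provider": "db"}' \
    "$SUPERSET_URL/api/v1/security/login" | jq -r .access_token
}

# Print a CSRF token; the session cookie set by the server lands in the cookie jar.
# $1 = JWT access token
superset_csrf_token() {
  curl --silent --cookie-jar "$COOKIE_JAR" --cookie "$COOKIE_JAR" \
    --header "Authorization: Bearer $1" \
    "$SUPERSET_URL/api/v1/security/csrf_token/" | jq -r .result
}

# Create a database connection, sending both tokens and the stored session cookie.
# $1 = JWT access token, $2 = CSRF token, $3 = JSON payload
superset_create_database() {
  curl --silent --cookie-jar "$COOKIE_JAR" --cookie "$COOKIE_JAR" \
    --header "Authorization: Bearer $1" \
    --header "X-CSRFToken: $2" \
    --header 'Content-Type: application/json' \
    --data "$3" \
    "$SUPERSET_URL/api/v1/database/"
}

# Usage (requires a running Superset instance):
#   AUTH_TOKEN="$(superset_login)"
#   CSRF_TOKEN="$(superset_csrf_token "$AUTH_TOKEN")"
#   superset_create_database "$AUTH_TOKEN" "$CSRF_TOKEN" \
#     '{"database_name": "PostgreSQL Example", "engine": "postgres", "sqlalchemy_uri": "postgres://postgres@host.docker.internal:5432"}'
```

The point of passing both --cookie-jar and --cookie on every call is that the session cookie received with the CSRF token is sent back with the final POST, which is what --session= does for HTTPie.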

Enquiry

Somehow, I would have expected this procedure to also work without maintaining a session. However, when running the commands from the example above while omitting the --session= option, the last command fails with the familiar error:

400 Bad Request: The CSRF session token is missing.

Conclusion

So, this post is meant to be two things. First, it is an informational reference for the community on how to actually create data source items using the HTTP API from the command line. Second, it is an enquiry to the developers: is my expectation, to be able to talk to the API without maintaining a session, actually inappropriate?

Thank you in advance for taking the time to look into this topic.

With kind regards, Andreas.


Further references

https://stackoverflow.com/questions/66015739/use-apache-superset-api-to-feed-a-dataset
https://stackoverflow.com/questions/68614350/cannot-post-a-new-db-to-apache-superset-400-error-with-csrf
https://solveforum.com/forums/threads/solved-cannot-post-a-new-db-to-apache-superset-400-error-with-csrf.49375/
https://groups.google.com/g/airbnb_superset/c/3H7SZma4ZEE

stupid-yu commented 2 years ago

Hello, I have the same problem when using curl to create a database.

[root@superset]# token=$(curl -X 'POST' \
  'http://'${HOSTNAME}':'${PORT}'/api/v1/security/login' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "username": "admin",
  "password": "admin",
  "refresh": true,
  "provider": "db"
}')

[root@superset]# function parse_json { echo "${1//\"/}" | sed "s/.*$2:\([^,}]*\).*/\1/" ; }

[root@superset]# csrf=$(curl -X 'GET' 'http://'${HOSTNAME}':'${PORT}'/api/v1/security/csrf_token/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'')

[root@superset]# curl -vvvv -X 'POST' 'http://'${HOSTNAME}':'${PORT}'/api/v1/database/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'' -H 'X-CSRFToken: '$(parse_json $csrf "result")'' -H 'accept: */*' -H 'Content-Type: application/json' -d '{
"database_name": "kyuubi-jdbc",
"sqlalchemy_uri": "hive://bcdp@dwh-htwsxrv9-kyuubi-kyuubi",
"expose_in_sqllab": true,
"allow_ctas": true,
"allow_cvas": true,
"allow_dml": true,
"allow_multi_schema_metadata_fetch": true
}'
* About to connect() to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf port 58093 (#0)
*   Trying 192.168.11.173...
* Connected to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf (192.168.11.173) port 58093 (#0)
> POST /api/v1/database/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf:58093
> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2NTcxODI2MTcsIm5iZiI6MTY1NzE4MjYxNywianRpIjoiMzYxMjA4YmEtMThjZC00MDY0LTgxOTQtNjdiZjI3ZmY1ZjI2IiwiZXhwIjoxNjU3MTgzNTEJlc2giOnRydWUsInR5cGUiOiJhY2Nlc3MifQ.a7sFispKsyUD3FDo47HuuCtq9jP7xpWy3ZaeI1bVpuc
> X-CSRFToken: ImY2ZmUxNDIzNGQ2YTUwYjI2NDg3ZDc0YjRjOGUxZGMwMDAzODA3Zjgi.YsaZsQ.SrP1_NXVfnSZ6uW16V25vPE7yqo
> accept: */*
> Content-Type: application/json
> Content-Length: 222
> 
* upload completely sent off: 222 out of 222 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 400 BAD REQUEST
< Content-Type: text/html; charset=utf-8
< Content-Length: 150
< Server: Werkzeug/1.0.1 Python/3.7.10
< Date: Thu, 07 Jul 2022 08:33:01 GMT
< 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The CSRF session token is missing.</p>
* Closing connection 0
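
A note on the error above. The message "The CSRF session token is missing." matches the one Flask-WTF raises when a token arrives in the request but the session holds no counterpart (WTF_CSRF_ENABLED is a Flask-WTF setting). The header value looks like a signed wrapper around a raw token, which would explain why sending the X-CSRFToken header without the session cookie is not enough. The sketch below decodes the first segment of the X-CSRFToken value from the trace above. The function name is made up for illustration, and it assumes a base64 that accepts -d, as in GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: decode_token_payload is an illustrative helper, not part of Superset.
# It decodes the first dot-separated segment of a URL-safe base64 token.
decode_token_payload() {
  local payload="${1%%.*}"
  # Map the URL-safe alphabet back to the standard base64 alphabet.
  payload="$(printf '%s' "$payload" | tr '_-' '/+')"
  # Restore the padding that URL-safe tokens usually omit.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# X-CSRFToken value copied from the verbose curl output above.
decode_token_payload 'ImY2ZmUxNDIzNGQ2YTUwYjI2NDg3ZDc0YjRjOGUxZGMwMDAzODA3Zjgi.YsaZsQ.SrP1_NXVfnSZ6uW16V25vPE7yqo'
echo
# prints "f6fe14234d6a50b26487d74b4c8e1dc0003807f8" (including the quotes)
```

If that reading is right, the server compares this raw value against the one stored in the session cookie, so a curl call needs a cookie jar (-c and -b) carried across the requests.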

vishaltps commented 1 year ago

@amotl It's clearly a bug. I have tried to create a guest user token from my Rails app, and I keep getting the error that the CSRF session token is missing. However, when I try it from Postman, it works fine.

amotl commented 9 months ago

Hi again,

Using Superset 2.1.3 on a vanilla installation, I verified that maintaining a session and supplying a CSRF token are no longer needed to work with the HTTP API.

# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)

# Create a data source item / database connection.
http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}"

Thanks a stack for improving the situation in this regard.

With kind regards, Andreas.

amotl commented 9 months ago

Hi again. After upgrading to the most recent Superset 3, the problem is back! Cheers, Andreas.

Request

http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}" --print hHbB

Response

{
    "errors": [
        {
            "error_type": "GENERIC_BACKEND_ERROR",
            "extra": {
                "issue_codes": [
                    {
                        "code": 1011,
                        "message": "Issue 1011 - Superset encountered an unexpected error."
                    }
                ]
            },
            "level": "error",
            "message": "400 Bad Request: The CSRF token is missing."
        }
    ]
}

amotl commented 9 months ago

I see. With Superset 3, you need to configure WTF_CSRF_ENABLED = False in superset_config.py. Then, communicating with the HTTP API works without needing to use a corresponding CSRF token. That's fine for my specific purpose, but I am wondering if CSRF protection would be turned off completely then, also on requests from browsers?
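
For reference, a minimal sketch of applying that setting from the shell. The function name and the example path are made up for illustration; SUPERSET_CONFIG_PATH is the environment variable Superset reads to locate a custom config file. As far as I can tell, WTF_CSRF_ENABLED is Flask-WTF's global switch, so setting it to False most likely turns off CSRF checks for browser requests too. Treat that as an assumption to verify.

```shell
#!/usr/bin/env bash
# Sketch only: disable_csrf_in_config is an illustrative helper, not a Superset command.
# $1 = path to superset_config.py
disable_csrf_in_config() {
  config_path="$1"
  mkdir -p "$(dirname "$config_path")"
  # Append the setting only when the file does not define it yet.
  # An existing WTF_CSRF_ENABLED line is left untouched and needs editing by hand.
  if ! grep -q '^WTF_CSRF_ENABLED' "$config_path" 2>/dev/null; then
    printf '%s\n' 'WTF_CSRF_ENABLED = False' >> "$config_path"
  fi
}

# Usage (the path is a hypothetical location; adjust to your deployment):
#   disable_csrf_in_config "$HOME/.superset/superset_config.py"
#   export SUPERSET_CONFIG_PATH="$HOME/.superset/superset_config.py"
#   # then restart Superset so it picks up the change
```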

thesalmonidae commented 3 months ago

I have this problem with the latest Superset Docker image from Docker Hub.

Please, sort this out, this is ridiculous!