datahub-project / datahub

The Metadata Platform for your Data and AI Stack
https://datahubproject.io
Apache License 2.0

Issues with the `DataPlatformInstance` v3/entity/dataplatforminstance API methods #11317

Closed usmanovbf closed 1 month ago

usmanovbf commented 2 months ago

Describe the bug
There are several issues with the DataPlatformInstance API provided by DataHub:

  1. HEAD Request Issue: The HEAD /v3/entity/dataplatforminstance/{urn} API returns a 404 Not Found response when checking the existence of a DataPlatformInstance that does exist.
  2. GET Request Generates Incorrect Data: The GET /v3/entity/dataplatforminstance/{urn} API generates fabricated data when a non-existent urn is specified, instead of returning an error.
  3. Empty Response for Scroll/List API: The GET /v3/entity/dataplatforminstance API with scroll or list parameters returns an empty response, despite having existing platform instances in the system.

To Reproduce
Steps to reproduce the behavior:

  1. HEAD Request Issue:

    • Make a HEAD request to https://datahub.example.io/openapi/v3/entity/dataplatforminstance/{urn} with a valid urn such as urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance).
    • Example code:
    import requests
    
    DATAHUB_API_KEY = '<YOUR_API_KEY>'
    headers = {'Authorization': f'Bearer {DATAHUB_API_KEY}', 'Content-Type': 'application/json'}
    host = "https://datahub.example.io"
    
    urn = 'urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)'
    method = '/openapi/v3/entity/dataplatforminstance/{urn}'.format(urn=urn)
    
    r = requests.head(
       url=f"{host}{method}",
       headers=headers,
    )
    
    print(r.status_code)  # Outputs: 404
    • Error: 404 Not Found is returned, but the instance does exist.
  2. GET Request Generates Incorrect Data:

    • Make a GET request to the same endpoint but use a slightly altered urn that does not exist, such as urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,example_instance_wrong).
    • Error: The response mimics a valid structure even though the data does not exist.

    Example incorrect response:

    {"urn":"urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,example_instance_wrong)","aspects":{"dataPlatformInstanceKey":{"value":{"instance":"example_instance_wrong","platform":"urn:li:dataPlatform:mysql"}}}}
  3. Empty Response for Scroll/List API:

    • Make a GET request to https://datahub.example.io/openapi/v3/entity/dataplatforminstance with scroll or list parameters.
    • Error: The response is empty even though there are existing platform instances.

Expected behavior

Screenshots
N/A (No UI involved, API behavior only)

Additional context

Could you please investigate these issues or provide guidance on resolving them? Any help would be greatly appreciated.

usmanovbf commented 2 months ago

Also reposted this in Slack: https://datahubspace.slack.com/archives/C029A3M079U/p1725612461659359

david-leifker commented 1 month ago

I believe all of these work as expected in a later version. I ran through a series of tests, included below, and the behavior matched expectations.

Note: Some of the behavior you pointed out, such as returning an empty entity when it doesn't exist (which makes it very difficult to determine whether the entity actually exists), was initially implemented for compatibility with the rest.li endpoints and legacy code. This was re-evaluated at some point after the 0.13.x release, and we decided to make OpenAPI v3 behave in a more logical way (your expected behavior).

Here is an example sequence with outputs that I believe match all of your expectations.

Test Scenario

For the tests I've exported a few variables.

export URN="urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)"
export URN_ENCODED="urn%3Ali%3AdataPlatformInstance%3A%28urn%3Ali%3AdataPlatform%3Amysql%2Csome-existing-platform-instance%29"
export COOKIE="PLAY_SESSION=<...>"
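
The URN_ENCODED value above is the percent-encoded form of URN. As a minimal sketch, assuming Python's standard library is at hand, the same string can be derived with urllib.parse.quote. The helper name encode_urn is hypothetical and not part of DataHub.

```python
from urllib.parse import quote


def encode_urn(urn: str) -> str:
    """Percent-encode a URN so it fits into a single URL path segment.

    safe="" makes quote() also encode ':', '(', ')' and ',', which it
    would otherwise partly leave alone ('/' is safe by default).
    """
    return quote(urn, safe="")


urn = "urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)"
print(encode_urn(urn))
```

The Python snippet in the original report interpolates the raw URN into the path, so encoding it this way may be worth trying when reproducing the HEAD check.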

HEAD - Non-existent

At this point in the sequence the dataPlatformInstance doesn't exist yet, so the HEAD request returns the expected 404.

Request:

curl -v -X 'HEAD' \
  "http://localhost:9002/openapi/v3/entity/dataplatforminstance/${URN_ENCODED}?includeSoftDelete=false" \
  -H 'accept: application/json' \
  -H "Cookie: $COOKIE"

Response:

< HTTP/1.1 404 Not Found
< Date: Fri, 20 Sep 2024 15:03:12 GMT
< Server: Jetty (11.0.21)
< Content-Type: application/octet-stream
< Content-Length: 0

POST - Create entity

Request:

curl -q -X 'POST' \
  'http://localhost:9002/openapi/v3/entity/dataplatforminstance?async=false' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Cookie: $COOKIE" \
  -d '[
  {
    "urn": "'$URN'",
    "status": {
      "value": {
        "removed": false
      }
    }
  }
]' | jq

Response:

[
  {
    "urn": "urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)",
    "dataPlatformInstanceKey": {
      "value": {
        "platform": "urn:li:dataPlatform:mysql",
        "instance": "some-existing-platform-instance"
      }
    },
    "status": {
      "value": {
        "removed": false
      }
    }
  }
]
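
For anyone scripting the create step rather than using curl, here is a rough sketch of building the same request body in Python. The function name build_create_payload is hypothetical. The body shape is copied from the curl request above and has only been checked against that example.

```python
import json


def build_create_payload(urn: str, removed: bool = False) -> list:
    """Build a one-element batch that upserts the 'status' aspect for urn.

    Mirrors the JSON array passed to curl with -d in the request above.
    """
    return [{"urn": urn, "status": {"value": {"removed": removed}}}]


urn = "urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)"
payload = build_create_payload(urn)
# Serialize for use as the request body (Content-Type: application/json).
body = json.dumps(payload, indent=2)
print(body)
```

Building the body with json.dumps avoids the shell quoting needed to splice $URN into the single-quoted curl argument.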

HEAD - Entity exists

Request:

curl -v -X 'HEAD' \
  "http://localhost:9002/openapi/v3/entity/dataplatforminstance/${URN_ENCODED}?includeSoftDelete=false" \
  -H 'accept: application/json' \
  -H "Cookie: $COOKIE"

Response:

< HTTP/1.1 204 No Content
< Date: Fri, 20 Sep 2024 15:05:43 GMT
< Server: Jetty (11.0.21)
< Content-Type: application/octet-stream
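
The two HEAD results above (404 before creation, 204 after) suggest a simple existence check. The sketch below only builds the URL and interprets the status code, so it makes no network call. The names head_url and exists_from_head_status are hypothetical, and treating every other status as an error is an assumption rather than documented DataHub behavior.

```python
def head_url(host: str, encoded_urn: str, include_soft_delete: bool = False) -> str:
    """Build the HEAD URL used in the curl examples above.

    encoded_urn must already be percent-encoded.
    """
    flag = "true" if include_soft_delete else "false"
    return (
        f"{host}/openapi/v3/entity/dataplatforminstance/"
        f"{encoded_urn}?includeSoftDelete={flag}"
    )


def exists_from_head_status(status_code: int) -> bool:
    """Map the status codes observed above to an existence flag."""
    if status_code == 204:  # observed when the entity exists
        return True
    if status_code == 404:  # observed when the entity does not exist
        return False
    # Assumption: anything else (401, 500, ...) is not an existence answer.
    raise RuntimeError(f"Unexpected status for HEAD existence check: {status_code}")


print(head_url("http://localhost:9002", "urn%3Ali%3Aexample"))
```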

GET - Entity Exists

Request:

curl -q -X 'GET' \
  "http://localhost:9002/openapi/v3/entity/dataplatforminstance/${URN_ENCODED}?systemMetadata=false&aspects=dataPlatformInstanceKey&aspects=dataPlatformInstanceProperties&aspects=ownership&aspects=institutionalMemory&aspects=globalTags&aspects=deprecation&aspects=status" \
  -H 'accept: application/json' \
  -H "Cookie: $COOKIE" | jq

Response:

{
  "urn": "urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)",
  "dataPlatformInstanceKey": {
    "value": {
      "platform": "urn:li:dataPlatform:mysql",
      "instance": "some-existing-platform-instance"
    }
  },
  "status": {
    "value": {
      "removed": false
    }
  }
}

GET - Non-existent entity

Request:

curl -v -X 'GET' \
  "http://localhost:9002/openapi/v3/entity/dataplatforminstance/${URN_ENCODED}NON_EXISTANT?systemMetadata=false&aspects=dataPlatformInstanceKey&aspects=dataPlatformInstanceProperties&aspects=ownership&aspects=institutionalMemory&aspects=globalTags&aspects=deprecation&aspects=status" \
  -H 'accept: application/json' \
  -H "Cookie: $COOKIE"

Response:

< HTTP/1.1 404 Not Found
< Date: Fri, 20 Sep 2024 15:07:20 GMT
< Server: Jetty (11.0.21)
< Not-Found-Reason: ENTITY
< Content-Type: application/octet-stream
< Content-Length: 0
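
The 404 above carries a Not-Found-Reason header, which a client could use to tell a missing entity apart from other failures. This is a minimal sketch under two assumptions: a successful GET returns status 200 (the curl output above shows only the body, not the status line), and ENTITY is the only header value handled because it is the only one shown here. The function name classify_get is hypothetical.

```python
from typing import Mapping, Optional, Tuple


def classify_get(status_code: int, headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (found, not_found_reason) for a GET on the entity endpoint."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if status_code == 200:  # assumption: success status for GET
        return True, None
    if status_code == 404:
        return False, lowered.get("not-found-reason")
    raise RuntimeError(f"Unexpected status for GET: {status_code}")


print(classify_get(404, {"Not-Found-Reason": "ENTITY", "Content-Length": "0"}))
```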

Scroll - One entity actually exists

Request:

curl -q -X 'GET' \
  'http://localhost:9002/openapi/v3/entity/dataplatforminstance?systemMetadata=false&includeSoftDelete=true&skipCache=false&aspects=dataPlatformInstanceKey&aspects=dataPlatformInstanceProperties&aspects=ownership&aspects=institutionalMemory&aspects=globalTags&aspects=deprecation&aspects=status&count=10&sort=urn&sortOrder=ASCENDING&query=%2A' \
  -H 'accept: application/json' \
  -H "Cookie: $COOKIE" | jq

Response:

{
  "entities": [
    {
      "urn": "urn:li:dataPlatformInstance:(urn:li:dataPlatform:mysql,some-existing-platform-instance)",
      "dataPlatformInstanceKey": {
        "value": {
          "platform": "urn:li:dataPlatform:mysql",
          "instance": "some-existing-platform-instance"
        }
      },
      "status": {
        "value": {
          "removed": false
        }
      }
    }
  ]
}
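
To script the scroll request, the repeated aspects parameters and the encoded wildcard query can be produced with urllib.parse.urlencode, and the URNs can be read from the entities array shown above. This is a sketch only: build_scroll_query and extract_urns are hypothetical names, just a subset of the query parameters from the curl example is included, and the response shape is assumed to match the output above.

```python
from typing import Iterable, List
from urllib.parse import urlencode


def build_scroll_query(aspects: Iterable[str], count: int = 10, query: str = "*") -> str:
    """Build a query string with one 'aspects' parameter per aspect name."""
    params = [("aspects", name) for name in aspects]
    params += [
        ("count", count),
        ("sort", "urn"),
        ("sortOrder", "ASCENDING"),
        ("query", query),  # '*' is percent-encoded as %2A
    ]
    return urlencode(params)


def extract_urns(response: dict) -> List[str]:
    """Collect URNs from a scroll response; a missing 'entities' key gives []."""
    return [entity["urn"] for entity in response.get("entities", [])]


print(build_scroll_query(["dataPlatformInstanceKey", "status"]))
```

An empty list from extract_urns would correspond to the empty scroll response described in the original report.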