Open luizcapu opened 2 days ago
Hey @luizcapu! 👋 I'm here to help with bugs, answer questions, and assist you while you wait for a human maintainer. Let's squash those bugs together!
The `get_or_create` Dataset API endpoint returns a `500 - Internal Server Error` when trying to create a dataset with the same `table_name` but a different `schema` because the current implementation only checks for the existence of a table by its `table_name` and `database_id`. It does not account for differences in the `schema`. If a table with the same `table_name` already exists, it returns the existing table's ID without considering the schema, leading to potential conflicts or errors [1].
The relevant code snippet is:
    @expose("/get_or_create/", methods=("POST",))
    @protect()
    @safe
    @statsd_metrics
    @event_logger.log_this_with_context(
        action=lambda self, *args, **kwargs: f"{self.__class__.__name__}"
        f".get_or_create_dataset",
        log_to_statsd=False,
    )
    def get_or_create_dataset(self) -> Response:
        """Retrieve a dataset by name, or create it if it does not exist.
        ---
        post:
          summary: Retrieve a table by name, or create it if it does not exist
          requestBody:
            required: true
            content:
              application/json:
                schema:
                  $ref: '#/components/schemas/GetOrCreateDatasetSchema'
          responses:
            200:
              description: The ID of the table
              content:
                application/json:
                  schema:
                    type: object
                    properties:
                      result:
                        type: object
                        properties:
                          table_id:
                            type: integer
            400:
              $ref: '#/components/responses/400'
            401:
              $ref: '#/components/responses/401'
            422:
              $ref: '#/components/responses/422'
            500:
              $ref: '#/components/responses/500'
        """
        try:
            body = GetOrCreateDatasetSchema().load(request.json)
        except ValidationError as ex:
            return self.response(400, message=ex.messages)
        table_name = body["table_name"]
        database_id = body["database_id"]
        if table := DatasetDAO.get_table_by_name(database_id, table_name):
            return self.response(200, result={"table_id": table.id})
The code only checks for the existence of a table by `table_name` and `database_id`:
    if table := DatasetDAO.get_table_by_name(database_id, table_name):
        return self.response(200, result={"table_id": table.id})
This logic does not consider the `schema`, which can lead to a `500 - Internal Server Error` if there is a conflict or inconsistency when trying to create a dataset with the same `table_name` but a different `schema` [1].
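To make the false positive concrete, here is a minimal, self-contained sketch that models the lookup in plain Python. It is not Superset code: the `Dataset` dataclass, the `DATASETS` list, the local `get_table_by_name` function, and the `other_schema` value are all hypothetical stand-ins, and the only behaviour taken from the discussion above is that the match uses `database_id` and `table_name` and ignores `schema`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for a dataset row."""
    id: int
    database_id: int
    schema: Optional[str]
    table_name: str


# Hypothetical in-memory stand-in for the datasets table.
DATASETS = [Dataset(id=1, database_id=1, schema="public", table_name="users")]


def get_table_by_name(database_id: int, table_name: str) -> Optional[Dataset]:
    """Model of a lookup that matches on database_id and table_name only."""
    for dataset in DATASETS:
        if dataset.database_id == database_id and dataset.table_name == table_name:
            return dataset
    return None


# A request for "users" in a different (hypothetical) schema.
requested = {"database_id": 1, "schema": "other_schema", "table_name": "users"}

match = get_table_by_name(requested["database_id"], requested["table_name"])

# The lookup matched public.users even though another schema was requested.
false_positive = match is not None and match.schema != requested["schema"]
```

Under these assumptions, `match` is the existing `public.users` row and `false_positive` is true, which mirrors the `200` response that points at the wrong dataset.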
Additionally, the test case `test_create_dataset_same_name_different_schema` in the integration tests demonstrates that creating datasets with the same `table_name` but a different `schema` is a valid scenario, which the current implementation does not handle correctly [2].
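One possible direction, sketched under stated assumptions, is to key the get-or-create decision on `database_id`, `schema`, and `table_name` together. The `InMemoryDatasets` class and its `get_or_create` method below are hypothetical and exist only for illustration; they are not the Superset DAO or a proposed patch, and they assume the request carries a `schema` value.

```python
from typing import Dict, Optional, Tuple

# A dataset is identified by (database_id, schema, table_name) in this sketch.
Key = Tuple[int, Optional[str], str]


class InMemoryDatasets:
    """Hypothetical store illustrating a schema-aware get-or-create."""

    def __init__(self) -> None:
        self._ids: Dict[Key, int] = {}
        self._next_id = 1

    def get_or_create(
        self, database_id: int, table_name: str, schema: Optional[str] = None
    ) -> Tuple[int, bool]:
        """Return (dataset_id, created) for the given identity."""
        key: Key = (database_id, schema, table_name)
        if key in self._ids:
            # Same database, schema, and table name: reuse the existing ID.
            return self._ids[key], False
        # Any difference in the key, including schema, creates a new entry.
        dataset_id = self._next_id
        self._next_id += 1
        self._ids[key] = dataset_id
        return dataset_id, True


store = InMemoryDatasets()
public_id, public_created = store.get_or_create(1, "users", "public")
other_id, other_created = store.get_or_create(1, "users", "other_schema")
again_id, again_created = store.get_or_create(1, "users", "public")
```

With this keying, the second call creates a separate dataset instead of returning the ID of `public.users`, and the third call reuses the first ID.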
Bug description
The `get_or_create` Dataset API endpoint breaks with a `500 - Internal Server Error` if there is any existing dataset with the same `table_name`, regardless of the dataset `schema`.
This is not the behaviour we observe on other endpoints, such as `POST /dataset`, where users are able to create datasets with the same `table_name` as long as they belong to either a different `database_id` or a different `schema`.
Furthermore, if there is already an existing dataset for a given `table_name` and a user tries to create a new dataset with the same `table_name` but a different schema, the API returns `200`, which is a false positive.
How to reproduce the bug
Case 1 - False Positive
[...] `users` [...] datasets)
[...] `200` pointing to the existing `public.users` dataset ID. The new dataset is not created.
Case 2 - Internal Server Error
[...] `table_name` and different `schemas` (either via UI or create dataset API)
[...] `table_name` but a different schema. Payload example: [...]
[...] `500 - Internal Server Error`. The new dataset is not created.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist