apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.8k stars 13.87k forks source link

fix(dataset-api): get_or_create creates a dataset for an existing table_name but different schema #30379

Open luizcapu opened 1 month ago

luizcapu commented 1 month ago

SUMMARY

At Pinterest we were trying to use the get_or_create endpoint to automate the integration between our MetricsLayer and Superset. During our tests we've encountered the following issue: https://github.com/apache/superset/issues/30377

The issue happens because at the moment the get_or_create code does not account for the payload schema attribute, doing a search by table_name only.

This PR changes get_or_create to take the schema into account:

TESTING INSTRUCTIONS

Case 1 - False Positive

  1. Go to the datasets page
  2. Pick any existing dataset name and prepare a payload as follows (example using the users datasets)
{
  'table_name': 'users',
  'schema': 'other',
  'database_id': 1,
}
  1. Submit this payload via a POST request to /api/v1/dataset/get_or_create
  2. A new dataset is created. The API returns with a 200 pointing to the new dataset. No false positives anymore.

Case 2 - Internal Server Error

  1. Create 2 or more datasets with the same table_name and different schemas (either via UI or create dataset API)
  2. Try to create a new dataset. Again, with same table_name but a different schema. Payload example:
{
  'table_name': 'users',
  'schema': 'any_new_schema_name',
  'database_id': 1,
}
  1. Submit this payload via a POST request to /api/v1/dataset/get_or_create
  2. A new dataset is created. The API returns with a 200 pointing to the new dataset. No 500 errors anymore.

Backward Compatibility

  1. Go to the datasets page
  2. Pick any existing dataset name and prepare a payload as follows (without passing the schema)
{
  'table_name': 'users',
  'database_id': 1,
}
  1. Submit this payload via a POST request to /api/v1/dataset/get_or_create
  2. No new dataset is created. The API returns with a 200 with the response body pointing to the existing dataset.

ADDITIONAL INFORMATION

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 83.91%. Comparing base (76d897e) to head (ed26eb6). Report is 835 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #30379 +/- ## =========================================== + Coverage 60.48% 83.91% +23.42% =========================================== Files 1931 533 -1398 Lines 76236 38627 -37609 Branches 8568 0 -8568 =========================================== - Hits 46114 32412 -13702 + Misses 28017 6215 -21802 + Partials 2105 0 -2105 ``` | [Flag](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | | |---|---|---| | [hive](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `48.95% <40.00%> (-0.22%)` | :arrow_down: | | [javascript](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `?` | | | [mysql](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `76.78% <100.00%> (?)` | | | [postgres](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `76.85% <100.00%> (?)` | | | [presto](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `53.44% <40.00%> (-0.37%)` | :arrow_down: | | [python](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `83.91% <100.00%> (+20.42%)` | :arrow_up: | | [unit](https://app.codecov.io/gh/apache/superset/pull/30379/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | `60.75% <40.00%> (+3.12%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

luizcapu commented 1 month ago

Marking this PR as ready for review. test-sqlite is the only test failing.

However, it's failing with sqlalchemy.exc.IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: tables.table_name, which is inherently opposed to the nature of this PR.

Furthermore, there are some signals that this constraint should be dropped:

I could use some guidance to decide whether or not dropping this constraint is the right decision. And if so, how to do it.

Thank you in advance.