Introduction
To track where a particular dataset lives, we introduce the concept of a Datasource. Examples of Datasources include Redshift, S3, and MySQL instances.
A Datasource does more than classify the type of data store: it also provides connection information describing where the data lives. Its properties should include the Datasource's name, its type, and a connection URL.
Using Datasources with Datasets
Every dataset lives in exactly one datasource, and this relationship is expressed by datasets.datasourceUUID.
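The dataset-to-datasource relationship can be pictured as a plain foreign-key lookup. This is a minimal sketch: the `uuid` primary key, the Python field names (`connection_url`, `datasource_uuid`), the `Dataset.name` field, the `datasource_of` helper and the example host are hypothetical and only mirror the properties described above.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Datasource:
    name: str            # human-generated, e.g. "staging-dw"
    type: str            # e.g. "redshift"
    connection_url: str  # "protocol://host:port/database"
    uuid: UUID = field(default_factory=uuid4)  # assumed primary key


@dataclass(frozen=True)
class Dataset:
    name: str              # hypothetical field, for illustration only
    datasource_uuid: UUID  # mirrors datasets.datasourceUUID


def datasource_of(dataset: Dataset, datasources: dict[UUID, Datasource]) -> Datasource:
    """Return the datasource a dataset lives in, looked up by its datasource UUID."""
    return datasources[dataset.datasource_uuid]


dw = Datasource("staging-dw", "redshift", "redshift://dw.example.com:5439/analytics")
orders = Dataset("orders", datasource_uuid=dw.uuid)
print(datasource_of(orders, {dw.uuid: dw}).name)  # staging-dw
```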
Access patterns
We have identified a few patterns by which a user may look up a datasource:
By datasource name
By datasource name and type
By URN
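The three access patterns map naturally onto a single index keyed by (name, type). The sketch below assumes an in-memory store; `DatasourceIndex` and its method names are hypothetical, and a real service would run equivalent queries against the datasource table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datasource:
    name: str
    type: str
    connection_url: str


class DatasourceIndex:
    """Hypothetical in-memory index covering the three access patterns."""

    def __init__(self, datasources: list[Datasource]) -> None:
        # Keyed on (name, type), matching the <Name, Type> uniqueness constraint.
        self._by_key = {(d.name, d.type): d for d in datasources}

    def by_name(self, name: str) -> list[Datasource]:
        # If only <Name, Type> is unique, one name can map to several types.
        return [d for (n, _), d in self._by_key.items() if n == name]

    def by_name_and_type(self, name: str, type_: str) -> Optional[Datasource]:
        return self._by_key.get((name, type_))

    def by_urn(self, urn: str) -> Optional[Datasource]:
        # urn:<type>:<name>; the name keeps any further colons.
        parts = urn.split(":", 2)
        if len(parts) != 3 or parts[0] != "urn":
            raise ValueError(f"expected urn:<type>:<name>, got {urn!r}")
        _, type_, name = parts
        return self.by_name_and_type(name, type_)


index = DatasourceIndex([
    Datasource("staging-dw", "redshift", "redshift://dw.example.com:5439/analytics"),
    Datasource("staging-dw", "postgresql", "postgresql://pg.example.com:5432/analytics"),
])
print(len(index.by_name("staging-dw")))  # 2
print(index.by_urn("urn:redshift:staging-dw").connection_url)
```

Looking up by URN reduces to the name-and-type lookup, since the URN carries exactly those two parts.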
API Endpoints
GET /api/v1/datasources -- list all datasources
GET /api/v1/datasources?urn=<urn> -- get the datasource with the given URN
POST /api/v1/datasources -- create a datasource
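How the three endpoints might be routed is sketched below, without any web framework. The in-memory `STORE`, the `handle` function, the JSON keys (`name`, `type`, `connectionUrl`) and the status codes (201 on create, 404 if missing, 409 on a duplicate) are all assumptions, not the specified payloads.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory store keyed by URN; a real service would persist this.
STORE: dict[str, dict] = {}


def handle(method: str, url: str, body: str = "") -> tuple[int, str]:
    """Route the three datasource endpoints; returns (HTTP status, JSON body)."""
    parts = urlsplit(url)
    if parts.path != "/api/v1/datasources":
        return 404, json.dumps({"error": "not found"})

    if method == "GET":
        urns = parse_qs(parts.query).get("urn")
        if urns is None:                      # GET /api/v1/datasources
            return 200, json.dumps(list(STORE.values()))
        found = STORE.get(urns[0])            # GET /api/v1/datasources?urn=<urn>
        if found is None:
            return 404, json.dumps({"error": "datasource not found"})
        return 200, json.dumps(found)

    if method == "POST":                      # POST /api/v1/datasources
        payload = json.loads(body)            # assumed keys: name, type, connectionUrl
        urn = f"urn:{payload['type']}:{payload['name']}"
        if urn in STORE:
            return 409, json.dumps({"error": "datasource already exists"})
        STORE[urn] = payload
        return 201, json.dumps(payload)

    return 405, json.dumps({"error": "method not allowed"})


status, _ = handle("POST", "/api/v1/datasources", json.dumps(
    {"name": "staging-dw", "type": "redshift",
     "connectionUrl": "redshift://dw.example.com:5439/analytics"}))
print(status)  # 201
```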
URN format
urn:<type>:<name>
Ex:
urn:redshift:staging-dw
where type="redshift" and name="staging-dw"
Valid Types
Valid types will be enforced by an allowlist implemented at the application layer. A possible first set of supported types is:
redshift, mysql, postgresql, snowflake
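An application-layer allowlist check could look like the sketch below. `SUPPORTED_TYPES` and `validate_type` are hypothetical names. With this initial set, a type such as "s3" would be rejected until it is added to the list.

```python
SUPPORTED_TYPES = frozenset({"redshift", "mysql", "postgresql", "snowflake"})


def validate_type(type_: str) -> str:
    """Reject datasource types outside the application-level allowlist."""
    if type_ not in SUPPORTED_TYPES:
        allowed = ", ".join(sorted(SUPPORTED_TYPES))
        raise ValueError(f"unsupported datasource type {type_!r}; expected one of: {allowed}")
    return type_
```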
Field Details
“Name”: string: A human-generated name for the datasource. It must be unique in the table.
“connectionUrl”: string: A connection string in the format “protocol://host:port/database”.
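A connectionUrl in this format can be split with the standard library's `urllib.parse.urlsplit`. This is a sketch: `ConnectionInfo` and `parse_connection_url` are hypothetical names, and it assumes the port is always required. Note that `hostname` is returned lower-cased, and `SplitResult.port` itself raises ValueError for a non-numeric or out-of-range port.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class ConnectionInfo(NamedTuple):
    protocol: str
    host: str
    port: int
    database: str


def parse_connection_url(url: str) -> ConnectionInfo:
    """Parse a connectionUrl of the form protocol://host:port/database."""
    parts = urlsplit(url)
    database = parts.path.lstrip("/")
    # parts.port is None when no port is given; treat that as invalid here.
    if not (parts.scheme and parts.hostname and parts.port and database):
        raise ValueError(f"expected protocol://host:port/database, got {url!r}")
    return ConnectionInfo(parts.scheme, parts.hostname, parts.port, database)


print(parse_connection_url("redshift://dw.example.com:5439/analytics"))
```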
Constraints
The combination <Name, Type> must be unique.
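At the storage layer, the <Name, Type> constraint can be pushed into the database as a composite UNIQUE constraint. This sketch uses SQLite purely for illustration; the table and column names and the `insert_datasource` helper are assumptions. If names must be unique on their own, as Field Details states, a UNIQUE (name) constraint would be the stricter alternative.

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE datasources (
            uuid           TEXT PRIMARY KEY,
            name           TEXT NOT NULL,
            type           TEXT NOT NULL,
            connection_url TEXT NOT NULL,
            UNIQUE (name, type)  -- the <Name, Type> constraint
        )
        """
    )


def insert_datasource(conn: sqlite3.Connection, name: str, type_: str, connection_url: str) -> bool:
    """Insert a datasource; return False if <name, type> already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO datasources (uuid, name, type, connection_url) VALUES (?, ?, ?, ?)",
                (str(uuid.uuid4()), name, type_, connection_url),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
create_schema(conn)
print(insert_datasource(conn, "staging-dw", "redshift", "redshift://dw.example.com:5439/analytics"))  # True
print(insert_datasource(conn, "staging-dw", "redshift", "redshift://dw.example.com:5439/other"))      # False
```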
Case-Sensitivity
Datasource URNs must be specified in lower-case.
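Because URNs must be lower-case, a request carrying a mixed-case URN can be rejected outright rather than silently normalized. This is a minimal sketch; `require_lowercase_urn` is a hypothetical helper, and rejecting instead of lower-casing is a design assumption.

```python
def require_lowercase_urn(urn: str) -> str:
    """Reject datasource URNs that contain upper-case characters."""
    if urn != urn.lower():
        raise ValueError(
            f"datasource URNs must be lower-case: {urn!r} (did you mean {urn.lower()!r}?)"
        )
    return urn
```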
Examples
[POST] Request payload for creation:
[POST, GET] Response payload: