GoogleCloudPlatform / datacatalog-connectors-rdbms

Sample code with integration between Data Catalog and RDBMS data sources.
Apache License 2.0
72 stars 50 forks source link

[FEATURE] Override resource URL #49

Closed dwilliams782 closed 3 years ago

dwilliams782 commented 4 years ago

What would you like to be added:

An optional way to set the resource URL rather than using the connection string.

Why is this needed:

I have used the connector in two ways so far:

  1. Within a k8s cluster, connecting via pgbouncer
  2. From a local docker container, using port forwarding to a cloud sql proxy.

With either, the resource URL on the created table resource in Data Catalog is set to the connection string, which is either the pgbouncer service or localhost.

image

This isn't reflective of where the data is stored. It would be useful to be able to specify that the resource URL is (for example) project-name/db-instance.

mesmacosta commented 4 years ago

Hi @dwilliams782,

Thanks for opening this, is it possible to build this from the existing parameters we already receive, or for your use case, it would be better to have a new parameter like entry_resource_url ?

dwilliams782 commented 4 years ago

I think this will need to be a new parameter.

One thing I had considered would be to add entry_resource_url_prefix which I would pass as <projectID>/<db-instance> and then it could have the database and table appended?

So for each table resource, the resource URL would be project/database-instance/database/table

dwilliams782 commented 4 years ago

If it used just database/table as it is now (which is information it has) it would still be a more useful value in resource URL IMO.

mesmacosta commented 4 years ago

I think entry_resource_url_prefix looks good, since you can ingest from different sources databases. And as you pointed out sometimes you don't have control over the hostname, which isn't reflective of where the data is stored.

So this new parameter entry_resource_url_prefix would override the hostname when informed:

[entry_resource_url_prefix]/database/table

or when empty
[rbms_host]/database/table

WDYT?

mesmacosta commented 4 years ago

I will work on this on the new RDBMs version.

mesmacosta commented 4 years ago

@ricardolsmendes any thoughts?

ricardolsmendes commented 4 years ago

@ricardolsmendes any thoughts?

Hi @mesmacosta, I like the proposed solution. It's indeed important to keep the new parameter optional as you suggested. It avoids breaking changes and adding extra complexity to the CLI, especially for new users. 👍🏼

mesmacosta commented 3 years ago

Reopened. I will close this after the connector is published to PyPI with the new version.

mesmacosta commented 3 years ago

Closing this issue, new version was published to PyPI.