awslabs / data-solutions-framework-on-aws

An open-source framework that simplifies implementation of data solutions.
https://awslabs.github.io/data-solutions-framework-on-aws/
Apache License 2.0

Add option to DataLakeCatalog/DataCatalogDatabase so that crawlers can use manually defined tables #490

Open karnik opened 8 months ago

karnik commented 8 months ago

DataLakeCatalog/DataCatalogDatabase should offer an option to pass manually defined tables to the crawler as parameters. Several use cases require a manually created catalog table.

See: https://docs.aws.amazon.com/glue/latest/dg/tables-described.html#update-manual-tables
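For illustration, here is a rough sketch of what such a crawler looks like at the Glue API level, outside of the framework. It only builds the arguments for Glue's `CreateCrawler` call, using catalog targets instead of S3 targets. The helper name `catalog_target_crawler_params` and its parameters are hypothetical and not part of DataLakeCatalog/DataCatalogDatabase; as far as I understand the Glue documentation, crawlers that target existing catalog tables need the delete behavior set to `LOG`.

```python
from typing import Any


def catalog_target_crawler_params(
    name: str,
    role_arn: str,
    database: str,
    tables: list[str],
) -> dict[str, Any]:
    """Build keyword arguments for Glue's CreateCrawler API call.

    Hypothetical helper for illustration only; it is not part of the framework.
    """
    if not tables:
        raise ValueError("at least one existing catalog table is required")
    return {
        "Name": name,
        "Role": role_arn,
        "Targets": {
            # Point the crawler at manually created tables, not at S3 paths.
            "CatalogTargets": [
                {"DatabaseName": database, "Tables": list(tables)}
            ]
        },
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # Assumption based on the Glue docs: catalog targets require LOG.
            "DeleteBehavior": "LOG",
        },
    }
```

The resulting dictionary could then be passed to `boto3.client("glue").create_crawler(**params)`; how the framework constructs would expose this is still open in this issue.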

vgkowski commented 8 months ago

Good point! My suggestion is we add a crawler mode parameter with 3 different values:

karnik commented 8 months ago

Sounds great. How do you want to pass the table definition?

vgkowski commented 7 months ago

Do you want to be able to filter tables by S3 prefix?