Closed ReinierKop closed 4 years ago
Hi Reinier,
Thanks for this, you are definitely on the right track. In my view at the moment there are two options for this;
There specifically is an option to put a query in the 'source' (as you point out) to avoid having to write a full domain specific language to suit all transformation use-cases. To avoid rebuilding SQL (again as you mention) the OneOf 'query' or 'item' is possible both at DataObject as well as DataItem level. This makes it possible to put a query in the source entirely or parts as mappings to items and build a transformation this way.
How this is used in code / templating engines provides another layer of options of course, as this can define how things are used in practice also - and your example of using the query in the name would work too this way.
The Related Data Object is an array of objects you can use in the templating engines or upstream code. I use this for lookups and joins a lot - for example here: https://github.com/RoelantVos/Virtual-Data-Warehouse/blob/master/LoadPatterns/loadPatternStagingView.Handlebars.
This would look like:
], "sourceDataObject": { "name": "source.A" }, "targetDataObject": { "name": "landing.A", "dataObjectConnection": { "dataConnectionString": "PersistentStagingDatabase" } }, "relatedDataObject": [ { "name": "psa.A", "dataObjectClassification": [ { "classification": "Lookup", "notes": "Lookup table related to the source-to-target mapping" } ] } ],
I am collecting feedback from various projects into an updated version, which I'll work on next week when I have a week off. I'm creating issues for this now. I will add some more complex examples as well.
Hope this makes sense, and thank you for your feedback!
Kind regards - Roelant
Thanks a lot for the quick response! This definitely helped me create a better understanding of the possibilities within the configuration. I'll keep an eye on the repo and meanwhile start fiddling around with this
Hi Reinier, FYI just in case I made some modifications to the schema and also added an example for the above.
Hi Roelant, thanks a lot! I'll check it out :) If I've any more remarks or questions I'll let you know 👍
Great initiative to have this! I'm looking into metadata specifications to use to power our ETL generation and source to target mappings and I'm happy to follow conventions such as the one you've created.
Most examples in this repo are understandably relatively simple and only speak of field-to-field mappings (e.g.
sourceTable.sourceColumn
mapping totargetTable.targetcolumn
). This will often be enough, but lots of our ETL code (and I'm sure we're not unique in this regard) is much more complex.Some specific examples of functionality I am talking about:
group by
,distinct
, aggregations/subqueries/subselects/joins/ctes.Are we meant to just put the complex query in the
sourceDataObject
as the schema implies here or do other methods of specifying this exist (or are perhaps planned)?Just one example adapted from our ETL code, which I tried to map to the schema below:
Our ETL with a join:
Metadata specification how I understand it right now:
I realize it probably wouldn't be easy nor desirable to extend the schema to support all SQL, but I'm just wondering whether I'm on the right track or not.