doublecloud / transfer

Open Source Cloud Native Ingestion engine
https://double.cloud/services/doublecloud-transfer/
Apache License 2.0

Describe transformer command #105

Closed laskoviymishka closed 1 week ago

laskoviymishka commented 1 week ago

Add a new CLI command: trcli describe transformer --type sql

This command describes how to use a specific transformer and lists all available transformers:

./binaries/main describe transformer --type sql

  ## Clickhouse transformer

  Based on Clickhouse Local
  https://clickhouse.com/docs/en/operations/utilities/clickhouse-local/

  This tool accepts the ClickHouse SQL dialect and allows SQL-like in-memory
  data transformation.

  Image: diagram → https://jing.yandex-team.ru/files/tserakhau/ch_local_trans.svg

  The source table inside CH Local is named `table`; its structure mimics the
  source table structure.

  Since each source change item (row) carries extra metadata, we must match source
  and target data together. Therefore each row must have a key defined. These
  keys must be unique within every batch (for this we call the collapse
  function). If we can't match source keys with transformed data, we mark such
  rows as errored.

   Example Config

    tables:
        include_tables:
        - '"public"."included_data"'
        exclude_tables:
        - '"public"."excluded_data"'
    query: |
        select
                *
        from table
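
The key-matching rule described in the help text can be illustrated with a small sketch. This is not the transformer's actual code (the project itself is not written in Python); it is a minimal model under stated assumptions: `collapse` is assumed to keep the last row seen for each key, and a source row is assumed to be marked as errored when no transformed row carries the same key. The names `row_key`, `collapse`, and `match_rows` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def row_key(row: Row, key_cols: List[str]) -> Key:
    """Build a hashable key from the row's key columns."""
    return tuple(row[c] for c in key_cols)


def collapse(batch: List[Row], key_cols: List[str]) -> Dict[Key, Row]:
    """Make keys unique within a batch (assumption: the last row per key wins)."""
    latest: Dict[Key, Row] = {}
    for row in batch:
        latest[row_key(row, key_cols)] = row
    return latest


def match_rows(
    source: List[Row], transformed: List[Row], key_cols: List[str]
) -> Tuple[List[Tuple[Row, Row]], List[Row]]:
    """Pair source rows with transformed rows by key.

    Source rows whose key is missing from the transformed output are
    returned separately as errored rows (assumed behaviour).
    """
    src = collapse(source, key_cols)
    out = collapse(transformed, key_cols)
    matched = [(row, out[k]) for k, row in src.items() if k in out]
    errored = [row for k, row in src.items() if k not in out]
    return matched, errored


# Hypothetical batch: two changes for id=1, one for id=2.
source = [{"id": 1, "v": "a"}, {"id": 1, "v": "b"}, {"id": 2, "v": "c"}]
# Hypothetical query result that dropped id=2.
transformed = [{"id": 1, "v": "B"}]

matched, errored = match_rows(source, transformed, ["id"])
print(matched)
print(errored)
```

In this model, a query that drops or rewrites the key columns leaves source rows unmatched, which is why the help text requires every row to have a key.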

To make configs more user-friendly, query becomes a YAML field instead of a JSON-encoded string field.
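
To show why this helps, the sketch below encodes the example query as a JSON string field using Python's standard `json` module. The exact shape of the previous config is not shown in this PR, so the `{"query": ...}` wrapper is an assumption made only for illustration.

```python
import json

# The query from the example config, as a multi-line string.
query = "select\n        *\nfrom table\n"


def as_json_field(q: str) -> str:
    """Encode the query as a string field in a JSON document (assumed old format)."""
    return json.dumps({"query": q})


encoded = as_json_field(query)
# The whole query ends up on one line, with each line break written as \n.
print(encoded)
```

With a YAML block scalar (`query: |`), the same query keeps its line breaks and indentation as written, so it can be pasted from a SQL editor without escaping.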

laskoviymishka commented 1 week ago

:shipit:

robot-magpie[bot] commented 1 week ago

@laskoviymishka has imported your pull request. If you are a Yandex employee, you can view this diff.

robot-magpie[bot] commented 1 week ago

✅ This pull request is being closed because it has been successfully merged into our internal monorepository. Your changes will be pushed to this repository soon. Thank you for your contribution!