dremio-professional-services / dremio-cloner

27 stars 20 forks source link

Improved PR reviews by splitting up virtual datasets in a metadata file and a SQL file #41

Closed jeff-99 closed 10 months ago

jeff-99 commented 12 months ago

We use dremio-cloner to add all our VDS definitions under source control. So we can develop on a dev/test environment. Do peer reviews and finally sync the repository to the production environment.

In this workflow it's really hard to do PR reviews on VDS changes. As the most important change is the 'SQL' query and in the repository that query is hidden in a JSON file as a single line string. It's near impossible to review those changes.

You would need to go to the production environment web-ui. Copy the SQL code into a file. Copy the version from test. Do a diff to see the changes. If it looks ok you can then execute the new query to validate the results.

By splitting up the SQL from the JSON file you get the opportunity to have meaning full diffs in PR's and with the right kind of tools you could even execute the VDS query directly from your code editor to do the testing.

mxmarg commented 10 months ago

Hi Jeff, thank you for contributing this PR. I think you make very good arguments about separating SQL and metadata as a Dremio Cloner feature. My main concern revolves around the breaking impact this would have on existing Cloner CI/CD pipelines. However, if you can lock your proposed behaviour behind an optional config flag separate_sql_and_metadata_files in the config read JSON, I would be willing to merge it. E.g.:

  {"target": [
    {"directory":"<TARGET_DIRECTORY_PATH>"},
    {"separate_sql_and_metadata_files": "False"},
    {"overwrite": "False"}]
  }
jeff-99 commented 10 months ago

@mxmarg I've 'hidden' the feature behind the feature flag and updated the example configs and README.