finos / datahelix

The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation
https://finos.github.io/datahelix/
Apache License 2.0

Incorporate the visualizer into all parts of the generation pipeline #1322

Open afroggattsl opened 5 years ago

afroggattsl commented 5 years ago

Feature Request

Incorporate the visualizer into all parts of the generation pipeline

Description of Problem:

At the moment we can visualize the initial decision tree, although we still need to check that this works. We want to be able to visualize the tree and other relevant data structures throughout the whole process.

The first task is to check that the initial visualization of the tree still works, and to flesh out this task to detail all the parts we want to visualize and how often we do so.

Analysis Tasks

(The plan is to do each of them under this same ticket, but probably as separate PRs)

Analysis Results

what visualisation currently does

where to add visualization in the pipeline

  1. Initially, add it to all stages (i.e. at the start, after partitioning, after optimising) up to the walker
  2. Later, add it to the walker so that it can (perhaps only if another flag is set) print the tree after every re-prune it does within the walker
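The stage-by-stage snapshots listed above could be sketched roughly as follows. This is only an illustration, not DataHelix code: `StageVisualiser`, `Pipeline` and the use of a plain `String` as the tree are hypothetical stand-ins, and passing `null` as the output is just one assumed way of modelling the optional flag.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.UnaryOperator;

// Hypothetical helper: appends one titled snapshot per pipeline stage.
final class StageVisualiser {
    private final Appendable out; // null means visualization is switched off

    StageVisualiser(Appendable out) {
        this.out = out;
    }

    // Writes the snapshot (if enabled) and returns the tree unchanged,
    // so calls can be chained inline between stages.
    String snapshot(String stageName, String tree) {
        if (out != null) {
            try {
                out.append("// ").append(stageName).append('\n')
                   .append(tree).append('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return tree;
    }
}

// Hypothetical pipeline: the tree is a String here purely for illustration.
final class Pipeline {
    static String run(String initialTree,
                      UnaryOperator<String> partition,
                      UnaryOperator<String> optimise,
                      StageVisualiser visualiser) {
        String tree = visualiser.snapshot("initial", initialTree);
        tree = visualiser.snapshot("after partitioning", partition.apply(tree));
        tree = visualiser.snapshot("after optimising", optimise.apply(tree));
        return tree;
    }
}
```

Because the output is an `Appendable`, the same sketch could write to a `StringBuilder` in tests or to a `java.io.FileWriter` opened in append mode when all snapshots should end up in one file.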

how to add these visualizations to the chosen parts of the pipeline in a clean way

  1. We want to get rid of visualizeExecute and instead add a flag to generate mode, so that visualization becomes an optional feature.
  2. Up to the walker stage the structure remains a decision tree, so we can hopefully re-use the existing DecisionTreeVisualisationWriter, although it may need to be able to append to a file.
  3. When we get to the walker stage, we may need to decorate the pruner in some way when visualization of the walking stage is on. This decorator could print the tree after every prune.
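A minimal sketch of the decorator idea from point 3, under stated assumptions: `Tree`, `Pruner`, `SimplePruner` and `VisualisingPruner` are hypothetical names invented for this example and do not reflect the actual DataHelix interfaces. The visualiser is passed in as a callback, so it could be backed by an existing writer or by anything else.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for the decision tree.
final class Tree {
    final String description;

    Tree(String description) {
        this.description = description;
    }
}

// Hypothetical pruner interface; the real signature will differ.
interface Pruner {
    Tree prune(Tree tree, String fieldToFix);
}

// Trivial pruner used only to make the sketch runnable.
final class SimplePruner implements Pruner {
    @Override
    public Tree prune(Tree tree, String fieldToFix) {
        return new Tree(tree.description + " -" + fieldToFix);
    }
}

// Decorator: delegates the pruning, then hands the result to the visualiser.
final class VisualisingPruner implements Pruner {
    private final Pruner delegate;
    private final Consumer<Tree> visualiser;

    VisualisingPruner(Pruner delegate, Consumer<Tree> visualiser) {
        this.delegate = delegate;
        this.visualiser = visualiser;
    }

    @Override
    public Tree prune(Tree tree, String fieldToFix) {
        Tree pruned = delegate.prune(tree, fieldToFix);
        visualiser.accept(pruned); // called once after every prune
        return pruned;
    }
}
```

With this shape the walker would not need to know whether visualization is on: the plain pruner is wrapped only when the relevant flag is set, for example `new VisualisingPruner(new SimplePruner(), t -> System.out.println(t.description))`.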

tasks created

see below

Implementation Task List (for which we will probably create a few PRs)

afroggattsl commented 5 years ago

Done the first 2 tasks on the list in PR #1418. Will do the rest in another PR.