The DataHelix generator allows you to quickly create data, based on a JSON profile that defines fields and the relationships between them, for the purpose of testing and validation.
Incorporate the visualizer into all parts of the generation pipeline
Description of Problem:
At the moment we can visualize the initial decision tree, but we still need to confirm that this works. We want to be able to visualize the tree and other relevant data structures throughout the whole process.
The first task is to check that the initial visualization of the tree still works, and to flesh out this task to detail all the parts we want to visualize and how often we do so.
Analysis Tasks
(Plan is to do each of them under this same ticket but probably as separate PRs)
[x] Check that visualization of the tree still works and add the results of what it does to this issue
[x] Decide where to add visualization in the pipeline and add this information to this issue
[x] Design how we can add these visualizations to the chosen parts of the pipeline in a clean way and add this information to this issue
[x] Expand the task list to include tasks for implementing the framework and for implementing each individual part, so that visualization is added to all relevant parts of the pipeline
Analysis Results
what visualisation currently does
It runs its own VisualizeExecute command that is completely separate from what GeneratorExecute does. It reads in the profile, validates the tree and then calls the DecisionTreeVisualisationWriter class to write the output to a file in DOT format.
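As a rough illustration of what writing a tree in DOT format involves, here is a minimal sketch. `SketchNode` and `DotSketch` are hypothetical stand-ins invented for this example; they are not the DataHelix tree types or the actual DecisionTreeVisualisationWriter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a decision tree node (not the DataHelix type).
final class SketchNode {
    final String label;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String label) {
        this.label = label;
    }

    // Adds a child and returns this node so calls can be chained.
    SketchNode add(SketchNode child) {
        children.add(child);
        return this;
    }
}

// Hypothetical writer: renders a tree as a single Graphviz digraph.
final class DotSketch {
    static String toDot(String graphName, SketchNode root) {
        StringBuilder out = new StringBuilder();
        out.append("digraph ").append(graphName).append(" {\n");
        emit(root, new int[] {0}, out);
        out.append("}\n");
        return out.toString();
    }

    // Declares the node, then recurses into each child and adds an edge to it.
    // Returns the numeric id assigned to the node.
    private static int emit(SketchNode node, int[] nextId, StringBuilder out) {
        int id = nextId[0]++;
        String escaped = node.label.replace("\\", "\\\\").replace("\"", "\\\"");
        out.append("  n").append(id)
           .append(" [label=\"").append(escaped).append("\"];\n");
        for (SketchNode child : node.children) {
            int childId = emit(child, nextId, out);
            out.append("  n").append(id).append(" -> n").append(childId).append(";\n");
        }
        return id;
    }
}
```

Calling `DotSketch.toDot("tree", root)` returns the graph as a string, which the caller can then write to whichever file it chooses.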
where to add visualization in the pipeline
Initially add it to all stages up to the walker (i.e. at the start, after partitioning and after optimising)
Later add it to the walker so it can (maybe only if another flag is set) print the tree after every re-prune it does within the walker
how can add these visualizations to parts of pipeline chosen in clean way
We want to get rid of VisualizeExecute and add visualization to generate mode as an optional flag.
Up to the walker stage the structure remains a decision tree, so we can hopefully re-use the existing DecisionTreeVisualisationWriter, although we may need to make it able to append to a file
When we get to the walker stage we may need to decorate the pruner in some way when visualization of the walking stage is on. This decorator could print the tree after every prune
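The pruner decorator idea could look roughly like the sketch below. `SketchPruner` and `VisualisingPruner` are hypothetical names chosen for illustration, and the tree type is left generic because the real pruner interface is not shown in this issue.

```java
import java.util.function.Consumer;

// Hypothetical pruner abstraction; the real DataHelix pruner may differ.
interface SketchPruner<T> {
    T prune(T tree);
}

// Decorator: delegates to the wrapped pruner, then hands the pruned tree
// to a visualiser callback before returning it unchanged.
final class VisualisingPruner<T> implements SketchPruner<T> {
    private final SketchPruner<T> inner;
    private final Consumer<T> visualiser;

    VisualisingPruner(SketchPruner<T> inner, Consumer<T> visualiser) {
        this.inner = inner;
        this.visualiser = visualiser;
    }

    @Override
    public T prune(T tree) {
        T pruned = inner.prune(tree);
        visualiser.accept(pruned);
        return pruned;
    }
}
```

With this shape the walker would only ever see a `SketchPruner`, so the decorator would be wired in only when the walker visualization flag is set, and the walker itself would not need to change.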
tasks created
see below
Implementation Task List (which will probably create a few PRs for)
[x] Convert the existing implementation so that it is done within GenerateExecute rather than in a separate execute command. We need to work out the cleanest way to inject the writer (e.g. do we pass in a dummy one when visualization is off and always call it?). Make sure it prints both the initial tree and the up-front pruned tree, which means writing to multiple files (as Graphviz does not support multiple graphs in the same dot file)
[x] Delete the "visualise" command now that it is an option for the normal generate command. At the same time, update the documentation on how to use the visualiser the new way
[ ] Improve it so that it calls the writer every time the tree changes before the walker is called. This includes the multiple trees created after partitioning, and includes post-pruning and post-optimisation. Update any documentation on the visualiser
[ ] Add visualization to the walker. We may need to decorate the pruner in some way when visualization of the walking stage is on. This decorator could print the tree after every prune. Update any documentation on the visualiser
[ ] Add a good profile to the examples to show off the various stages of the visualizer, and maybe try to cover most of the points mentioned in #1216 in that example. Possibly update any documentation on the visualiser so that it references this example
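One possible answer to the writer-injection question in the first task is a "dummy" writer that is always called, paired with a real writer that produces one file per stage. The sketch below assumes a hypothetical `StageVisualiser` interface and an invented file naming scheme; none of these names come from the DataHelix code base.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical interface the pipeline would call after each stage.
interface StageVisualiser {
    void write(String stageName, String dot);
}

// Dummy implementation injected when visualization is switched off,
// so the pipeline can call the visualiser unconditionally.
final class NoOpStageVisualiser implements StageVisualiser {
    @Override
    public void write(String stageName, String dot) {
        // Intentionally does nothing.
    }
}

// Writes each graph to its own numbered file, e.g. "01-initial.dot".
// Assumes stageName contains only characters that are safe in file names.
final class FilePerStageVisualiser implements StageVisualiser {
    private final Path directory;
    private int counter = 0;

    FilePerStageVisualiser(Path directory) {
        this.directory = directory;
    }

    @Override
    public void write(String stageName, String dot) {
        String fileName = String.format("%02d-%s.dot", ++counter, stageName);
        try {
            Files.createDirectories(directory);
            Files.write(directory.resolve(fileName), dot.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The numeric prefix keeps the files in pipeline order when listed, which may help when comparing the tree before and after partitioning, pruning and optimisation.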