For different levels of user, we should have a variety of options for the configuration files...
For advanced users, the Config files themselves already contain comments explaining how to write them.
These can always be expanded.
We discussed having explanation ReadMe.md files with more detail to guide Config writing.
For everyone else, we have a lot of options (requiring various amounts of work):
1) "Kitchen Sink" option based on Data (for Tform Config):
Load up the user's data, inspect the columns, and generate ALL transformation commands that are valid for that data set. (For instance, you cannot apply Log() to a column containing negative or zero values...)
Then the user can manually delete any that they don't want to try.
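A minimal sketch of the "Kitchen Sink" generator, assuming a hypothetical transform registry (the names `log`, `sqrt`, `square`, `recip` and the `kitchen_sink` function are illustrative, not the project's actual Tform Config syntax):

```python
import numpy as np
import pandas as pd

# Hypothetical registry: name -> (transform function, applicability check).
TRANSFORMS = {
    "log":    (np.log,            lambda col: (col > 0).all()),   # log needs strictly positive data
    "sqrt":   (np.sqrt,           lambda col: (col >= 0).all()),  # sqrt needs non-negative data
    "square": (np.square,         lambda col: True),              # always applicable
    "recip":  (lambda x: 1.0 / x, lambda col: (col != 0).all()),  # reciprocal needs no zeros
}

def kitchen_sink(df: pd.DataFrame) -> list[tuple[str, str]]:
    """Generate every (column, transform) pair that is valid for this data set."""
    commands = []
    for name in df.select_dtypes(include="number").columns:
        col = df[name]
        for tname, (_, is_applicable) in TRANSFORMS.items():
            if is_applicable(col):
                commands.append((name, tname))
    return commands
```

The output would then be written into the Config file as the full candidate list, ready for the user to prune by hand.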
2) "Runtime Limited" version of Kitchen Sink could be implemented: the user specifies a run time (4 hours, for example), and the loop continues until that time has elapsed, completing however many transformations fit in that window.
2b) A similar, more manual version could build the same kitchen-sink list but simply cut it off after a set number of iterations.
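Both limits (2 and 2b) fit in one driver loop. A sketch, where `fit_one` stands in for whatever fits and scores a single model (all names here are assumptions, not existing project code):

```python
import time

def run_limited(candidates, fit_one, time_budget_s=None, max_iters=None):
    """Try candidate transformations until the time budget or iteration cap is hit.

    fit_one(candidate) is a placeholder for one fit/score cycle.
    Either limit may be None, meaning "no limit of that kind".
    """
    deadline = None if time_budget_s is None else time.monotonic() + time_budget_s
    results = []
    for i, cand in enumerate(candidates):
        if deadline is not None and time.monotonic() >= deadline:
            break  # option 2: run time elapsed
        if max_iters is not None and i >= max_iters:
            break  # option 2b: manual iteration cutoff
        results.append((cand, fit_one(cand)))
    return results
```

Using `time.monotonic()` rather than wall-clock time keeps the budget check safe against system clock changes during a long run.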
3) "Priority by Inspection": We could use a simple (existing) linear-correlation algorithm (NumPy has some, right?) to do an early correlation between each column and Linear, Log, Exp, Power-series, etc. shapes, and then put THOSE transformations first in the queue. This is fairly a-priori, so maybe unfair, but it might be the quickest path to good results!
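A rough sketch of the inspection screen, using NumPy's `np.corrcoef` for the Pearson correlation (the `rank_transforms` function and the candidate names are illustrative assumptions):

```python
import numpy as np

def rank_transforms(x: np.ndarray, y: np.ndarray) -> list[str]:
    """Rank candidate transforms of x by |linear correlation| with target y.

    The a-priori idea: if log(x) correlates with y more strongly than x
    itself does, try the log transform first.
    """
    candidates = {"linear": x}
    if (x > 0).all():
        candidates["log"] = np.log(x)     # only valid for positive data
        candidates["power"] = x ** 2
    scores = {
        name: abs(np.corrcoef(tx, y)[0, 1])  # Pearson r via NumPy
        for name, tx in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

The returned ordering would seed the try-first list in the Config, so the loop spends its early iterations on the most promising transformation/column pairs.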
4) "Priority by Meta-Learning": This might be the best end goal... Start out with random transformations, then iterate the loop until the program LEARNS which transformation/column pairs are associated with the best models. Eventually, it would find the best ones to use on its own!
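One simple way to realize "random at first, learned later" is an epsilon-greedy bandit. This is only a sketch of the idea, not a committed design; `score_fn` stands in for one fit/evaluate cycle and every name here is an assumption:

```python
import random
from collections import defaultdict

def meta_learn(pairs, score_fn, n_rounds=200, eps=0.2, seed=0):
    """Epsilon-greedy sketch: explore random (column, transform) pairs,
    but increasingly exploit the pairs with the best average model score."""
    rng = random.Random(seed)
    totals = defaultdict(float)  # summed scores per pair
    counts = defaultdict(int)    # times each pair was tried
    for _ in range(n_rounds):
        if rng.random() < eps or not counts:
            pair = rng.choice(pairs)  # explore: random transformation
        else:
            # exploit: pair with best average score so far
            pair = max(counts, key=lambda p: totals[p] / counts[p])
        totals[pair] += score_fn(pair)
        counts[pair] += 1
    return max(counts, key=lambda p: totals[p] / counts[p])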