Closed PeterPann23 closed 5 years ago
It's not generally meant to be machine parsable. What are you looking to parse from it?
The most important lines begin with a pipe "|", since they form an ASCII-art table with a border. That table gives you one line per iteration in the sweep, reporting the trainer and its metrics.
(base) MacOs:~ justinormont$ grep "|" /private/tmp/blah/Demo/logs/debug_log.txt
|     Trainer                           MicroAccuracy  MacroAccuracy  Duration  #Iteration |
|1    AveragedPerceptronOva                    0.9672         0.9695      56.5           0 |
|2    SdcaMaximumEntropyMulti                  0.9590         0.9632      56.6           0 |
|3    LightGbmMulti                            0.9754         0.9729     126.7           0 |
|4    SymbolicSgdLogisticRegressionOva         0.9344         0.9341      57.2           0 |
|5    FastTreeOva                              0.9918         0.9900      49.3           0 |
|6    LinearSvmOva                             0.9508         0.9504      53.2           0 |
|7    LbfgsLogisticRegressionOva               0.9754         0.9700      54.4           0 |
|8    SgdCalibratedOva                         0.9672         0.9629      53.9           0 |
|9    FastForestOva                            0.9836         0.9838      41.9           0 |
|10   LbfgsMaximumEntropyMulti                 0.9754         0.9700      54.8           0 |
|11   FastTreeOva                              1.0000         1.0000      77.7           0 |
|                                         Summary                                          |
|ML Task: multiclass-classification                                                        |
|Dataset: Demo.TRAIN.tsv                                                                   |
|Label : Label                                                                             |
|Total experiment time : 904.33 Secs                                                       |
|Total number of models explored: 11                                                       |
|                                  Top 5 models explored                                   |
|     Trainer                           MicroAccuracy  MacroAccuracy  Duration  #Iteration |
|1    FastTreeOva                              1.0000         1.0000      77.7          11 |
|2    FastTreeOva                              0.9918         0.9900      49.3           5 |
|3    FastForestOva                            0.9836         0.9838      41.9           9 |
|4    LightGbmMulti                            0.9754         0.9729     126.7           3 |
|5    LbfgsLogisticRegressionOva               0.9754         0.9700      54.4           7 |
(base) MacOs:~ justinormont$
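As a minimal sketch of loading those "|" rows into a structured form from C#: split each bordered row on runs of whitespace and keep only the rows whose first field is a numeric rank, which skips the header, summary, and border rows. The record shape and column order here are inferred from this one log, not from any documented contract, so a format change in a later release would break it.

```csharp
using System;
using System.Globalization;

// One parsed iteration row from the debug_log.txt sweep table.
public sealed record SweepRow(int Rank, string Trainer, double MicroAccuracy,
                              double MacroAccuracy, double DurationSecs, int Iteration);

public static class SweepLogParser
{
    // Returns null for any line that is not a per-iteration data row
    // (headers, summary rows, non-table lines).
    public static SweepRow ParseLine(string line)
    {
        line = line.Trim();
        if (!line.StartsWith("|") || !line.EndsWith("|")) return null;

        // Drop the border pipes, then split on whitespace runs.
        var parts = line.Trim('|').Trim()
                        .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 6 || !int.TryParse(parts[0], out var rank)) return null;

        return new SweepRow(
            rank,
            parts[1],
            double.Parse(parts[2], CultureInfo.InvariantCulture),
            double.Parse(parts[3], CultureInfo.InvariantCulture),
            double.Parse(parts[4], CultureInfo.InvariantCulture),
            int.Parse(parts[5]));
    }
}
```

Feeding every line of the file through `ParseLine` and discarding the nulls yields one `SweepRow` per iteration.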
There are also MAML-ish lines printed, which give you a shorthand form of the pipeline created. These lines include the hyperparameters for the models.
If you're using the AutoML API (not the CLI or Model Builder), I would attach to the progressHandler to log these, rather than scraping the log.
OK, so I could get the models it tried, with the parameters it used, like that? I noticed that by taking a "winner" and altering it, I can try parameters it did not try, such as the unbalanced-data-set option, and improve it further. Is there a sample one can look at for capturing the logs?
Yes, the progressHandler returns the model and its metrics after each iteration.
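For reference, a hedged sketch of what attaching a progressHandler via the AutoML API can look like. The label column name "Label" and the 60-second budget are placeholder values, and the exact `Execute` overloads available differ between AutoML package versions, so treat this as an outline rather than a drop-in:

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// The handler is a plain IProgress<RunDetail<TMetrics>>; AutoML calls
// Report once per completed iteration with the trainer name, the trained
// model, and its validation metrics.
class IterationLogger : IProgress<RunDetail<MulticlassClassificationMetrics>>
{
    public void Report(RunDetail<MulticlassClassificationMetrics> run)
    {
        Console.WriteLine(
            $"{run.TrainerName}: MicroAccuracy={run.ValidationMetrics?.MicroAccuracy:F4} " +
            $"({run.RuntimeInSeconds:F1} s)");
    }
}

// Usage sketch (trainData is an IDataView you have already loaded):
// var mlContext = new MLContext();
// var experiment = mlContext.Auto()
//     .CreateMulticlassClassificationExperiment(maxExperimentTimeInSeconds: 60);
// var result = experiment.Execute(trainData, labelColumnName: "Label",
//                                 progressHandler: new IterationLogger());
```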
AutoML should perhaps sweep over LightGBM's unbalanced-sets hyperparameter when the dataset's skew is found to be high in the dataset-statistics step. Feel free to put in a PR if you're up for such a task.
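If one wants to check that skew by hand before toggling the unbalanced-sets option, a quick self-contained ratio check might look like the following. The 1.5 threshold is an arbitrary placeholder for illustration, not anything AutoML uses:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelSkew
{
    // Ratio of the most common label count to the least common label count.
    // 1.0 means perfectly balanced; large values mean heavy skew.
    public static double ImbalanceRatio<T>(IEnumerable<T> labels)
    {
        var counts = labels.GroupBy(l => l).Select(g => g.Count()).ToList();
        return (double)counts.Max() / counts.Min();
    }

    public static bool LooksImbalanced<T>(IEnumerable<T> labels, double threshold = 1.5) =>
        ImbalanceRatio(labels) > threshold;
}
```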
For implementing a progressHandler, there's an example for each task in the samples repo:
How does one get to the hyperparameters used? Is there a way to get the options?
The hyperparameters used in the models are not exposed publicly. They are in the callback, but not public.
You're looking for the Pipeline within the RunDetail that is sent to your progressHandler callback. You can still see the hyperparameters in the debugger, or pull them out via reflection, though that can of course break in the future.
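As a sketch of that reflection route, a generic dumper like the one below can be pointed at the trainer node inside the returned pipeline. Nothing about ML.NET internals is assumed here beyond "it is an object with properties"; since internal layouts are not a contract, any ML.NET update can change what it prints:

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class HyperparameterDump
{
    // Lists public and non-public readable instance properties of any object
    // as "Name = value" lines, one per line.
    public static string Dump(object obj)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        var lines = obj.GetType()
            .GetProperties(flags)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .Select(p => $"{p.Name} = {p.GetValue(obj) ?? "null"}");

        return string.Join(Environment.NewLine, lines);
    }
}
```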
It would be really helpful to get the hyperparameters in order to tune the model manually. How does the AutoML CLI print the hyperparameters to the log? Is there a way to get them so one can improve on that? I know it generates the code for the winner, but I might not always agree with who the winner is.
OK, I needed to implement my ETL. Not sure if you would like to share this after "cleaning it up", but I reverse-engineered the log format and it does what I need it to do.
Like any ETL, it may not survive the next update of the tool generating the output, but it works for now: we can process the dataset and have the ml.net CLI "suggest" some pipelines.
It's not the fastest or most elegant, but it parses my directories in under a second on my PC and gives me a JSON dataset that I can use to inject defaults, like this:
// GetOrDefault is my own helper: it looks up the named hyperparameter in the
// parsed JSON dataset and falls back to the given default when it is missing.
var options = new FastForestBinaryTrainer.Options
{
    LabelColumnName = "Trend",
    DiskTranspose = true,
    NumberOfLeaves = GetOrDefault<FastForestBinaryTrainer.Options>(nameof(FastForestBinaryTrainer.Options.NumberOfLeaves), 90),
    MinimumExampleCountPerLeaf = GetOrDefault<FastForestBinaryTrainer.Options>(nameof(FastForestBinaryTrainer.Options.MinimumExampleCountPerLeaf), 50),
    NumberOfTrees = GetOrDefault<FastForestBinaryTrainer.Options>(nameof(FastForestBinaryTrainer.Options.NumberOfTrees), 100),
    GainConfidenceLevel = GetOrDefault<FastForestBinaryTrainer.Options>(nameof(FastForestBinaryTrainer.Options.GainConfidenceLevel), 0.7)
};
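For anyone reproducing this, here is a simplified, hypothetical variant of such a GetOrDefault helper. It assumes the parser emits a flat name-to-value JSON object per trainer (my assumption about the output shape, not necessarily the format of the attached logparser) and reads it via System.Text.Json:

```csharp
using System;
using System.Text.Json;

public static class HyperDefaults
{
    private static JsonElement _root;

    // Load the JSON dataset the log parser produced, e.g. one flat object
    // of hyperparameter name -> value pairs for the chosen trainer.
    public static void Load(string json) =>
        _root = JsonDocument.Parse(json).RootElement;

    // Return the stored value for `name`, or `fallback` if it is absent.
    public static T GetOrDefault<T>(string name, T fallback)
    {
        if (_root.ValueKind == JsonValueKind.Object &&
            _root.TryGetProperty(name, out var value))
        {
            return JsonSerializer.Deserialize<T>(value.GetRawText());
        }
        return fallback;
    }
}
```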
I could be missing a few entries, but here is the output: data.zip. And here is the code: logparser.zip
Feel free to close the issue
Nice parser. As you say, a stronger API would make this process easier. It would take some thought to design a clean API for exposing the hyperparameter values.
Perhaps we could expose an options object for each trainer, though that runs into a typing issue.
We're also looking to alter many more pipeline parameters per sweep iteration, which would make exposing the full pipeline with its options in a clean manner far more complex.
Closing as requested.
What is the logic behind debug_log.txt? If one wanted to parse it, how would one go about it? It's not documented as far as I can tell; I can find things in it, but I would like to load it into a structured format using C#.
I don't seem to have found a clever way to do this. Can someone point me in the right direction?