EducationalTestingService / skll

SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments.
http://skll.readthedocs.org

Add W&B logging #761

Closed tamarl08 closed 6 months ago

tamarl08 commented 6 months ago

This PR adds W&B logging for all task types.

Changes include:

Examples:

train: currently logs only the configuration, the model file path, and the training set size (in the run summary).

predict: logs the configuration, the train and test set sizes, and the predictions file.

learning curve: logs the configuration, the plots, and some raw data from the learning curve results (in the run summary).

evaluate: logs the configuration, all evaluation metrics, and the confusion matrix as a chart.

cross_validate: logs the evaluation for each fold plus the average across folds.
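As a rough illustration of the cross_validate case above, here is a minimal sketch of how per-fold metrics and their average might be flattened into a single payload for a W&B run summary. This is not the code from this PR: the function names, the `fold_<n>/` and `average/` key layout, and the `skll-demo` project name are all assumptions made for the example. Only `wandb.init`, `run.summary.update`, and `run.finish` are real W&B calls, and `wandb` is imported lazily so the helper can be tried without it installed.

```python
from statistics import mean
from typing import Dict, List


def build_cv_summary(fold_metrics: List[Dict[str, float]]) -> Dict[str, float]:
    """Flatten per-fold metrics into one dict and add the average over folds.

    The key layout ("fold_<n>/<metric>", "average/<metric>") is hypothetical.
    """
    summary: Dict[str, float] = {}
    # One entry per fold and metric, with folds numbered from 1.
    for fold_number, metrics in enumerate(fold_metrics, start=1):
        for name, value in metrics.items():
            summary[f"fold_{fold_number}/{name}"] = value
    # Average each metric across folds (assumes every fold reports the same metrics).
    metric_names = list(fold_metrics[0]) if fold_metrics else []
    for name in metric_names:
        summary[f"average/{name}"] = mean(m[name] for m in fold_metrics)
    return summary


def log_cv_summary(fold_metrics: List[Dict[str, float]], project: str = "skll-demo") -> None:
    """Write the flattened metrics to a W&B run summary (needs wandb and a login)."""
    import wandb  # imported lazily so build_cv_summary works without wandb

    run = wandb.init(project=project)  # hypothetical project name
    run.summary.update(build_cv_summary(fold_metrics))
    run.finish()


# Made-up metric values for two folds.
folds = [{"accuracy": 0.5, "f1": 0.25}, {"accuracy": 1.0, "f1": 0.75}]
cv_summary = build_cv_summary(folds)
```

Keeping the payload construction separate from the W&B call makes the key layout easy to unit-test without network access.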

TODO:

desilinguist commented 6 months ago

@tamarl08 any idea why GitLab is failing and Azure is passing?

tamarl08 commented 6 months ago

Will check. I didn't expect anything to pass!

desilinguist commented 6 months ago

I am seeing things like this in my run:

[Screenshot: CleanShot 2024-01-12 at 09 36 47@2x]

I am also seeing charts with a single data point (e.g., the accuracy values), which aren't useful. I wonder if we can tell W&B to create relevant charts from the summary file on the fly? Let's sit down next week and try to figure out which things are actually valuable to log and how to make a run look useful as soon as someone opens it.

tamarl08 commented 6 months ago

This is why I tried to log to summary instead, but I couldn't always get rid of these charts. I'll try some more and let's talk next week.
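For context on the summary-versus-chart distinction discussed here, the following is a hedged sketch, not the approach taken in this PR. Values passed to `wandb.log` are appended to the run history, which is what W&B charts, while values written to `run.summary` are stored as single per-run values. The helper below is hypothetical: it routes scalar results to the summary and only multi-point series to the history. Whether this removes every auto-generated chart in practice is not something the sketch can guarantee. The function names and the `skll-demo` project name are assumptions.

```python
from typing import Any, Dict, Tuple


def split_for_wandb(results: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, list]]:
    """Separate single values (for the summary) from multi-point series (for history)."""
    summary: Dict[str, Any] = {}
    history: Dict[str, list] = {}
    for key, value in results.items():
        if isinstance(value, (list, tuple)) and len(value) > 1:
            history[key] = list(value)  # enough points to make a meaningful chart
        else:
            summary[key] = value  # a one-point chart would not be useful
    return summary, history


def log_results(results: Dict[str, Any], project: str = "skll-demo") -> None:
    """Send the split results to W&B (needs wandb installed and a login)."""
    import wandb  # imported lazily so split_for_wandb works without wandb

    summary, history = split_for_wandb(results)
    run = wandb.init(project=project)  # hypothetical project name
    run.summary.update(summary)  # stored per run, not appended to history
    n_steps = max((len(series) for series in history.values()), default=0)
    for step in range(n_steps):
        # Log one history row per step, skipping series that have run out of points.
        row = {k: series[step] for k, series in history.items() if step < len(series)}
        wandb.log(row, step=step)
    run.finish()


# Made-up results: one scalar metric and one learning-curve-like series.
example_results = {"accuracy": 0.9, "learning_curve_scores": [0.6, 0.7, 0.8]}
summary_part, history_part = split_for_wandb(example_results)
```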

pep8speaks commented 6 months ago

Hello @tamarl08! Thanks for updating this PR.

Line 452:64: E203 whitespace before ':'

Comment last updated at 2024-01-26 21:47:38 UTC

tamarl08 commented 6 months ago

@desilinguist @mulhod @damien2012eng @Frost45
This is ready for review now. I changed the logging of evaluation/cv tasks so that no unneeded charts are logged.

See the updated evaluation logging here - look only at the most recent run.

Changes to tests are due to: a change I made to the job name/output file names; data added to the result dict; bug fixes in some tests.

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (7c09d07) 95.33% compared to head (1dec165) 95.44%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #761      +/-   ##
==========================================
+ Coverage   95.33%   95.44%   +0.11%
==========================================
  Files          30       30
  Lines        3598     3688      +90
==========================================
+ Hits         3430     3520      +90
  Misses        168      168
```


tamarl08 commented 6 months ago

Thanks for the review @desilinguist! All suggestions applied. I did add a section in the docs, and also a comment about the output file names. Will add more examples later.