IntelPython / scikit-learn_bench

scikit-learn_bench benchmarks various implementations of machine learning algorithms across data analytics frameworks. It currently support the scikit-learn, DAAL4PY, cuML, and XGBoost frameworks for commonly used machine learning algorithms.
Apache License 2.0
111 stars 69 forks source link

Refactoring of benchmarks #133

Closed Alexsandruss closed 2 months ago

Alexsandruss commented 1 year ago

Benchmarks rework

Entry points: README, Developer Guide

Key features:

Solved issues

samir-nasibli commented 1 year ago

@Alexsandruss Thank you for your hard work here!

I think this pull request is getting quite large and includes multiple features. It contains too much code to be reviewed. It's useful to have a way to split it into smaller tasks for your reviewers so that they can focus on specific portions of it. I am pretty sure that we already can merge some of features here into master, if we would have separate PRs for them.

Wouldn't it be better to move an already finished feature to a separate PR? In this case we can quickly and efficiently review and merge it.

napetrov commented 1 year ago

This CODEOWNERS file contains errors

icfaust commented 1 year ago

So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export

SSL_CERT_DIR=/etc/ssl/certs

To get it to properly download. Somehow this wasnt a problem with 3.10 ¯\(ツ)

icfaust commented 1 year ago

Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: https://github.com/scikit-learn/scikit-learn/commit/f473d7e5bb4aaaf2bb292aa7b0c6195b470daf39

But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't

Alexsandruss commented 1 year ago

So far I am also running into an issue downloading datasets in 3.9 with urllib, specifically with the SSL certs. To get it to download I had to export

SSL_CERT_DIR=/etc/ssl/certs

To get it to properly download. Somehow this wasnt a problem with 3.10 ¯**(ツ)**/¯

It looks like problem of specific python environment. Python 3.9 from conda-forge works fine.

Alexsandruss commented 1 year ago

Also, some of the errors seen in the PR associated with KNN/sklearn in the CI checks have been solved with this PR: scikit-learn/scikit-learn@f473d7e

But likely a handling of Errors needs to occur to make sure that it will run through even though part of sklearn doesn't

There is no external fix for this sklearn version, so previous version is pinned to dependencies from now.

icfaust commented 1 year ago

Naive question: could we catch Errors in such a way to allow for reports to be generated even in the case of a failure in a certain test case? Something like how its done in pytest, or would that be out of scope of this PR.

icfaust commented 11 months ago

This really really needs a usage guide and docs. Barrier to entry for testing/evaluating this PR is high.

samir-nasibli commented 10 months ago

@Alexsandruss Please add TODO list into the description.

ethanglaser commented 9 months ago

Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.

ethanglaser commented 9 months ago

Are any TODOs from CodeFactor hits meant to be addressed in this PR? Or would they be addressed in subsequent PRs?

ethanglaser commented 9 months ago

Is there a sample ml_benchmarks CI job ran with this branch? Or an infra branch associated with the update? I imagine some refactoring of infra support would be needed.

FYI this is in progress but functional - infra PR #684

ethanglaser commented 8 months ago

@Alexsandruss is there additional work you are planning to do on this or is it fully ready for review/merge at this point?

samir-nasibli commented 5 months ago

@Alexsandruss could you please resolve conflicts for report_generator/default_report_gen_config.json

icfaust commented 3 months ago

@Alexsandruss After discussions with others, for this PR, just limit everything to 3.10 so as to be able to merge it ASAP Please also change things related to private CI related to the python versions (so some of my previous comments can be disregarded short-term).