daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

Primitive for multinomial logistic regression #869

Open pdamme opened 1 month ago

pdamme commented 1 month ago

We would like to have a reusable implementation of multinomial logistic regression (training and prediction), such that users can easily employ it in their DaphneDSL scripts.

Steps:

  1. Translate the DML built-in functions multiLogReg() and multiLogRegPredict() from Apache SystemDS to DaphneDSL. The translation can be done semi-automatically using DAPHNE's dml2daph tool (currently not on main, but a usable version of the Python script can be found in PR #576).
  2. The translated code should be made available to DAPHNE users as importable DaphneDSL scripts in scripts/algorithms/. Inspiration on the format can be taken from scripts/algorithms/decisionTree_.daph.
  3. Furthermore, script-level test cases should be added to check that the resulting multiLogReg implementation yields meaningful results. The test cases in Apache SystemDS can serve as inspiration here as well, as they did for the decision tree test cases in test/api/cli/algorithms/DecisionTreeRandomForestTest.cpp.
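
To make "meaningful results" in step 3 a bit more concrete, here is a minimal reference sketch of multinomial logistic regression (training and prediction) in plain Python. It is only an illustration of the kind of sanity check a script-level test could perform: train on a small, clearly separable data set and expect the predictions to match the labels. The function names `train_multilogreg` and `predict_multilogreg`, the parameters `lr` and `epochs`, and the toy data are assumptions made for this sketch. It uses plain batch gradient descent on the softmax cross-entropy loss, which is not necessarily the optimizer, signature, or label encoding used by the SystemDS built-ins.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, x):
    """Linear score per class; the last weight of each row is the intercept."""
    xb = list(x) + [1.0]
    return [sum(w * v for w, v in zip(Wk, xb)) for Wk in W]


def train_multilogreg(X, y, num_classes, lr=0.1, epochs=1000):
    """Hypothetical helper: batch gradient descent on softmax cross-entropy.

    X is a list of feature tuples, y a list of 0-based class labels.
    Returns one weight row per class (features plus intercept).
    """
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(num_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(num_classes)]
        for xi, yi in zip(X, y):
            xb = list(xi) + [1.0]
            probs = softmax(class_scores(W, xi))
            for k in range(num_classes):
                # Gradient of the loss w.r.t. the score of class k.
                err = probs[k] - (1.0 if k == yi else 0.0)
                for j, v in enumerate(xb):
                    grad[k][j] += err * v
        for k in range(num_classes):
            for j in range(n_features + 1):
                W[k][j] -= lr * grad[k][j] / len(X)
    return W


def predict_multilogreg(X, W):
    """Hypothetical helper: return the most probable class for each row of X."""
    preds = []
    for xi in X:
        probs = softmax(class_scores(W, xi))
        preds.append(max(range(len(W)), key=lambda k: probs[k]))
    return preds


# Toy data (made up for this sketch): three well-separated 2D clusters.
X_train = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.5),
           (4.0, 0.0), (4.5, 0.3), (3.8, 0.4),
           (0.0, 4.0), (0.3, 4.5), (0.4, 3.8)]
y_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]

W = train_multilogreg(X_train, y_train, num_classes=3)
train_preds = predict_multilogreg(X_train, W)
accuracy = sum(p == t for p, t in zip(train_preds, y_train)) / len(y_train)
print("training accuracy:", accuracy)
```

A script-level test for the DaphneDSL version could follow the same pattern: generate or load a small data set with a known class structure, run training and prediction, and compare the resulting accuracy against a threshold rather than against exact weights.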

Hints:


Side note: MultiLogReg is particularly interesting to us at the moment because it is required for SystemDS's clustered classification script, which we want to be able to run soon.