Implementation of MultiClass SVM

FAmirjani commented 1 month ago

A Support Vector Machine (SVM) is a supervised machine learning algorithm for classification and regression tasks. An SVM finds the hyperplane that separates data points of different classes with the largest margin, where the margin is the distance from the hyperplane to the closest data points of each class (the support vectors). When the data is not linearly separable, an SVM uses a kernel function (e.g., polynomial or radial basis function) to implicitly map the data into a higher-dimensional space where a linear separation is possible. Because the hyperplane is obtained by solving a convex optimization problem, the solution is a global optimum, which tends to generalize well to unseen data.

Requirements for implementation:

Nonlinear SVM: Implementing kernel functions to map data to higher dimensions.
Quadratic Programming (QP): The optimization problem SVM needs to solve is a convex QP problem. DAPHNE will need an efficient QP solver for this step.
Linear Algebra Libraries: DAPHNE likely integrates with linear algebra libraries optimized for its backend. These could be leveraged for matrix multiplications, dot products, and similar operations.
Sparse Data Support: DAPHNE is designed for high-performance scenarios, so handling sparse datasets efficiently will be key to ensuring that the SVM scales to large datasets.
DAPHNE handles large-scale matrix and tensor operations. DAPHNE's optimized routines can be used to handle matrix multiplications, transpositions, and dot products.
SVM also relies on vector norms and dot products, which can be efficiently implemented using DAPHNE’s underlying
Hyperparameter Search: To find good values for parameters such as C and gamma, a strategy like Grid Search, Random Search, or Bayesian Optimization is needed, paired with cross-validation to avoid overfitting.

FAmirjani commented 1 month ago

I am going to work on it.

FAmirjani commented 4 weeks ago

It is done, but I do not know how to test the code on the DAPHNE platform or how to add the code to the DAPHNE repository.

pdamme commented 4 weeks ago

Hi @FAmirjani, thanks for the support! An SVM implementation in DaphneDSL would be very welcome. We collect reusable DaphneDSL scripts in scripts/algorithms/; users can import these scripts and call them as functions in their own DaphneDSL scripts (see, e.g., scripts/algorithms/decisionTree_.daph for a recent example).

Yes, it would be great if you could also contribute test cases for the SVM script. The test cases should make sure that your script (1) successfully compiles and executes in DAPHNE (without crashes) and (2) yields meaningful results. General guidelines on testing in DAPHNE can be found in the documentation. For your specific case, I recommend having a look at the test cases for decision trees (test/api/cli/algorithms/DecisionTreeRandomForestTest.cpp) as a recent example. These execute small DaphneDSL scripts (e.g., test/api/cli/algorithms/decisionTreeRealData2.daphne) on a small real-world data set in test/data/wine/ and verify that the resulting accuracy is satisfactory. If you ported the SVM script from Apache SystemDS, you could also port their test cases to DAPHNE. Don't hesitate to reach out if you need additional hints.
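Just to illustrate the two checks, here is a rough sketch in Python. It is not how the DAPHNE test suite is written (the actual test cases are C++ files like the one mentioned above). It assumes, hypothetically, that the script under test prints its accuracy as the last line of its output; the function names and the threshold are made up for this example.

```python
import subprocess

def check_script_output(returncode, stdout, min_accuracy):
    """Check (1) a clean exit and (2) a printed accuracy of at least min_accuracy.

    Assumes (hypothetically) that the accuracy is the last line of stdout.
    """
    if returncode != 0:
        return False  # the script crashed or failed to compile
    lines = stdout.strip().splitlines()
    if not lines:
        return False  # nothing was printed
    try:
        accuracy = float(lines[-1])
    except ValueError:
        return False  # the last line is not a number
    return accuracy >= min_accuracy

def run_and_check(command, min_accuracy):
    """Run a command (a list of strings) and apply both checks to its result."""
    result = subprocess.run(command, capture_output=True, text=True)
    return check_script_output(result.returncode, result.stdout, min_accuracy)
```

A call could look like `run_and_check(["bin/daphne", "svmTest.daphne"], 0.9)`, where both the binary path and the script name are placeholders.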

Once your contribution is ready, you can create a pull request (see our contribution guidelines).

daphne-eu / daphne

Implementation of MultiClass SVM #851