daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

Implementation of MultiClass SVM #851

Open FAmirjani opened 1 month ago

FAmirjani commented 1 month ago

A Support Vector Machine (SVM) is a supervised machine learning algorithm for classification and regression tasks. An SVM works by finding the optimal hyperplane that separates data points of different classes in a high-dimensional space. The goal is to maximize the margin, which is the distance between the separating hyperplane and the closest data points from each class (called support vectors). In cases where the data is not linearly separable, an SVM uses kernel functions (e.g., polynomial or radial basis function) to map the data into a higher-dimensional space where a linear separation is possible. Training solves an optimization problem whose solution is the maximum-margin hyperplane, which provides robust classification with good generalization to unseen data. Requirements for implementation:
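To make the description above concrete, here is a minimal sketch of a linear multiclass SVM in plain Python. It is not the DaphneDSL implementation discussed in this issue: it assumes a one-vs-rest reduction (one binary classifier per class) and trains each classifier by stochastic subgradient descent on the L2-regularized hinge loss. The function names (`train_binary_svm`, `train_one_vs_rest`, `predict`) and the hyperparameters `lam`, `lr`, and `epochs` are hypothetical and chosen only for illustration. Kernels are left out.

```python
import random


def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Train a linear SVM for labels y in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(hinge loss) with plain SGD.
    Illustrative only; all parameter names are assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            margin = y[i] * score
            # Gradient of the regularizer: shrink the weights a little.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Subgradient of the hinge loss is non-zero only when the
            # point is inside the margin or misclassified.
            if margin < 1.0:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def train_one_vs_rest(X, labels, **kwargs):
    """Train one binary SVM per class (that class vs. all others)."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_binary_svm(X, y, **kwargs)
    return models


def predict(models, x):
    """Return the class whose classifier gives the highest score."""
    def score(c):
        w, b = models[c]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


if __name__ == "__main__":
    # Tiny made-up data set with three well-separated classes.
    X = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
         [1.0, 0.0], [0.9, 0.1], [1.0, 0.2],
         [0.0, 1.0], [0.1, 0.9], [0.2, 1.0]]
    labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
    models = train_one_vs_rest(X, labels)
    print([predict(models, x) for x in X])
```

One-vs-rest keeps each subproblem a standard binary SVM, which is why it is a common first choice for multiclass support. Other reductions (e.g., one-vs-one) would work as well.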

FAmirjani commented 1 month ago

I am going to work on it.

FAmirjani commented 4 weeks ago

I am going to work on it. It is done, but I do not know how to test the code on the DAPHNE platform or how to add the code to the DAPHNE repository.

pdamme commented 4 weeks ago

Hi @FAmirjani, thanks for the support! An SVM implementation in DaphneDSL would be very welcome. We collect reusable DaphneDSL scripts in scripts/algorithms/; users can import these scripts and call them as functions in their own DaphneDSL scripts (see, e.g., scripts/algorithms/decisionTree_.daph for a recent example).

Yes, it would be great if you could also contribute test cases for the SVM script. The test cases should make sure that your script (1) successfully compiles and executes in DAPHNE (without crashes) and (2) yields meaningful results. General guidelines on testing in DAPHNE can be found in the documentation. For your specific case, I recommend having a look at the test cases for decision trees (test/api/cli/algorithms/DecisionTreeRandomForestTest.cpp) as a recent example. These execute small DaphneDSL scripts (e.g., test/api/cli/algorithms/decisionTreeRealData2.daphne) on a small real-world data set in test/data/wine/ and verify that the resulting accuracy is satisfactory. In case you ported the SVM script from Apache SystemDS, you could also port their test cases to DAPHNE. Don't hesitate to reach out if you need additional hints.
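The "meaningful results" check described above boils down to comparing predicted labels against known labels and requiring a minimum accuracy. The following Python sketch shows only that logic. It is not DAPHNE's test harness (the actual tests are C++ files that run DaphneDSL scripts), and the names `accuracy`, `check_accuracy`, and `min_accuracy` are hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy of an empty label list")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def check_accuracy(predicted, expected, min_accuracy):
    """Return (passed, acc): passed is True if acc reaches the threshold."""
    acc = accuracy(predicted, expected)
    return acc >= min_accuracy, acc
```

The threshold is a per-data-set choice: it should be loose enough to tolerate harmless numerical differences but tight enough to catch a broken model.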

Once your contribution is ready, you can create a pull request (see our contribution guidelines).