cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License
4.52k stars 1.64k forks source link

Implement tbb-based parallelism; use repeatable random numbers. #186

Closed Roebu closed 11 months ago

Roebu commented 1 year ago

This fork from the LibSVM 3.3 package incorporates the following enhancements:

Implements parallelism in the get_Q() and svc_predict_values() functions. This requires that the tbb (thread building blocks) package be installed ("apt-get install -y libtbb-dev"). Initial performance tests on a 16-core machine indicate an elapsed time for a large training session to be reduced by a factor of 3 to 4.

Uses the std::minstd_rand random number generator instead of the usual rand() and srand(). This allows for repeatable random numbers to be generated which is importand when implementing unit tests.

cjlin1 commented 1 year ago

Thanks. I took the commit on the error message of svm_cross_validation and signed off on your behalf in our internal git. It will appear in the next release. Thanks

For the other two, we need some studies. We used rand() because we are mainly C based (see svm-train.c and svm-predict.c). But the thread safety of rand() is a concern we always hope to address. For kernel operations in parallel, we prefer not having it in the core code due to first portability, and second various possible tools (openmp etc.). We might include your tbb code in faq.

Thanks again.

On 2022-08-26 00:03, Rob Ward wrote:

This fork from the LibSVM 3.3 package incorporates the following enhancements:

Implements parallelism in the get_Q() and svc_predict_values() functions. This requires that the tbb (thread building blocks) package be installed ("apt-get install -y libtbb-dev"). Initial performance tests on a 16-core machine indicate an elapsed time for a large training session to be reduced by a factor of 3 to 4.

Uses the std::minstd_rand random number generator instead of the usual rand() and srand(). This allows for repeatable random numbers to be generated which is importand when implementing unit tests.

YOU CAN VIEW, COMMENT ON, OR MERGE THIS PULL REQUEST ONLINE AT:

https://github.com/cjlin1/libsvm/pull/186

COMMIT SUMMARY

  • f079760 [1] Add tbb-based parallelism to get_Q() and svm_predict_values()
  • 5e4850f [2] Minor fix to error message in svm_cross_validation()
  • 2699ea9 [3] Use std::minstd_rand to guarantee repeatable random number generation
  • ea8e8d9 [4] Update README

FILE CHANGES

(2 files [5])

  • M README [6] (13)
  • M svm.cpp [7] (52)

PATCH LINKS:

-- Reply to this email directly, view it on GitHub [8], or unsubscribe [9]. You are receiving this because you are subscribed to this thread.Message ID: @.> [ { @.": "http://schema.org", @.": "EmailMessage", "potentialAction": { @.": "ViewAction", "target": "https://github.com/cjlin1/libsvm/pull/186", "url": "https://github.com/cjlin1/libsvm/pull/186", "name": "View Pull Request" }, "description": "View this Pull Request on GitHub", "publisher": { @.***": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

Links:

[1] https://github.com/cjlin1/libsvm/pull/186/commits/f079760b73061a823aa73d8185140d484a2bb975 [2] https://github.com/cjlin1/libsvm/pull/186/commits/5e4850f6cef6da668d30bf138c4392ff168c9e7c [3] https://github.com/cjlin1/libsvm/pull/186/commits/2699ea99fe418238a55a54bb415cc2effc2bf404 [4] https://github.com/cjlin1/libsvm/pull/186/commits/ea8e8d964d589676dfe200c7a926c3a3611fe9c3 [5] https://github.com/cjlin1/libsvm/pull/186/files [6] https://github.com/cjlin1/libsvm/pull/186/files#diff-2b7814d3fca2e99e56c51b6ff2aa313ea6e9da6424804240aa8ad891fdfe0900 [7] https://github.com/cjlin1/libsvm/pull/186/files#diff-f5a2142d812bf17daef614246a91d4bb2c520c0d0252bfae5f1e0c3a747a18b8 [8] https://github.com/cjlin1/libsvm/pull/186 [9] https://github.com/notifications/unsubscribe-auth/ABI3BHRUO7NCLNIKPJXGUEDV26KOFANCNFSM57TV25VA

Roebu commented 1 year ago

Thank you for the prompt reply. Regarding the tbb changes, I perfectly understand why you want to keep the code portable across platforms. Perhaps some kind of '#ifdef' based solution would be appropriate here (?).

cjlin1 commented 1 year ago

Many thanks. Yes, #ifdef is possible. In this regard, I may prefer openmp over tbb for the smallest change. All I need is

if defined(_OPENMP)

#pragma omp ...

endif

and in Makefile have

Uncomment the following line to turn on parallelization

CFLAGS += -fopenmp

Any comments? Thanks

On 2022-08-26 20:18, Rob Ward wrote:

Thank you for the prompt reply. Regarding the tbb changes, I perfectly understand why you want to keep the code portable across platforms. Perhaps some kind of '#ifdef' based solution would be appropriate here (?).

-- Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. You are receiving this because you commented.Message ID: @.> [ { @.": "http://schema.org", @.": "EmailMessage", "potentialAction": { @.": "ViewAction", "target": "https://github.com/cjlin1/libsvm/pull/186#issuecomment-1228418596", "url": "https://github.com/cjlin1/libsvm/pull/186#issuecomment-1228418596", "name": "View Pull Request" }, "description": "View this Pull Request on GitHub", "publisher": { @.***": "Organization", "name": "GitHub", "url": "https://github.com" } } ]

Links:

[1] https://github.com/cjlin1/libsvm/pull/186#issuecomment-1228418596 [2] https://github.com/notifications/unsubscribe-auth/ABI3BHUNQHBLGJM7NXVOIM3V3CYZLANCNFSM57TV25VA

Roebu commented 1 year ago

No major comments from my side. I would only say that we have had very good experience using the tbb library for implementing parallelism in our code. The tbb library automatically makes use of all the cores available.