cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License

svm_train() in svmutil.py is not thread-safe #167

Open wyykak opened 4 years ago

wyykak commented 4 years ago

To be more exact, svm_set_print_string_function() in svm.cpp is not thread-safe, because it modifies a global function pointer, svm_print_string. If you call svm_train() from multiple Python threads with the '-q' option, each call converts print_null() to a C-style function pointer and assigns it to svm_print_string in order to suppress the output.

However, the function pointer generated by each call seems to point to a different address, which makes it possible to corrupt svm_print_string and crash the program.
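The address behaviour can be reproduced with ctypes alone, without loading libsvm. The following is a minimal sketch: PRINT_STRING_FUN and print_null are redefined locally and are only assumed to mirror the callback type and the no-op printer used by the Python interface; callback_address is a hypothetical helper added for illustration.

```python
import ctypes

# Assumption: mirrors a C callback of type void (*)(const char *).
PRINT_STRING_FUN = ctypes.CFUNCTYPE(None, ctypes.c_char_p)


def print_null(s):
    """No-op printer used to suppress output."""
    return None


def callback_address(cb):
    """Return the address of the C thunk behind a ctypes callback (hypothetical helper)."""
    return ctypes.cast(cb, ctypes.c_void_p).value


# Wrapping the same Python function twice creates two separate C thunks.
cb_first = PRINT_STRING_FUN(print_null)
cb_second = PRINT_STRING_FUN(print_null)

addr_first = callback_address(cb_first)
addr_second = callback_address(cb_second)
print(hex(addr_first), hex(addr_second))
```

Both wrappers are still alive when the addresses are read, so the two values differ. ctypes also requires that a callback object stay referenced for as long as C code may call it, so a global C pointer that outlives the wrapper object is unsafe to call through.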

I suggest making the print function pointer independent for each function call. A global print option is neither thread-safe nor practical.
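Until the library changes, one possible caller-side workaround is to serialize the calls that touch the global pointer. The sketch below is only an illustration under that assumption: serialized and _train_lock are hypothetical names, and fake_train is a stand-in for the real training call so that the example runs without libsvm. It gives up parallel training in exchange for never having two threads set the pointer at once.

```python
import functools
import threading
import time

# Hypothetical module-level lock shared by every wrapped call.
_train_lock = threading.Lock()


def serialized(train_fn):
    """Wrap train_fn so that only one thread runs it at a time."""
    @functools.wraps(train_fn)
    def wrapper(*args, **kwargs):
        with _train_lock:
            return train_fn(*args, **kwargs)
    return wrapper


# Stand-in for the real training function; it records how many
# threads are inside it at the same moment.
active = 0
max_active = 0


@serialized
def fake_train(labels, samples, options):
    global active, max_active
    active += 1
    max_active = max(max_active, active)
    time.sleep(0.01)  # pretend to train
    active -= 1
    return len(labels)


threads = [
    threading.Thread(target=fake_train, args=([1, -1], [[0.0], [1.0]], "-q"))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("maximum concurrent calls:", max_active)
```

In real use the decorator would be applied to the training function itself, for example `safe_train = serialized(svm_train)`, assuming that no other code path changes the print function concurrently.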

cjlin1 commented 4 years ago

There are a few other places where libsvm isn't thread-safe. Some are there for historical reasons. We may handle some of the important ones in the near future, though the plan is not yet certain.


wyykak commented 4 years ago

Thanks for your reply! In fact, I have tried to use libsvm together with multiprocessing in Python, but SVM models, being C structs, cannot be serialized and transferred between processes. Currently libsvm only supports saving models as ASCII files. Maybe you could add an interface to save models as strings? That would make multiprocessing a viable way to parallelize libsvm.
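In the meantime, a string form can be approximated by round-tripping through a temporary file. The sketch below is a rough illustration, not an existing libsvm interface: model_to_string and model_from_string are hypothetical helpers, and save_fn and load_fn are assumed to behave like svm_save_model(model_file_name, model) and svm_load_model(model_file_name). The fake_save and fake_load stand-ins exist only so that the example runs without libsvm.

```python
import os
import tempfile


def model_to_string(model, save_fn):
    """Save model to a temporary file with save_fn(path, model) and return the text."""
    fd, path = tempfile.mkstemp(suffix=".model")
    os.close(fd)  # save_fn opens the file by name
    try:
        save_fn(path, model)
        with open(path, "r") as f:
            return f.read()
    finally:
        os.remove(path)


def model_from_string(text, load_fn):
    """Write text to a temporary file and rebuild the model with load_fn(path)."""
    fd, path = tempfile.mkstemp(suffix=".model")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        return load_fn(path)
    finally:
        os.remove(path)


# Stand-ins for the real save and load functions (assumptions for the demo).
def fake_save(path, model):
    with open(path, "w") as f:
        for key, value in model.items():
            f.write(f"{key} {value}\n")


def fake_load(path):
    with open(path, "r") as f:
        return dict(line.split(" ", 1) for line in f.read().splitlines())


original = {"svm_type": "c_svc", "nr_class": "2"}
text = model_to_string(original, fake_save)
restored = model_from_string(text, fake_load)
print(restored == original)
```

A plain Python string can be pickled, so a worker process could return the result of model_to_string and the parent could rebuild the model with model_from_string, at the cost of two extra file writes per model.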