Closed naveenmiriyalu closed 5 years ago
Hi. The first half and the last half of the working set are used for kernel value reuse, which also makes convergence faster. You may be interested in this paper: https://www.comp.nus.edu.sg/~wenzy/papers/tkde18-pgpusvm.pdf
Naveen notifications@github.com wrote on Wednesday, February 6, 2019, 8:15 PM:
Hi, I am trying to understand how working_set_size works and its impact on the data transferred to and from the GPU. I observe that there are two working sets, called first_half and last_half, each of size 512. I would like to understand what is happening as part of the algorithm and what its effect is on GPU data transfer. Is it equivalent to the batch size being operated on?
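To make the half-and-half reuse concrete, here is a minimal C++ sketch (hypothetical names, not ThunderSVM's actual code) of a working-set update that keeps half of the previous set, so the kernel rows already cached for that half can be reused, and refills the other half with the currently most violating instances:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not ThunderSVM's actual code) of a working-set
// update that keeps half of the previous working set, so the kernel rows
// already cached for that half can be reused, and refills the other half
// with the currently most violating instances.
std::vector<int> update_working_set(const std::vector<int> &previous,
                                    const std::vector<int> &by_violation,
                                    std::size_t q) {
    // kept half: reuse q/2 instances (and their cached kernel values)
    std::vector<int> ws(previous.begin(), previous.begin() + q / 2);
    // new half: most violating instances not already in the working set
    for (int idx : by_violation) {
        if (ws.size() == q) break;
        if (std::find(ws.begin(), ws.end(), idx) == ws.end())
            ws.push_back(idx);
    }
    return ws;
}
```

Because only half of the kernel rows must be computed fresh each outer iteration, both data transfer and kernel computation per iteration are reduced compared to replacing the whole working set.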
Hi, I changed the working set size to a number like 2048 and get the error pasted below. What is the effect of the working set size on GPU resources?
2019-02-06 06:46:57,732 FATAL [default] Check failed: [error == cudaSuccess] invalid configuration argument
2019-02-06 06:46:57,732 WARNING [default] Aborting application. Reason: Fatal log at [/home/tf/ThunderSVM/thundersvm/src/thundersvm/kernel/smo_kernel.cu:309]
Aborted (core dumped)
Also, what do the global iter and local iter values in the output convey?
2019-02-06 01:34:43,376 INFO [default] global iter = 400, total local iter = 102912, diff = 2.13
Does global iter mean the number of epochs?
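For reading that log line: in working-set SMO solvers, an outer (global) iteration typically selects one working set and solves a small subproblem, while the local iterations are the inner SMO steps inside each subproblem. A toy C++ sketch of the two-level counting (hypothetical names, not ThunderSVM's actual loop):

```cpp
// Toy sketch (hypothetical names, not ThunderSVM's actual loop) of the
// two-level iteration in a working-set SMO solver. One "global iter"
// selects a working set and solves one subproblem; the inner SMO steps of
// every subproblem are summed into "total local iter". A global iteration
// is one subproblem solve over the working set, not one pass over the
// whole data set, so it is not the same thing as an epoch.
struct Counters {
    int global_iter = 0;
    int total_local_iter = 0;
};

Counters run_outer_loop(int num_subproblems, int smo_steps_per_subproblem) {
    Counters c;
    for (int g = 0; g < num_subproblems; ++g) { // outer: pick working set
        c.global_iter += 1;
        // inner: SMO updates on the selected working set
        c.total_local_iter += smo_steps_per_subproblem;
    }
    return c;
}
```

With that reading, the log above (global iter = 400, total local iter = 102912) would average roughly 257 inner SMO steps per working-set subproblem.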
The working set size is equal to the number of threads in one block (one thread per instance). The maximum number of threads in one block is 1024 on recent NVIDIA GPUs, so setting the working set size to 2048 (larger than 1024) raises an error.
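A small C++ sketch of the constraint behind this error (the 1024 limit is the per-block thread cap on recent NVIDIA GPUs; the helper name is hypothetical, not ThunderSVM's actual code):

```cpp
#include <stdexcept>

// Sketch of the launch constraint behind "invalid configuration argument":
// the solver runs one thread per working-set instance inside a single
// thread block, and recent NVIDIA GPUs cap a block at 1024 threads.
// (Hypothetical helper, not ThunderSVM's actual code.)
constexpr int kMaxThreadsPerBlock = 1024;

int validate_working_set_size(int ws_size) {
    if (ws_size <= 0 || ws_size > kMaxThreadsPerBlock)
        throw std::invalid_argument(
            "working set size must be in (0, 1024]");
    return ws_size; // 512 or 1024 are typical safe choices
}
```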
Hi, pardon me for the flurry of questions. I am reading the paper you referred to. In the meanwhile, I am trying to do a performance comparison of ThunderSVM and other GPU-based SVM solvers for my study, and I have a few questions on the ThunderSVM implementation.
- Is there a way to increase the amount of computation done in every batch, so that convergence is faster?
- What do the global iter and the local iter mean?
- Can I assume that one global iteration is equivalent to one epoch?
- From the paper I see that kernel computation takes the most time. Is it due to the following cuSPARSE call?
  cusparseScsrmm2(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE, m, n, k, nnz, &one, descr, csr_val.device_data(), csr_row_ptr.device_data(), csr_col_ind.device_data(), dense_mat.device_data(), n, &zero, result.device_data(), m);
Hope this helps.
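As a side note on the cusparseScsrmm2 question above: a simplified CPU reference (single precision, no transposes, unlike the actual call, which transposes the dense operand) for the CSR sparse-matrix times dense-matrix product used in kernel computation might look like this:

```cpp
#include <cstddef>
#include <vector>

// Simplified CPU reference (no transposes, unlike the actual
// cusparseScsrmm2 call, which transposes the dense operand) for the
// CSR sparse-matrix x dense-matrix product used in kernel computation:
// result(m x n) = A(m x k, CSR) * B(k x n, row-major dense).
std::vector<float> csr_mm(int m, int n,
                          const std::vector<float> &csr_val,
                          const std::vector<int> &csr_row_ptr,
                          const std::vector<int> &csr_col_ind,
                          const std::vector<float> &dense) {
    std::vector<float> result(static_cast<std::size_t>(m) * n, 0.0f);
    for (int row = 0; row < m; ++row)
        for (int p = csr_row_ptr[row]; p < csr_row_ptr[row + 1]; ++p)
            for (int col = 0; col < n; ++col)
                result[row * n + col] +=
                    csr_val[p] * dense[csr_col_ind[p] * n + col];
    return result;
}
```

The GPU library fuses and parallelizes these loops, but the arithmetic it performs is the same sparse-times-dense product, which is why this call dominates the kernel-computation time.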
Hi shijiashuai,
Thanks a lot for the answers; they really helped. How do we get the hinge loss in ThunderSVM?
Not sure what you mean by "get the hinge loss"... The loss is not directly measured in ThunderSVM.
If you want to get some idea of the loss reduction (the improvement of the objective value) during training, you can check the output of ThunderSVM by searching for the substring "obj =".
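For example, a small helper (hypothetical, not part of ThunderSVM) that extracts the objective value following that substring from a training log line:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ThunderSVM) that extracts the objective
// value following the substring "obj = " from a training log line, so the
// improvement of the objective can be tracked across iterations.
double parse_obj(const std::string &line) {
    const std::string key = "obj = ";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos)
        return 0.0; // no objective value on this line
    return std::stod(line.substr(pos + key.size()));
}
```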
It seems there are no more questions, so I will close this issue.