Xtra-Computing / thundersvm

ThunderSVM: A Fast SVM Library on GPUs and CPUs
Apache License 2.0

working set size #122

Closed naveenmiriyalu closed 5 years ago

naveenmiriyalu commented 5 years ago

Hi, I am trying to understand how working_set_size works and its impact on the data transferred to and from the GPU. I observe that there are two working sets, called first_half and last_half, each of size 512. What is happening in the algorithm, and what is its effect on GPU data transfer? Is it equivalent to the batch size being operated on?

shijiashuai commented 5 years ago

Hi. The first half and last half of the working set are used for kernel value reuse, and also make convergence faster. You may be interested in this paper: https://www.comp.nus.edu.sg/~wenzy/papers/tkde18-pgpusvm.pdf
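
A toy sketch of the reuse idea, assuming a keep-half policy: each outer iteration retains half of the previous working set (whose kernel values are already on the GPU) and fills the other half with newly selected violating instances. `update_working_set` and the violator list are illustrative stand-ins, not ThunderSVM's actual selection code.

```cpp
#include <vector>

// Rebuild a working set of size q: keep the most recent q/2 indices
// (their kernel rows are already cached on the GPU) and append q/2
// freshly selected violating instances, whose rows must be computed.
std::vector<int> update_working_set(const std::vector<int>& prev_ws,
                                    const std::vector<int>& new_violators) {
    std::size_t half = prev_ws.size() / 2;
    // Retained half: only these kernel rows can be reused without transfer.
    std::vector<int> ws(prev_ws.end() - half, prev_ws.end());
    // New half: a stand-in for the solver's violating-pair selection.
    ws.insert(ws.end(), new_violators.begin(), new_violators.begin() + half);
    return ws;
}
```

With a working set of size 4, each iteration therefore recomputes kernel values for only 2 instances instead of all 4, which is where the data-transfer saving comes from.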


naveenmiriyalu commented 5 years ago

Hi, I changed the working set size to 2048 and get the error pasted below. What is the effect of the working set size on GPU resources?

2019-02-06 06:46:57,732 FATAL [default] Check failed: [error == cudaSuccess] invalid configuration argument
2019-02-06 06:46:57,732 WARNING [default] Aborting application. Reason: Fatal log at [/home/tf/ThunderSVM/thundersvm/src/thundersvm/kernel/smo_kernel.cu:309]
Aborted (core dumped)

naveenmiriyalu commented 5 years ago

And what do the global iter and local iter in the output convey?

2019-02-06 01:34:43,376 INFO [default] global iter = 400, total local iter = 102912, diff = 2.13

Does global iter mean the number of epochs?

naveenmiriyalu commented 5 years ago

Hi, pardon me for the flurry of questions. I am reading the paper you referred to. In the meanwhile, I am doing a performance comparison of ThunderSVM and other GPU-based SVM solvers for my study, and I have a few questions about the ThunderSVM implementation.

  1. Is there a way to increase the amount of computation done during every batch, so that training converges faster?
  2. What do the global iter and the local iter mean?
  3. Can I assume that one global iteration is equivalent to one epoch?
  4. From the paper, I see that kernel value computation takes the most time. Is it due to the following cuSPARSE call?

     cusparseScsrmm2(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, CUSPARSE_OPERATION_TRANSPOSE,
                     m, n, k, nnz, &one, descr, csr_val.device_data(), csr_row_ptr.device_data(),
                     csr_col_ind.device_data(), dense_mat.device_data(), n, &zero,
                     result.device_data(), m);

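
For intuition, the cusparseScsrmm2 call above performs a sparse-dense product of the form C = A · Bᵀ, where A is the CSR data matrix and B holds the working-set instances; each entry of C is the dot product the kernel function (e.g. RBF) is then applied to. A dense C++ sketch of the same arithmetic, with illustrative dimensions and values:

```cpp
#include <vector>

// Dense equivalent of C = A * B^T: C[i][j] is the dot product of data
// instance i with working-set instance j. The real call operates on a
// sparse CSR matrix on the GPU; this host-side loop only mirrors the math.
std::vector<std::vector<double>> mat_mul_bt(
        const std::vector<std::vector<double>>& A,   // m x k data matrix
        const std::vector<std::vector<double>>& B) { // n x k working set
    std::vector<std::vector<double>> C(A.size(),
                                       std::vector<double>(B.size(), 0.0));
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < B.size(); ++j)
            for (std::size_t k = 0; k < A[i].size(); ++k)
                C[i][j] += A[i][k] * B[j][k];
    return C;
}
```
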
shijiashuai commented 5 years ago

> Hi, I changed the working set size to 2048 and get the error pasted below. What is the effect of the working set size on GPU resources?
>
> 2019-02-06 06:46:57,732 FATAL [default] Check failed: [error == cudaSuccess] invalid configuration argument
> 2019-02-06 06:46:57,732 WARNING [default] Aborting application. Reason: Fatal log at [/home/tf/ThunderSVM/thundersvm/src/thundersvm/kernel/smo_kernel.cu:309]
> Aborted (core dumped)

The working set size is equal to the number of threads in one block (one thread per instance). The maximum number of threads per block is 1024 on recent NVIDIA GPUs, so setting the working set size to 2048 (larger than 1024) raises an error.
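
A minimal host-side sketch of that constraint, with 1024 as an assumed per-block thread limit; ThunderSVM itself does not expose a validator like this, it simply fails at kernel launch:

```cpp
#include <stdexcept>

// The SMO kernel launches one CUDA block with one thread per working-set
// instance, so the working set size must fit within the device's
// maxThreadsPerBlock (1024 on recent NVIDIA GPUs). Anything larger
// produces the "invalid configuration argument" launch error seen above.
int validate_working_set_size(int ws_size, int max_threads_per_block = 1024) {
    if (ws_size <= 0 || ws_size > max_threads_per_block)
        throw std::invalid_argument(
            "working set size exceeds the CUDA per-block thread limit");
    return ws_size;
}
```
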

shijiashuai commented 5 years ago

> Hi, pardon me for the flurry of questions. I am reading the paper you referred to. In the meanwhile, I am doing a performance comparison of ThunderSVM and other GPU-based SVM solvers for my study, and I have a few questions about the ThunderSVM implementation.
>
> 1. Is there a way to increase the amount of computation done during every batch, so that training converges faster?
> 2. What do the global iter and the local iter mean?
> 3. Can I assume that one global iteration is equivalent to one epoch?
> 4. From the paper, I see that kernel value computation takes the most time. Is it due to the cusparseScsrmm2 call?
  1. Increasing the working set size can, but memory consumption then becomes a problem.
  2. "global iter" is the number of subproblems of the original QP problem solved so far; "total local iter" is the sum of the numbers of iterations used to solve those subproblems.
  3. There is no "epoch" in SVM training. The solver iteratively selects a working set (batch) to optimise until the solution is optimal; how many passes over the data this takes is not tracked.
  4. It depends on the dataset, but it is true in many cases.
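
The two counters can be sketched with a toy outer loop; the per-subproblem iteration counts are illustrative inputs, not ThunderSVM's actual solver:

```cpp
#include <vector>

// Mirrors the log line "global iter = 400, total local iter = 102912":
// each outer (global) iteration solves one working-set subproblem, and
// the local counter accumulates the inner iterations spent on every
// subproblem so far.
struct IterStats {
    int global_iter = 0;        // subproblems solved
    long total_local_iter = 0;  // summed inner iterations
};

IterStats run_solver(const std::vector<int>& inner_iters_per_subproblem) {
    IterStats s;
    for (int inner : inner_iters_per_subproblem) {
        ++s.global_iter;             // one working-set subproblem per pass
        s.total_local_iter += inner; // inner SMO steps on that subproblem
    }
    return s;
}
```
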

Hope this helps.

naveenmiriyalu commented 5 years ago

Hi shijiashuai,

Thanks a lot for the answers; they really helped. How do we get the hinge loss in ThunderSVM?

zeyiwen commented 5 years ago

Not sure what you mean by "get the hinge loss"... The loss is not directly measured in ThunderSVM.

If you want to get some idea of the "loss reduction", or the improvement of the objective value during training, you can check the output of ThunderSVM by searching for the substring `obj =`.
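
A small sketch of scraping that value from the training output, assuming log lines contain the substring `obj = ` followed by a number (the exact log format is an assumption and may vary between versions):

```cpp
#include <string>

// Extract the objective value from a ThunderSVM log line by locating the
// "obj = " substring; returns false when the line carries no objective.
bool parse_obj(const std::string& line, double& obj) {
    const std::string key = "obj = ";
    std::size_t pos = line.find(key);
    if (pos == std::string::npos) return false;
    obj = std::stod(line.substr(pos + key.size()));  // parse trailing number
    return true;
}
```

Plotting the parsed values over the global iterations then shows how the objective (and hence the loss) decreases during training.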

zeyiwen commented 5 years ago

It seems there are no more questions, so I will close this issue.