aksnzhy / xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
https://xlearn-doc.readthedocs.io/en/latest/index.html
Apache License 2.0
3.09k stars 519 forks

DMatrix transition for NumPy\Pandas #240

Closed etveritas closed 5 years ago

etveritas commented 5 years ago

DMatrix transition from Python NumPy or pandas.

etveritas commented 5 years ago

@aksnzhy Do you think this design is feasible?

aksnzhy commented 5 years ago

This PR looks nice. I will review this code as soon as possible.

aksnzhy commented 5 years ago

@etveritas Hi, could you please add some demos that use this API? For example, we could add a Python example using this interface to each demo (under the demo path), which would also serve as a good test. We also need to update the document python_package.rst to introduce this interface. Thanks for the effort!

etveritas commented 5 years ago

@aksnzhy No problem. Before that, I'll enhance this PR, for example with more exception handling.

etveritas commented 5 years ago

@aksnzhy Everything is finished for this PR. My idea for this design is to add two hyperparameters, from_file and res_out, which indicate the DMatrix transition and the result output for NumPy, respectively. These two parameters are invisible to the CLI and are used only indirectly by users of a front-end language. If users transform NumPy/pandas data to a DMatrix, we rely on the DMatrix instead of a hash value (not use ). Some features are simple but not implemented yet; I want to observe user demand first, and if they are needed I will add them in another PR or version.

aksnzhy commented 5 years ago

@etveritas I'm sorry, I actually reviewed this code 15 days ago and wrote the comments, but I didn't click the submit button to submit them...

etveritas commented 5 years ago

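This is the task list for this PR:

  • [x] Do not skip zeros when reading CSV (#170).
  • [x] Add DMatrix transition from NumPy/pandas.
  • [x] Add some demos for DMatrix transition (both regression and classification).
  • [x] Correct the xLearn error handling for c_api.

@aksnzhy I also have a question about this code in the file fm_score.cc:

std::vector<real_t> sv(aligned_k, 0);
real_t* s = sv.data();
...
__m128 XMMs = _mm_load_ps(s+d);

Is the pointer s returned by std::vector::data automatically aligned?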

aksnzhy commented 5 years ago

This is the task list for this PR:

  • [x] Do not skip zeros when reading CSV (#170).
  • [x] Add DMatrix transition from NumPy/pandas.
  • [x] Add some demos for DMatrix transition (both regression and classification).
  • [x] Correct the xLearn error handling for c_api.

@aksnzhy I also have a question about this code in the file fm_score.cc:

std::vector<real_t> sv(aligned_k, 0);
real_t* s = sv.data();
...
__m128 XMMs = _mm_load_ps(s+d);

Is the pointer s returned by std::vector::data automatically aligned?

I don't think this memory is aligned.

aksnzhy commented 5 years ago

@etveritas Thanks for this effort! I will merge this PR. Also, could you please update the documents in xlearn_doc and xlearn_doc_cn? The website will be updated automatically when these two repos change.

aksnzhy commented 5 years ago

I find that the results of run_titanic_no_cv_pandas.py and run_titanic_no_cv.py are quite different. Could you please check this problem?

etveritas commented 5 years ago

@etveritas Thanks for this effort! I will merge this PR. Also, could you please update the documents in xlearn_doc and xlearn_doc_cn? The website will be updated automatically when these two repos change.

@aksnzhy Yeah, I have already made PRs for these two repos.

etveritas commented 5 years ago

@aksnzhy I tested the correctness of the DMatrix transition without lock-free training.

aksnzhy commented 5 years ago

@aksnzhy I tested the correctness of the DMatrix transition without lock-free training.

Oh I see. You are right.

etveritas commented 5 years ago

@aksnzhy I tested the correctness of the DMatrix transition without lock-free training.

Oh I see. You are right.

Haha, I added an explanation for users in these files.

etveritas commented 5 years ago

This is the task list for this PR:

  • [x] Do not skip zeros when reading CSV (#170).
  • [x] Add DMatrix transition from NumPy/pandas.
  • [x] Add some demos for DMatrix transition (both regression and classification).
  • [x] Correct the xLearn error handling for c_api.

@aksnzhy I also have a question about this code in the file fm_score.cc:

std::vector<real_t> sv(aligned_k, 0);
real_t* s = sv.data();
...
__m128 XMMs = _mm_load_ps(s+d);

Is the pointer s returned by std::vector::data automatically aligned?

I don't think this memory is aligned.

@aksnzhy That's my point. The Intel Intrinsics Guide says _mm_load_ps must receive an aligned address, but it works when compiled...
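
On the alignment question, a minimal C++ sketch may help. As far as the standard goes, the default allocator behind std::vector only promises alignment up to alignof(std::max_align_t), which is often 16 on 64-bit x86 toolchains but can be 8 elsewhere. That is one plausible reason the _mm_load_ps call happens to work, without being something std::vector promises for SSE. The helper names is_aligned_16 and sum_floats below are hypothetical and are not taken from fm_score.cc. The sketch checks the address explicitly and loads with _mm_loadu_ps, which accepts any address, falling back to a scalar loop where SSE is unavailable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: true when p sits on a 16-byte boundary,
// which is what _mm_load_ps requires of its argument.
inline bool is_aligned_16(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: sums n floats, four lanes at a time.
// n must be a multiple of 4; s may have any alignment.
inline float sum_floats(const float* s, std::size_t n) {
  assert(n % 4 == 0);
#if SKETCH_HAS_SSE
  __m128 acc = _mm_setzero_ps();
  for (std::size_t d = 0; d < n; d += 4) {
    // _mm_loadu_ps has no alignment requirement, unlike _mm_load_ps.
    acc = _mm_add_ps(acc, _mm_loadu_ps(s + d));
  }
  float lanes[4];
  _mm_storeu_ps(lanes, acc);
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
  // Scalar fallback for targets without SSE.
  float total = 0.0f;
  for (std::size_t d = 0; d < n; ++d) total += s[d];
  return total;
#endif
}
```

Calling is_aligned_16(sv.data()) on the vector in question would show whether a given platform and allocator actually deliver a 16-byte boundary. If the aligned load is kept for speed, an assertion of that kind, or a vector with an aligned allocator, would make the assumption explicit.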