Closed tqchen closed 6 years ago
Awesome. Seems that mxnet can learn from the design and implementation here.
@javelinjs Currently we solve the problem of embedding rabit allreduce jobs. We need to think a bit about how to handle server jobs, which involves starting containers that do not take data. Maybe @CodingCat will also have some thoughts on this.
About the maven package thing, you could refer to https://github.com/dmlc/mxnet/tree/master/scala-package. I suggest making the native lib an independent module under the jvm project. And here's a great reference for how to set up the JNI compile procedure in maven: http://www.tricoder.net/blog/?p=197
I put the native lib into the assembly jar and load it using https://github.com/dmlc/mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/util/NativeLibraryLoader.scala It was mainly inspired by https://github.com/mikiobraun/jblas/blob/master/src/main/java/org/jblas/util/LibraryLoader.java , though I'm not sure whether there is any license problem.
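For readers following along, the "native lib inside the jar" approach can be sketched roughly as below. This is a minimal illustration, not the code from the linked NativeLibraryLoader or jblas loaders: the class name `JarNativeLoader` and the resource path passed in are hypothetical. The idea is that `System.load` needs a real file on disk, so the library is first copied out of the jar to a temp file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; names are illustrative, not part of mxnet or xgboost4j.
final class JarNativeLoader {
    private JarNativeLoader() {}

    /** Copies a classpath resource (e.g. a .so packed in the jar) to a temp file. */
    static Path extractToTemp(String resourcePath) throws IOException {
        try (InputStream in = JarNativeLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourcePath);
            }
            // Keep the original file name as the suffix so the extension survives.
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path tmp = Files.createTempFile("native-", "-" + fileName);
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    /** Extracts the resource and loads it; System.load requires an absolute path. */
    static void loadFromJar(String resourcePath) throws IOException {
        System.load(extractToTemp(resourcePath).toAbsolutePath().toString());
    }
}
```

Usage would look like `JarNativeLoader.loadFromJar("/lib/libexample.so")`, where the path is whatever location the build packs the library into (assumed here).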
I can work on the Integration with spark dataframe and pipeline api
@rotationsymmetry welcome
OK, with the post here http://dmlc.ml/2016/03/14/xgboost4j-portable-distributed-xgboost-in-spark-flink-and-dataflow.html we have finished the first stage of the roadmap.
I will start working on dataframe and pipeline integration tmr
hmmm... I glanced at it this afternoon... I'm sending the email about our xgboost4j to the spark user list tmr morning
"Apache Beam is a unified model for defining both batch and streaming data-parallel processing pipelines, as well as a set of language-specific SDKs for constructing pipelines and Runners for executing them on distributed processing backends like Apache Spark, Apache Flink, and Google Cloud Dataflow."
@futurely thanks for the information, yeah Beam is of course on our radar when we consider the future version of XGBoost4J
Any plan to upload xgboost4j to the maven repository? We are planning to integrate xgboost into our product, hivemall, but this is a barrier for the integration. Thanks in advance!
@maropu uploading to maven brings convenience to the user, but it is not in the near-term plan considering the other things on the table and the limited number of people contributing here...
@CodingCat Ah, I see. Any opportunity to collaborate on the task? We do need this to support xgboost in our product.
sure, you are welcome to contribute, you can just create a PR in this repo
okay, I'll look over the related code. many thanks!
I think the major issue is how to include the native .so files targeting different platforms in the same jar and make the Java code pinpoint exactly the right one... one of the good references is https://github.com/facebook/rocksdb, which I once wanted to look into but quickly got distracted from by other things
One awesome example is snappy-java, which is widely used in many distributed system products like spark and hadoop. It checks the platform type when loading, and then loads the corresponding shared binary included in the package. See here.
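A rough sketch of that platform check, for illustration only: this is not snappy-java's actual code, and the `/native/<os>/<arch>/` layout, the class name `NativePlatform`, and the base name `xgboost4j` are all assumptions. It maps the `os.name` and `os.arch` system properties to the path of the matching binary inside the jar.

```java
import java.util.Locale;

// Hypothetical resolver; the resource layout below is an assumed convention.
final class NativePlatform {
    private NativePlatform() {}

    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check mac first: "darwin" contains "win" and would match Windows otherwise.
        if (os.contains("mac") || os.contains("darwin")) return "mac";
        if (os.contains("win")) return "windows";
        if (os.contains("linux")) return "linux";
        return os.replace(' ', '_');
    }

    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        return arch;
    }

    static String libraryFileName(String os, String baseName) {
        switch (os) {
            case "windows": return baseName + ".dll";
            case "mac":     return "lib" + baseName + ".dylib";
            default:        return "lib" + baseName + ".so";
        }
    }

    /** Builds the in-jar path of the binary for the given platform. */
    static String resourcePath(String osName, String osArch, String baseName) {
        String os = normalizeOs(osName);
        return "/native/" + os + "/" + normalizeArch(osArch) + "/"
                + libraryFileName(os, baseName);
    }

    /** Convenience overload using the running JVM's system properties. */
    static String resourcePathForCurrentJvm(String baseName) {
        return resourcePath(System.getProperty("os.name"),
                            System.getProperty("os.arch"), baseName);
    }
}
```

The resolved path can then be extracted from the jar and passed to `System.load`. Keeping the mapping in pure string functions makes it easy to test for every target platform without actually running on each one.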
@CodingCat Any progress in task 'dataframe and pipeline integration'?
@nicornk, sorry, no, I am busy with other things, and others are welcome to contribute on this
@CodingCat @tqchen @rotationsymmetry Is work on integration with spark dataframes and pipeline API in progress? If not, I would like to start working on this. If it has already started, can you please share the initial design draft?
No, feel free to start
Hi @tanwanirahul, are you doing the integration with SparkPipeline? What kind of integration with Spark Pipeline is planned to be done?
I will post the initial version within the week
@dirceusemighini @CodingCat I had started working on this before, but couldn't push it over the finish line due to other priorities. Give me until Tuesday and I should be able to push an initial working version.
@maropu Did you make any progress concerning the Maven repository? I'm having exactly the same issue: not being able to access XGBoost from some public repository blocks integration into my projects.
@qqilihq No progress though, we need to do something before the next release of our product. If you are interested in this issue, plz check https://github.com/myui/hivemall/issues/370 and leave some comments there
@futurely regarding Google's Dataflow, I also vote for Apache Beam.
Is there an issue for publishing xgboost to maven central? A quick search didn't find anything. I have some ideas and would like to share them - and eventually maybe help with publishing it.
the only issue is to include native libs for various platforms in the jar and make the program locate them accurately... I haven't gotten a chance to look at the solutions
@CodingCat yes I understand it. I have some ideas how to do it - e.g. it should be possible to follow the same approach as MTJ (https://github.com/fommil/matrix-toolkits-java). Should I create an issue where we can discuss it?
sure, an issue or PR is welcome
also, we don't have cross validation support (neither in java nor in python) for ranking related tasks ("objective:rank:pairwise"). can we add that to the road-map?
Now that https://github.com/dmlc/xgboost/issues/884 is finished, we will be proposing a new roadmap, where more contributors can get involved. Please reply to this issue to add things if you have more thoughts.