Closed by Horsmann 6 years ago
@reckart XGBoost has no Maven release. There is a third-party fork, but it doesn't seem to work. I don't think there will be a working release in the future.
Building and using the binary directly does work. Furthermore, the GitHub release (0.7) gives the binary something like a version. So I am thinking of integrating this tool as a self-built binary.
This of course introduces the problem of linking to third-party libraries. I am not entirely sure, but I think gcc dependencies were acceptable?
I have these dependencies in the binary atm:
/usr/local/opt/gcc/lib/gcc/7/libstdc++.6.dylib (compatibility version 7.0.0, current version 7.24.0)
/usr/local/opt/gcc/lib/gcc/7/libgomp.1.dylib (compatibility version 2.0.0, current version 2.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
/usr/local/lib/gcc/7/libgcc_s.1.dylib (compatibility version 1.0.0, current version 1.0.0)
@reckart On Linux I have the following dependencies when not building statically:
linux-vdso.so.1 => (0x00007ffdbff3c000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa1cbdb2000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa1cbaa9000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fa1cb887000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa1cb671000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa1cb454000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa1cb08a000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa1cc134000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa1cae86000)
From what I have read so far, you probably do not want to statically link all these dependencies. As a matter of fact, when compiling with the -static linking flag, the compiled binary crashes.
I copied the compiled binary over to the other Linux machines we have running, and it works there. This is of course dangerous to some extent, but in this case I would probably not link statically.
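To see at a glance which shared libraries a dynamically linked binary pulls in, the ldd listing above can be parsed mechanically. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the "name => path (address)" line format shown in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of DKPro TC or XGBoost.
final class LddOutput {

    /**
     * Maps each library name to the path it resolved to, for lines of the
     * form "name => path (address)". Lines without "=>" (the dynamic loader)
     * and lines without a path (linux-vdso) are skipped.
     */
    static Map<String, String> resolvedLibraries(String lddOutput) {
        Map<String, String> libs = new LinkedHashMap<>();
        for (String line : lddOutput.split("\\R")) {
            String trimmed = line.trim();
            int arrow = trimmed.indexOf("=>");
            if (arrow < 0) {
                continue; // e.g. "/lib64/ld-linux-x86-64.so.2 (0x...)"
            }
            String name = trimmed.substring(0, arrow).trim();
            String rest = trimmed.substring(arrow + 2).trim();
            int paren = rest.lastIndexOf('(');
            // Drop the trailing "(0x...)" load address if there is one.
            String path = (paren >= 0 ? rest.substring(0, paren) : rest).trim();
            if (!path.isEmpty()) {
                libs.put(name, path);
            }
        }
        return libs;
    }
}
```

Comparing the resulting map between the build machine and a target machine shows quickly whether a library such as libgomp.so.1 is missing or resolves to a different location.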
@reckart Any thoughts on this? This module would be easy to integrate from the Java/TC side; it's just getting the binary prepared that troubles me a bit.
Some basic POSIX libraries like the ones you mention above should not be linked statically - but then they seldom change their APIs, so that should be fine. I had a peek at the XGBoost site and saw that there is a script to create a JAR file with binaries for multiple OSes, which could be uploaded to a Maven repo (e.g. JCenter) - I didn't check whether you made any use of this in your integration. What solution did you end up going with?
I compiled the binaries for the respective OS platforms manually. I didn't see this JAR version you mentioned. Unless this JAR (do you have a link?) works as-is, I would continue with compiling the binaries.
At the moment I have these open issues:
Compiling with the -static flag works, but the resulting binary is broken and terminates with a segmentation fault if called with dummy data. Only the dynamically compiled binary works. Using -static-libstdc++ -static-libgcc reduces the dynamic libs from 9 to 7, but many dynamic library dependencies still remain.
Thx. This script downloads the dynamic libraries for the 3 platforms into the JAR, but the binaries are not included.
I think I will continue with the binary compiling. @reckart Do you want to have a look regarding the static compilation on Linux? I had no luck with creating a working version.
My understanding is that XGBoost has Java bindings (XGBoost4J) which make use of these shared libraries directly without having to go through other CLI binaries.
There are some packages on Maven Central, but from other developers: https://search.maven.org/#search%7Cga%7C1%7Cxgboost4j . This is one of these "won't release to Maven" tools.
I think it's more work to get the Java version to Maven. It would be unfortunate if I don't get the Windows 32-bit version working, but I don't think that there are that many 32-bit Windows systems left anyway. I still favor the binary-building way because of the lower workload. With respect to the dynamic dependencies, the Java version does not really seem to offer many advantages anyway?
I assume that you currently build binaries that you then call from Java like command line tools. This creates a new process every time and you have to parse the output of the tool from stdout or from a file. Calling the code directly via its native interface avoids the overhead of creating a new process and also the need to parse data as you can work directly with native objects.
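The process-based approach described here can be sketched roughly as follows. This is not the actual DKPro TC code: the class name is invented, and the command passed in is entirely up to the caller.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating "call a binary like a command line tool".
final class ExternalTool {

    /**
     * Starts the given command as a new OS process, collects its combined
     * stdout/stderr line by line, and waits for it to terminate.
     */
    static List<String> run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout

        Process process = builder.start();
        List<String> lines = new ArrayList<>();
        // Read the output before waiting, so the child cannot block on a full pipe.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command + " exited with code " + exitCode);
        }
        return lines;
    }
}
```

Every call pays for process start-up and for turning text output back into data, which is the overhead that a native binding would avoid.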
Yes, I start a new process and wait for its termination. This is some overhead, true. I think I am still in favor of the binary way, since I have done most of the work by now. Furthermore, this third-party wrapper, which probably did something similar to this JAR with libs (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/xgboost.html), says at the bottom of the page that it is not working for Windows. I haven't looked into it in detail, but I don't think that this small merit is worth the additional effort of re-coding the interface to fit this JAR file.
@reckart The UKP Linux Jenkins is apparently missing GLIBC. Is this something you could install on the UKP Jenkins?
/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.23' not found (required by /tmp/dkpro2774502877448388853runtime/xgboost)
The build server is still running Debian 8 (Jessie) which only has an older version of libc6 (2.19).
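The mismatch can be made explicit by pulling the required version out of the loader message and comparing it with what the machine provides. This is a minimal sketch with invented names; it assumes the message contains a token of the form GLIBC_<major>.<minor>, as in the error quoted in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library mentioned in this thread.
final class GlibcCheck {

    private static final Pattern REQUIRED = Pattern.compile("GLIBC_(\\d+)\\.(\\d+)");

    /** Returns {major, minor} of the first GLIBC_x.y token, or null if there is none. */
    static int[] requiredVersion(String loaderMessage) {
        Matcher matcher = REQUIRED.matcher(loaderMessage);
        if (!matcher.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)),
            Integer.parseInt(matcher.group(2))
        };
    }

    /** True if the available {major, minor} version is at least the required one. */
    static boolean satisfies(int[] available, int[] required) {
        if (available[0] != required[0]) {
            return available[0] > required[0];
        }
        return available[1] >= required[1];
    }
}
```

With the numbers from this thread, a machine providing 2.19 does not satisfy a binary that requires 2.23, so the binary would have to be built against an older libc or the server upgraded.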
Would you update this lib? On our Jenkins the linux test case is passing.
It would require updating the entire VM. Cannot promise when that will be done.
I see. What is the best way to deal with the library issue? As it stands, there won't be any stable builds in the near future. Removing the test cases doesn't feel right either.
Windows 32 bit doesn't seem to be supported; at least I cannot compile a working version. From what I have seen so far, all tutorials use 64-bit mode. The project was started in 2016, so I am not surprised that this is not supported. The Linux 32-bit binary is in the package.
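Shipping one self-built binary per platform means the Java side has to pick the right one at runtime. Here is a minimal sketch of such a lookup. The resource layout (bin/<os>_<bits>/xgboost) and the class name are assumptions made up for illustration, and the architecture check is deliberately crude: it only distinguishes 32-bit from 64-bit.

```java
import java.util.Locale;

// Hypothetical helper; the directory layout below is an assumption.
final class BinaryLocator {

    /** Maps the values of the os.name and os.arch system properties to a binary path. */
    static String binaryPath(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String bits = osArch.contains("64") ? "64" : "32";

        if (os.contains("mac")) {
            return "bin/osx_" + bits + "/xgboost";
        }
        if (os.contains("windows")) {
            if (bits.equals("32")) {
                // No working 32-bit Windows build exists according to this thread.
                throw new UnsupportedOperationException("No 32-bit Windows binary available");
            }
            return "bin/windows_64/xgboost.exe";
        }
        if (os.contains("linux")) {
            return "bin/linux_" + bits + "/xgboost";
        }
        throw new UnsupportedOperationException(
                "Unsupported platform: " + osName + " / " + osArch);
    }

    public static void main(String[] args) {
        System.out.println(binaryPath(
                System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

Failing early with a clear message on an unsupported platform is easier to diagnose than a loader error from a mismatched binary.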
Well, I see three options:
The Windows binary might have the same problem. I think I installed the 2015 C++ redistributable to have a certain .dll available. According to the documentation, the Windows Jenkins has the 2010 package installed - I assume you can't update this either?
I think on Windows it is less of an issue, as it should be sufficient to just install the package (no OS upgrade necessary). Do you have a link handy?
@Horsmann ok, I have installed the VC2015 redistributable package on the Windows build server.
Build passed :)
@reckart I uploaded a wrong artifact to the UKP repository. There is one binary, org.dkpro.tc.ml.xgboost-bin-20171230.2, that has org.dkpro.tc.ml as its group id. Sorry - could you delete this please?
Deleted.
thx
http://xgboost.readthedocs.io/en/latest/