feedzai / feedzai-openml-java

Implementations for Feedzai's OpenML APIs to allow for usage of machine learning models in the Java programming language.
https://www.feedzai.com
Apache License 2.0

PULSEDEV-32102 Add support for large LightGBM datasets #37

Closed AlbertoEAF closed 4 years ago

AlbertoEAF commented 4 years ago
codecov[bot] commented 4 years ago

Codecov Report

Merging #37 into master will decrease coverage by 0.05%. The diff coverage is 75.00%.

@@             Coverage Diff              @@
##             master      #37      +/-   ##
============================================
- Coverage     77.69%   77.63%   -0.06%     
  Complexity      357      357              
============================================
  Files            39       39              
  Lines          1318     1319       +1     
  Branches        120      120              
============================================
  Hits           1024     1024              
- Misses          228      229       +1     
  Partials         66       66              
Impacted Files                                         Coverage Δ            Complexity Δ
...i/openml/provider/lightgbm/LightGBMAlgorithms.java  100.00% <ø> (ø)       3.00 <0.00> (?)
...er/lightgbm/LightGBMBinaryClassificationModel.java  57.14% <ø> (ø)        8.00 <0.00> (?)
...enml/provider/lightgbm/LightGBMDescriptorUtil.java  96.96% <ø> (ø)        3.00 <0.00> (?)
...ai/openml/provider/lightgbm/LightGBMException.java  100.00% <ø> (ø)       1.00 <0.00> (?)
...i/openml/provider/lightgbm/LightGBMMLProvider.java  0.00% <ø> (ø)         0.00 <0.00> (?)
...openml/provider/lightgbm/LightGBMModelCreator.java  82.35% <ø> (ø)        20.00 <0.00> (?)
...feedzai/openml/provider/lightgbm/LightGBMSWIG.java  70.96% <0.00%> (ø)    11.00 <0.00> (?)
...eedzai/openml/provider/lightgbm/LightGBMUtils.java  86.95% <ø> (ø)        4.00 <0.00> (?)
...eedzai/openml/provider/lightgbm/SWIGResources.java  73.33% <ø> (ø)        7.00 <0.00> (?)
...tgbm/LightGBMBinaryClassificationModelTrainer.java  87.28% <100.00%> (ø)  23.00 <0.00> (?)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update e54aa03...b56b1bc.
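
As a quick sanity check on the report, the coverage percentages can be reproduced from the Hits and Lines rows of the diff. The sketch below assumes coverage is simply hits divided by total lines, so that misses and partials both count as uncovered. The function name `coverage_percent` is hypothetical and is not part of Codecov.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage: fully covered lines over total lines.

    Assumption: partially covered lines do not count as hits.
    """
    return 100.0 * hits / lines


# Figures copied from the Hits and Lines rows of the coverage diff.
master = coverage_percent(hits=1024, lines=1318)
pr = coverage_percent(hits=1024, lines=1319)

print(f"master: {master:.2f}%")
print(f"#37:    {pr:.2f}%")
print(f"delta:  {pr - master:+.2f}%")
```

Under that assumption the values round to 77.69% and 77.63%, a change of -0.06%, which agrees with the Coverage row: one additional uncovered line with no new hits.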

AlbertoEAF commented 4 years ago

Ready for review, even though it does not yet solve the large dataset issue (it only seems to work on my computer).

@ptraca @JPDSousa regarding producing a fat jar from the parent lightgbm folder, I didn't do it to avoid further complexity. The "provider" openml-lightgbm already does it at the end, packaging lightgbmlib and the .so's with it.

To maintain the conventions of the repo, the provider artifact is openml-lightgbm @ groupId=com.feedzai. The full lightgbm implementation is now a meta-module containing lightgbm-builder and openml-lightgbm. lightgbm-builder and the meta-module are both "hidden" at group com.feedzai.openml.lightgbm.meta. I also moved the make-lightgbm output artifact to the group above it, to keep them together: com.feedzai.openml.lightgbm.lightgbmlib.

AlbertoEAF commented 4 years ago

Fixed :) Ping @shenggwang @JPDSousa for final review.

shengwangsw commented 4 years ago

The build didn't pass. It failed on:

[INFO] lightgbm-builder ................................... FAILURE [01:11 min]

It seems to be related to the script:

[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:3.0.0:exec (generate-lightgbm-build) on project lightgbm-builder: Command execution failed. Process exited with an error: 100 (Exit value: 100) -> [Help 1]

AlbertoEAF commented 4 years ago

It failed to fetch packages over the network on Ubuntu :sadface:

E: Failed to fetch https://esm.ubuntu.com/ubuntu/pool/main/g/glib2.0/libglib2.0-0_2.40.2-0ubuntu1.1+esm3_amd64.deb  HttpError401
E: Failed to fetch https://esm.ubuntu.com/ubuntu/pool/main/p/python-apt/python-apt-common_0.9.3.5ubuntu3+esm2_all.deb  HttpError401
E: Failed to fetch https://esm.ubuntu.com/ubuntu/pool/main/p/python-apt/python3-apt_0.9.3.5ubuntu3+esm2_amd64.deb  HttpError401
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
The command '/bin/sh -c apt-get update  && apt-get install -y --no-install-recommends         software-properties-common  && apt-add-repository ppa:git-core/ppa  && apt-get update  && apt-get install -y --no-install-recommends         apt-utils         curl         git         jq         libcurl3         libicu52         libunwind8         netcat  && curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash  && apt-get install -y --no-install-recommends git-lfs  && rm -rf /var/lib/apt/lists/*  && rm -rf /etc/apt/sources.list.d/*  echo "UPDATED."' returned a non-zero code: 100
[ERROR] Command execution failed.

Maybe I should publish this as an image on Docker Hub? The other day a SourceForge outage made the SWIG download fail, which broke the build too. Besides, building the image takes 10 minutes each time.

JPDSousa commented 4 years ago

Is that a problem in terms of licensing? If not, that might be a good improvement. However, let's do it in a separate PR.