ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0

Job will fail randomly #735

Open: Liu-Delin opened this issue 3 years ago

Liu-Delin commented 3 years ago

We use two arguments to set memory when running shifu stats:

  1. -Dmapreduce.map.memory.mb=4096
  2. -Dmapreduce.map.java.opts="-Xms3700m -Xmx3700m -server -XX:MaxPermSize=64m -XX:PermSize=64m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
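To see how little room this configuration leaves, here is a small illustrative Python sketch. It pulls the -Xmx value out of the map java opts and subtracts it from mapreduce.map.memory.mb. The helper names max_heap_mb and headroom_mb are hypothetical and are not part of Shifu or Hadoop. The options string is copied from the list above.

```python
import re

# Size suffixes accepted by the JVM -Xmx flag, expressed in megabytes.
_UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def max_heap_mb(java_opts: str) -> float:
    """Return the -Xmx value from a JVM options string, in MB.

    If -Xmx appears more than once, the JVM honours the last one, so we do too.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", java_opts)
    if not matches:
        raise ValueError("no -Xmx flag found")
    value, unit = matches[-1]
    if not unit:
        # A bare number is interpreted as bytes by the JVM.
        return int(value) / (1024 * 1024)
    return int(value) * _UNIT_TO_MB[unit.lower()]


def headroom_mb(container_mb: int, java_opts: str) -> float:
    """Memory left in the container for everything outside the Java heap."""
    return container_mb - max_heap_mb(java_opts)


opts = ("-Xms3700m -Xmx3700m -server -XX:MaxPermSize=64m -XX:PermSize=64m "
        "-XX:+UseParallelGC -XX:+UseParallelOldGC -XX:ParallelGCThreads=8 "
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
print(headroom_mb(4096, opts))  # 396
```

Only 396 MB is left for all non-heap JVM memory in each map container.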

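This means the worker (map container) gets 4096 MB, while the JVM heap gets 3700 MB. However, -Xmx only caps the Java heap. The JVM also needs native memory outside the heap, such as PermGen, thread stacks, GC data structures and the JIT code cache. With only about 396 MB of headroom, the process sometimes exceeds the 4096 MB physical memory limit and the task fails. Setting a smaller JVM heap avoids this issue.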
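Below is a minimal Python sketch of choosing a smaller heap from the container size. It assumes a heap-to-container ratio of 0.8, which is a common rule of thumb rather than a Shifu setting, and it rounds down to a whole megabyte. The function name suggested_heap_opts is hypothetical.

```python
def suggested_heap_opts(container_mb: int, heap_ratio: float = 0.8) -> str:
    """Build -Xms/-Xmx flags that leave (1 - heap_ratio) of the container
    for non-heap JVM memory. heap_ratio=0.8 is an assumed rule of thumb."""
    if not 0 < heap_ratio < 1:
        raise ValueError("heap_ratio must be between 0 and 1")
    heap_mb = int(container_mb * heap_ratio)  # round down to a whole MB
    return f"-Xms{heap_mb}m -Xmx{heap_mb}m"


print(suggested_heap_opts(4096))  # -Xms3276m -Xmx3276m
```

Under this assumption, the generated flags would replace the -Xms3700m -Xmx3700m part of mapreduce.map.java.opts, and the remaining options stay unchanged. A larger ratio gives the heap more room but leaves less slack for native memory.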