ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0
25.2k stars 9.71k forks

Building apollo 3.5 or higher often causes the OS stuck #7719

Closed davidhopper2003 closed 5 years ago

davidhopper2003 commented 5 years ago

I downloaded the latest Apollo master branch and built it with the bash apollo.sh build command. The OS often freezes during the build, and pressing Ctrl + Alt + F1 to open a text terminal has no effect. I had to power off the computer and restart it. After restarting, the freeze still occurred. What's wrong with the apollo.sh file? Any suggestions will be appreciated.

The resource consumption when building with 4 jobs (bash apollo.sh build -j 4) is shown in this screenshot: apollo_build_stuck

System information

martins-mozeiko commented 5 years ago

It looks like you are running out of memory. Find --jobs=$(nproc) in the apollo.sh file and replace it with --jobs=2. This makes the build run only 2 jobs in parallel. The build will take longer but will use less memory.
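The replacement described above can be scripted. This is a minimal sketch, assuming the build script contains the literal text --jobs=$(nproc); the function name limit_build_jobs is hypothetical and not part of Apollo. It keeps a backup of the original file and does nothing if the expected text is missing.

```shell
#!/usr/bin/env bash
# Sketch: cap build parallelism by rewriting the --jobs flag in a build script.
# Assumption: the script contains the literal text --jobs=$(nproc).

limit_build_jobs() {
  local script="$1"   # path to the build script, e.g. apollo.sh
  local jobs="$2"     # desired number of parallel jobs, e.g. 2

  # Report failure and leave the file untouched if the expected text is absent.
  if ! grep -qF -e '--jobs=$(nproc)' "$script"; then
    echo "no --jobs=\$(nproc) found in $script" >&2
    return 1
  fi

  # -i.bak edits in place and keeps the original as <script>.bak.
  sed -i.bak 's/--jobs=\$(nproc)/--jobs='"$jobs"'/g' "$script"
}

# Example (run from the Apollo checkout): limit_build_jobs apollo.sh 2
```

If the function reports that the text was not found, the job count is probably set somewhere else in your Apollo version.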

davidhopper2003 commented 5 years ago

@martins-mozeiko Thank you. Your approach is absolutely correct. My question is this: before Apollo 3.0, the bash apollo.sh build command did not run out of memory. Why does this problem occur with Apollo 3.5 or higher?

davidhopper2003 commented 5 years ago

@martins-mozeiko I figured it out. Bazel consumed too much memory. After I added an 8 GB memory module, the GUI no longer froze. Therefore, it's best to build Apollo 3.5 or higher with two jobs on a computer with less than 16 GB of memory:

bash apollo.sh build -j 2

(screenshot: build_apollo_with_16GB)

gengqx commented 5 years ago

To build Apollo, it is better to have at least 8 GB of virtual memory; having only 977 MB of virtual memory may also cause a problem, especially when using our own Docker image.
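To see how much RAM and swap the build environment actually has, something like the following can help. It is a sketch that assumes a Linux system, where /proc/meminfo reports MemTotal and SwapTotal in kB; the function name mem_report is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print total RAM and swap in GiB from a meminfo-style file.
# Assumption: Linux, where /proc/meminfo lists MemTotal and SwapTotal in kB.

mem_report() {
  local meminfo="${1:-/proc/meminfo}"   # optional path, defaults to the live system file
  awk '/^MemTotal:/  { ram = $2 }
       /^SwapTotal:/ { swap = $2 }
       END { printf "RAM %.1f GiB, swap %.1f GiB\n", ram / 1048576, swap / 1048576 }' "$meminfo"
}

# Example: run mem_report inside the environment where the build runs.
```

Running it both on the host and inside the container shows whether the two report the same figures.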

martins-mozeiko commented 5 years ago

@daviduhm exactly. I have a 64-core machine, and building Apollo with the default settings hangs for me every time because it uses too much memory. Using a swap file is a terrible option because it slows the build down to many hours. I need to reduce the number of cores it uses to 20 or so to build Apollo successfully in a reasonable time.

davidhopper2003 commented 5 years ago

@gengqx Sure. Thanks.

davidhopper2003 commented 5 years ago

@martins-mozeiko Thank you for the detailed information.

natashadsouza commented 5 years ago

Closing this issue as it appears to be resolved. @martins-mozeiko @gengqx thank you for the explanation. @davidhopper2003 thank you for posting your fix to help other developers who may encounter this issue. Added your suggestions to #7785

Iqbalparvi commented 5 years ago

I have the same issue, but my system hangs with no progress at all when running bash apollo.sh build. At the compiling stage the output looks like this:

........ @in_dev_docker:/apollo$ bash apollo.sh build
System check passed. Build continue ...
[WARNING] ESD CAN library supplied by ESD Electronics does not exist. If you need ESD CAN, please refer to third_party/can_card_library/esd_can/README.md.
Running build under GPU mode. GPU is required to run the build.
[INFO] Start building, please wait ...
INFO: Reading 'startup' options from /apollo/tools/bazel.rc: --batch_cpu_scheduling --host_jvm_args=-XX:-UseParallelGC
Loading package: modules/localization/msf/local_tool/local_visualization/online_visual
Loading package: modules/planning/tasks/optimizers
Loading package: modules/prediction/network/cruise_model
Loading package: modules/perception/proto
Loading package: modules/tools/prediction/fake_prediction
Loading package: modules/planning/math
Loading package: modules/third_party_perception/proto
Loading package: modules/prediction/predictor
Loading package: modules/monitor/hardware
Loading package: modules/drivers/canbus
Loading package: modules/perception/camera/tools/offline
Loading package: modules/drivers/velodyne/parser
Loading package: cyber
Loading package: cyber/timer
Loading package: cyber/transport
Loading package: cyber/component
Loading package: cyber/data
Loading package: cyber/examples/common_component_example
[INFO] Building on x86_64...
[INFO] Building with --jobs=4 --ram_utilization_factor 80 for x86_64
INFO: Reading 'startup' options from /apollo/tools/bazel.rc: --batch_cpu_scheduling --host_jvm_args=-XX:-UseParallelGC
INFO: (04-15 05:26:16.751) Found 3240 targets...
Slow read: a 1729-byte read from /apollo/modules/control/common/pid_BC_controller.h took 14512ms.
[4,604 / 4,613] (04-15 05:51:27.122) Compiling modules/drivers/gnss/proto/gnss_best_pose.pb.cc

Any suggestions would be appreciated. I can't make progress from here.

ktnrs55 commented 5 years ago

@Iqbalparvi The --jobs=4 value depends on your number of cores. If running nproc gives you 4, then --jobs=2 might work (my guess).

lemketron commented 4 years ago

@martins-mozeiko wrote:

It looks like you are running out of memory. Find --jobs=$(nproc) in the apollo.sh file and replace it with --jobs=2. This makes the build run only 2 jobs in parallel. The build will take longer but will use less memory.

I have been experiencing this as well on a 12-core Razer Blade with 16GB of RAM, and have been experimenting with the build settings. I tried building with nproc/2 (6 cores) and also with 9 and with 10 and so far it seems to be doing ok. I'm inclined to think that (at least with 16GB of RAM) a better value is nproc-2. On my system this seems to avoid running out of memory, while taking advantage of 10 of my 12 cores.

Another change that might have helped is that I decreased ram_utilization_factor from 80 to 70. I'm not entirely sure whether reducing the cores or the RAM utilization helped more, but together they let my build succeed.

I'm curious if @davidhopper2003 or anyone else with an 8GB system (or one that fails to build apollo successfully) could try nproc-2 (and ram_utilization_factor=70) and see how that goes?

My JOB_ARG line in apollo.sh now looks like this:

JOB_ARG="--jobs=$(expr $(nproc) - 2 ) --ram_utilization_factor 70"

It would be nice to change the apollo.sh script to be more generally successful but I wouldn't want to slow down builds that aren't crashing.
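One way to cap parallelism only when memory is the limiting factor is sketched below. Everything here is an assumption for illustration: the helper name pick_jobs is hypothetical, and the default budget of 2 GB of RAM per job is a guess, not a measured Apollo figure. The helper returns the smaller of the core count and the number of jobs that fit in RAM, so machines with plenty of memory keep using all their cores.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a --jobs value from core count and RAM size.
# The per-job memory budget is an assumed number; tune it for your machine.

pick_jobs() {
  local cores="$1"            # number of CPU cores, e.g. $(nproc)
  local mem_gb="$2"           # total RAM in whole GB
  local gb_per_job="${3:-2}"  # assumed RAM budget per parallel job

  local by_mem=$(( mem_gb / gb_per_job ))  # how many jobs fit in RAM
  local jobs="$cores"

  # Memory is the bottleneck only when fewer jobs fit in RAM than there are cores.
  if [ "$by_mem" -lt "$jobs" ]; then
    jobs="$by_mem"
  fi
  # Always allow at least one job.
  if [ "$jobs" -lt 1 ]; then
    jobs=1
  fi
  echo "$jobs"
}

# Example: JOB_ARG="--jobs=$(pick_jobs "$(nproc)" 16) --ram_utilization_factor 70"
```

With these assumed numbers, the 12-core, 16 GB machine described above would get 8 jobs, slightly fewer than the 10 that nproc-2 gives, so the budget may need loosening if the build is stable at a higher count.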

Srikar-Chaganti commented 1 year ago

@lemketron Would you mind sharing your apollo.sh file? I have the same issue but couldn't find JOB_ARG or its usage anywhere in my current apollo.sh file.

hkyee commented 12 months ago

Hi @Srikar-Chaganti, I can't find it either. May I know how you resolved this?

Srikar-Chaganti commented 11 months ago

Hi @hkyee, you can change the number of cores in apollo_base.sh
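Since the location of the setting has moved between Apollo versions, a recursive search can help locate it. This is a sketch: the function name find_jobs_setting is hypothetical, the search terms are simply the strings mentioned in this thread, and it assumes a grep that supports -r and --include (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Sketch: list shell-script lines that mention the job-count settings
# discussed in this thread (--jobs, JOB_ARG, nproc).

find_jobs_setting() {
  local dir="${1:-.}"   # directory to search, defaults to the current one
  # -r: recurse, -n: show line numbers, -E: extended regex for the alternation.
  grep -rnE --include='*.sh' -e '--jobs|JOB_ARG|nproc' "$dir"
}

# Example (run from the Apollo checkout): find_jobs_setting .
```

Each match is printed as file:line:text, which shows which script to edit.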