tianxiaoliang closed this issue 9 years ago.
Could you please post the logs of the Mesos master/slave and of Marathon? Also, please post the sandbox logs for your application. Which versions of Mesos and Marathon are you running?
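If it helps, on an apt-get install you can usually read the versions off the packages or the services themselves; the Marathon /v2/info endpoint is standard, but the default port 8080 below is an assumption about your setup:

dpkg -l | grep -e mesos -e marathon
mesos-master --version
curl http://localhost:8080/v2/info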
I tried a very simple Spring MVC example, and it works fine with Marathon.
Maybe my case is caused by something more complicated.
Where are the log files for Marathon and Mesos? I used apt-get to install the whole Mesosphere stack.
The Marathon logs should be in /var/log/syslog. The Mesos logs should either be there or in their own files somewhere under /var/log. The sandbox logs can be accessed through the Mesos web UI; just click the sandbox link next to one of the failed tasks.
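For example, on a stock Debian/Ubuntu package install (exact paths may vary), something like this pulls out the relevant entries:

grep -i marathon /var/log/syslog | tail -n 50
grep -i mesos-slave /var/log/syslog | tail -n 50
ls /var/log/mesos/ 2>/dev/null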
After I increased the memory, the app works fine. Thanks! I guess the problem was caused by an OutOfMemory kill.
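For anyone who hits this later: the limit can be raised through Marathon's standard /v2/apps REST endpoint; the host, port, and value below are illustrative, not taken from this setup:

curl -X PUT http://localhost:8080/v2/apps/ubuntu \
  -H "Content-Type: application/json" \
  -d '{"mem": 1024}'

It also helps to cap the JVM heap (e.g. with -Xmx) comfortably below Marathon's mem value, since the cgroup limit counts the whole process (heap, metaspace, thread stacks, off-heap buffers), not just the Java heap.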
I didn't find any errors in my app log. Here is the slave log:

E0313 00:48:30.536258 1626 slave.cpp:2344] Failed to update resources for container e3f16cd9-8273-420b-9640-279017928d13 of executor ubuntu.51858eb3-c919-11e4-a75b-22000b78d613 running task ubuntu.51858eb3-c919-11e4-a75b-22000b78d613 on status update for terminal task, destroying container: Failed to determine cgroup for the 'cpu' subsystem: Failed to read /proc/24344/cgroup: Failed to open file '/proc/24344/cgroup': No such file or directory
Mar 13 00:48:30 ec2-54-89-249-100 mesos-slave[1613]: I0313 00:48:30.536439 1628 docker.cpp:1501] Destroying container 'e3f16cd9-8273-420b-9640-279017928d13'
Mar 13 00:48:30 ec2-54-89-249-100 mesos-slave[1613]: I0313 00:48:30.536478 1628 docker.cpp:1593] Running docker stop on container 'e3f16cd9-8273-420b-9640-279017928d13'
There are no errors in the master log.
Check dmesg on the slave node; your app was OOM-killed by the Linux kernel. It looks like this:
[9143134.805765] java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[9143134.805769] java cpuset=5929cf05bc9e561b7c33ffbc217250af1b94c70017743658b2558a02a1ab1424 mems_allowed=0
[9143134.805772] CPU: 2 PID: 32498 Comm: java Not tainted 3.14.18-gentoo #2
[9143134.805781] Hardware name: Supermicro X9SCD/X9SCD, BIOS 2.0b 09/17/2012
[9143134.805782] 0000000000000000 ffff880119331000 ffffffff8174ffea ffff880133f1de80
[9143134.805784] ffffffff8174be8a ffff88022fcd1a40 0000000000000003 0000000000000003
[9143134.805786] ffffffff8107cf1c ffff88022fcd1a40 ffffffff8107cf41 0000000000011a40
[9143134.805789] Call Trace:
[9143134.805795] [<ffffffff8174ffea>] ? dump_stack+0x41/0x51
[9143134.805799] [<ffffffff8174be8a>] ? dump_header+0x70/0x1e5
[9143134.805803] [<ffffffff8107cf1c>] ? check_preempt_curr+0x7c/0x90
[9143134.805805] [<ffffffff8107cf41>] ? ttwu_do_wakeup+0x11/0x80
[9143134.805810] [<ffffffff810daca7>] ? find_lock_task_mm+0x47/0xa0
[9143134.805813] [<ffffffff810db186>] ? oom_kill_process+0x286/0x3e0
[9143134.805815] [<ffffffff810daca7>] ? find_lock_task_mm+0x47/0xa0
[9143134.805825] [<ffffffff8112cdd8>] ? mem_cgroup_oom_synchronize+0x4e8/0x550
[9143134.805835] [<ffffffff810fb8e8>] ? handle_mm_fault+0x308/0xdb0
[9143134.805840] [<ffffffff8112c2b0>] ? mem_cgroup_charge_common+0x90/0x90
[9143134.805842] [<ffffffff810dba0b>] ? pagefault_out_of_memory+0xb/0x80
[9143134.805846] [<ffffffff81034756>] ? __do_page_fault+0x496/0x4a0
[9143134.805849] [<ffffffff81085ad1>] ? update_curr+0x171/0x180
[9143134.805851] [<ffffffff81083777>] ? set_next_entity+0x37/0x80
[9143134.805853] [<ffffffff810841c3>] ? pick_next_task_fair+0x63/0x140
[9143134.805856] [<ffffffff81753316>] ? __schedule+0x266/0x660
[9143134.805860] [<ffffffff81756c42>] ? page_fault+0x22/0x30
[9143134.805862] Task in /docker/5929cf05bc9e561b7c33ffbc217250af1b94c70017743658b2558a02a1ab1424 killed as a result of limit of /docker/5929cf05bc9e561b7c33ffbc217250af1b94c70017743658b2558a02a1ab1424
[9143134.805864] memory: usage 1048528kB, limit 1048576kB, failcnt 114269
[9143134.805866] memory+swap: usage 1082404kB, limit 2097152kB, failcnt 0
[9143134.805867] kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
[9143134.805868] Memory cgroup stats for /docker/5929cf05bc9e561b7c33ffbc217250af1b94c70017743658b2558a02a1ab1424: cache:160KB rss:1048368KB rss_huge:20480KB mapped_file:52KB writeback:0KB swap:33876KB inactive_anon:525292KB active_anon:523076KB inactive_file:0KB active_file:32KB unevictable:0KB
[9143134.805878] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[9143134.805914] [32249] 0 32249 792663 261466 752 8482 0 java
[9143134.805966] Memory cgroup out of memory: Kill process 32249 (java) score 1032 or sacrifice child
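A quick way to pull these entries out of the kernel log without scrolling through the whole ring buffer (the -T flag needs a reasonably recent util-linux, so treat it as optional):

dmesg | grep -Ei 'oom|killed process'
dmesg -T | grep -i 'memory cgroup out of memory'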
OK, I got it. I found a whole run of logs of this kind:

[665709.351067] Memory cgroup out of memory: Kill process 16469 (java) score 906 or sacrifice child
[665922.005904] Memory cgroup out of memory: Kill process 17233 (java) score 899 or sacrifice child
[666186.574552] Memory cgroup out of memory: Kill process 17995 (java) score 900 or sacrifice child
[666405.049341] Memory cgroup out of memory: Kill process 18762 (java) score 901 or sacrifice child
[666642.358986] Memory cgroup out of memory: Kill process 19528 (java) score 902 or sacrifice child
[666827.063841] Memory cgroup out of memory: Kill process 20286 (java) score 900 or sacrifice child
Thanks very much.
Glad you could solve the issue!
I use the Marathon RESTful API to launch a Docker container which runs a web server. Here is the JSON:

{
  "container": {
    "type": "DOCKER",
    "docker": {
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 9000, "protocol": "tcp" },
        { "containerPort": 161, "hostPort": 0, "protocol": "udp" }
      ],
      "image": "dhub.XXX.org/release_manager-ubuntu1404"
    }
  },
  "id": "ubuntu",
  "instances": 1,
  "cpus": 0.5,
  "mem": 512,
  "uris": [],
  "cmd": "/start.sh"
}
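For reference, I submit this app definition to Marathon's standard /v2/apps endpoint, roughly like this (host and port will differ per setup):

curl -X POST http://localhost:8080/v2/apps \
  -H "Content-Type: application/json" \
  -d @app.json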
After it launched:

CONTAINER ID   IMAGE                                            COMMAND                CREATED              STATUS              PORTS                                             NAMES
0243c616186e   dhub.XXX.org/release_manager-ubuntu1404:latest   /bin/sh -c /start.sh   About a minute ago   Up About a minute   0.0.0.0:31000->8080/tcp, 0.0.0.0:31001->161/udp   mesos-95797629-1669-4a59-a759-cd5fdfaa1f0f
I entered the container to check the service:
root@ec2-54-89-249-100:~# docker-enter 0243c616186e
root@0243c616186e:/# netstat -anp | grep java
tcp6       0      0 :::8080          :::*    LISTEN   449/java
tcp6       0      0 127.0.0.1:8081   :::*    LISTEN   449/java
root@0243c616186e:/# curl http://127.0.0.1:8080
curl: (52) Empty reply from server
root@0243c616186e:/# Killed
root@ec2-54-89-249-100:~#
I'm sure this web server is OK, because if I start it manually I can curl it and the container doesn't get killed.
How should I fix this problem? I don't think it is a Docker problem.
Thanks