discoproject / disco

a Map/Reduce framework for distributed computing
http://discoproject.org
BSD 3-Clause "New" or "Revised" License

Disco fails to start on CentOS #602

Open erikdubbelboer opened 9 years ago

erikdubbelboer commented 9 years ago

After following the normal install process, Disco fails to start on CentOS.

The problem goes away after manually filling in /usr/var/disco/disco_8989.config, which appears to be left empty by the first start.
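
A quick way to confirm this state before starting the master is to check whether the config file exists, is zero-length, or fails to parse. The sketch below assumes the file is expected to hold JSON (the stack trace further down goes through mochijson2, which suggests as much); `config_state` is a hypothetical helper, not part of Disco, and it does not say what the valid contents should be.

```python
import json
import os


def config_state(path):
    """Classify a config file as 'missing', 'empty', 'invalid' or 'ok'.

    'ok' only means the file parses as JSON; it does not check that the
    contents are what the Disco master expects.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    try:
        with open(path) as handle:
            json.load(handle)
    except ValueError:
        # json raises a ValueError subclass on malformed input
        return "invalid"
    return "ok"


if __name__ == "__main__":
    # Path taken from this report; adjust DISCO_ROOT and port as needed.
    print(config_state("/usr/var/disco/disco_8989.config"))
```

On the machine described here this would presumably report "empty" right after the first failed start.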

Versions:

$ git branch
* develop
$ erl
Erlang/OTP 17 [erts-6.2] [source-5c974be] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]
$ cat /etc/issue
CentOS release 6.6 (Final)
Kernel \r on an \m

$ cat /etc/redhat-release
CentOS release 6.6 (Final)

Disco is installed using sudo make install. The Python libs are also installed system-wide.

/usr/var/disco/ and /usr/lib/disco are both chowned to the user that starts disco.

What happened:

$ disco start
zero length field name in format
$ disco status
Master singapore-dev-1:8989 stopped
$ disco start
Master singapore-dev-1:8989 started
$ disco status
Master singapore-dev-1:8989 running
$ ddfs ls
...hangs
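
For context, "zero length field name in format" is the message of the ValueError that Python 2.6 raises when str.format is given auto-numbered "{}" fields; those are only accepted from Python 2.7 and 3.1 on. The Python version is not stated in this report, so treating that as the cause here is an assumption. A minimal sketch of the portable form, where `master_label` is a hypothetical helper and not a function in Disco:

```python
import sys


def master_label(host, port):
    """Build a 'Master host:port' label with explicitly numbered fields.

    Explicit indices ({0}, {1}) are accepted by Python 2.6 and later.
    The bare form "Master {}:{}" raises
    "ValueError: zero length field name in format" on Python 2.6.
    """
    return "Master {0}:{1}".format(host, port)


if __name__ == "__main__":
    # Shows which interpreter is actually running the script.
    print(sys.version_info[:2])
    print(master_label("singapore-dev-1", 8989))
```

Running `python -V` with the same interpreter that the disco command uses would show whether this applies.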

/etc/disco/settings.py:

# --
# -- Disco settings
# --
# The defaults should be pretty sane, so be careful changing them.

# Home of the Disco libraries
DISCO_HOME = "/usr/lib/disco"

# Root directory for Disco data
DISCO_ROOT = "/usr/var/disco"

# Where the master's web docroot lives
DISCO_WWW_ROOT = "/usr/share/disco/master/www"

# HTTP server for master and nodes runs on this port
# disco://host URIs are mapped to http://host:DISCO_PORT
DISCO_PORT = 8989

# Example config for Varnish proxy
# DISCO_PROXY_ENABLED = "on"
# DISCO_HTTPD = "/usr/sbin/varnishd -a 0.0.0.0:$DISCO_PROXY_PORT -f $DISCO_PROXY_CONFIG -P $DISCO_PROXY_PID -n/tmp -smalloc"

DDFS_TAG_MIN_REPLICAS = 1
DDFS_TAG_REPLICAS     = 1
DDFS_BLOB_REPLICAS    = 1

DISCO_MASTER_HOST = "singapore-dev-1"

/usr/var/disco/log/console.log:

2014-11-27 23:14:34.478 [debug] <0.63.0>@lager_handler_watcher:94 Lager installed handler error_logger_lager_h into error_logger
2014-11-27 23:14:34.483 [debug] <0.46.0> Supervisor gr_param_sup started gr_param:start_link(gr_lager_default_tracer_params) at pid <0.65.0>
2014-11-27 23:14:34.483 [debug] <0.45.0> Supervisor gr_counter_sup started gr_counter:start_link(gr_lager_default_tracer_counters) at pid <0.66.0>
2014-11-27 23:14:34.484 [debug] <0.47.0> Supervisor gr_manager_sup started gr_manager:start_link(gr_lager_default_tracer_params_mgr, gr_lager_default_tracer_params, []) at pid <0.67.0>
2014-11-27 23:14:34.484 [debug] <0.47.0> Supervisor gr_manager_sup started gr_manager:start_link(gr_lager_default_tracer_counters_mgr, gr_lager_default_tracer_counters, [{input,0},{filter,0},{output,0}]) at pid <0.68.0>
2014-11-27 23:14:34.593 [info] <0.7.0> Application lager started on node 'disco_8989_master@singapore-dev-1'
2014-11-27 23:14:34.603 [debug] <0.73.0> Supervisor inets_sup started ftp_sup:start_link() at pid <0.74.0>
2014-11-27 23:14:34.614 [debug] <0.76.0> Supervisor httpc_profile_sup started httpc_manager:start_link(default, only_session_cookies, inets) at pid <0.77.0>
2014-11-27 23:14:34.614 [debug] <0.75.0> Supervisor httpc_sup started httpc_profile_sup:start_link([{httpc,{default,only_session_cookies}}]) at pid <0.76.0>
2014-11-27 23:14:34.615 [debug] <0.75.0> Supervisor httpc_sup started httpc_handler_sup:start_link() at pid <0.78.0>
2014-11-27 23:14:34.615 [debug] <0.73.0> Supervisor inets_sup started httpc_sup:start_link([{httpc,{default,only_session_cookies}}]) at pid <0.75.0>
2014-11-27 23:14:34.617 [debug] <0.73.0> Supervisor inets_sup started httpd_sup:start_link([]) at pid <0.79.0>
2014-11-27 23:14:34.618 [debug] <0.73.0> Supervisor inets_sup started tftp_sup:start_link([]) at pid <0.80.0>
2014-11-27 23:14:34.618 [info] <0.7.0> Application inets started on node 'disco_8989_master@singapore-dev-1'
2014-11-27 23:14:34.619 [info] <0.81.0>@disco_main:init:52 DISCO BOOTS
2014-11-27 23:14:34.620 [info] <0.81.0>@disco_proxy:start:53 Disco proxy disabled
2014-11-27 23:14:34.621 [info] <0.81.0>@ddfs_master:start_link:59 DDFS master starts
2014-11-27 23:14:34.621 [debug] <0.81.0> Supervisor {<0.81.0>,disco_main} started ddfs_master:start_link() at pid <0.82.0>
2014-11-27 23:14:34.623 [info] <0.81.0>@event_server:start_link:114 Event server starts
2014-11-27 23:14:34.623 [debug] <0.81.0> Supervisor {<0.81.0>,disco_main} started event_server:start_link() at pid <0.88.0>
2014-11-27 23:14:34.624 [info] <0.81.0>@disco_config:start_link:27 Disco config starts
2014-11-27 23:14:34.624 [debug] <0.81.0> Supervisor {<0.81.0>,disco_main} started disco_config:start_link() at pid <0.89.0>
2014-11-27 23:14:34.626 [info] <0.81.0>@disco_server:start_link:62 DISCO SERVER STARTS
2014-11-27 23:14:34.626 [info] <0.90.0>@fair_scheduler:start_link:39 Fair scheduler starts
2014-11-27 23:14:34.627 [info] <0.91.0>@fair_scheduler:init:81 Scheduler uses fair policy
2014-11-27 23:14:34.627 [info] <0.91.0>@fair_scheduler_fair_policy:start_link:37 Fair scheduler: Fair policy
2014-11-27 23:14:34.629 [warning] <0.89.0>@disco_config:terminate:100 Disco config dies: {{badmatch,any},[{mochijson2,tokenize,2,[{file,"src/mochijson2.erl"},{line,574}]},{mochijson2,decode1,2,[{file,"src/mochijson2.erl"},{line,326}]},{mochijson2,json_decode,2,[{file,"src/mochijson2.erl"},{line,321}]},{disco_config,get_full_config,0,[{file,"src/disco_config.erl"},{line,150}]},{disco_config,do_get_config_table,0,[{file,"src/disco_config.erl"},{line,195}]},{disco_config,handle_call,3,[{file,"src/disco_config.erl"},{line,73}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,580}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2014-11-27 23:14:34.630 [error] <0.89.0> gen_server disco_config terminated with reason: no match of right hand value any in mochijson2:tokenize/2 line 574
2014-11-27 23:14:34.630 [error] <0.89.0> CRASH REPORT Process disco_config with 0 neighbours exited with reason: no match of right hand value any in mochijson2:tokenize/2 line 574 in gen_server:terminate/6 line 737
2014-11-27 23:14:34.630 [warning] <0.81.0>@disco_server:start_link:69 Parsing config failed: exit:{{{badmatch,any},[{mochijson2,tokenize,2,[{file,"src/mochijson2.erl"},{line,574}]},{mochijson2,decode1,2,[{file,"src/mochijson2.erl"},{line,326}]},{mochijson2,json_decode,2,[{file,"src/mochijson2.erl"},{line,321}]},{disco_config,get_full_config,0,[{file,"src/disco_config.erl"},{line,150}]},{disco_config,do_get_config_table,0,[{file,"src/disco_config.erl"},{line,195}]},{disco_config,handle_call,3,[{file,"src/disco_config.erl"},{line,73}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,580}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},{gen_server,call,[disco_config,get_config_table]}}
2014-11-27 23:14:34.630 [warning] <0.90.0>@disco_server:terminate:246 Disco server dies: {shutdown,{failed_to_start_child,disco_server,ok}}
2014-11-27 23:14:34.630 [error] <0.81.0> Supervisor {<0.81.0>,disco_main} had child disco_server started with disco_server:start_link() at undefined exit with reason ok in context start_error
2014-11-27 23:14:34.630 [error] <0.81.0> Supervisor {<0.81.0>,disco_main} had child disco_config started with disco_config:start_link() at <0.89.0> exit with reason no match of right hand value any in mochijson2:tokenize/2 line 574 in context shutdown_error
2014-11-27 23:14:34.630 [error] <0.37.0> CRASH REPORT Process <0.37.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,disco_server,ok}},{disco_main,start,[normal,[]]}} in application_master:init/4 line 133
2014-11-27 23:14:34.630 [info] <0.7.0> Application disco exited with reason: {{shutdown,{failed_to_start_child,disco_server,ok}},{disco_main,start,[normal,[]]}}
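
Since most of the lines above are normal startup noise, it can help to pull out the first warning or error, which here is disco_config dying while decoding its config. The sketch below assumes each entry is on one line in the "date time [level] message" layout shown above; `first_failure` and `LAGER_LINE` are hypothetical names, not part of Disco or Lager.

```python
import re

# "2014-11-27 23:14:34.629 [warning] <0.89.0>@disco_config:terminate:100 ..."
LAGER_LINE = re.compile(r"^(\S+ \S+) \[(\w+)\] (.*)$")


def first_failure(lines):
    """Return (timestamp, level, message) for the first warning/error line,
    or None if the log has no such line."""
    for line in lines:
        match = LAGER_LINE.match(line.rstrip("\n"))
        if match and match.group(2) in ("warning", "error"):
            return match.group(1), match.group(2), match.group(3)
    return None


if __name__ == "__main__":
    import sys

    # Usage: python first_failure.py /usr/var/disco/log/console.log
    with open(sys.argv[1]) as log:
        print(first_failure(log))
```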

/usr/var/disco/log/error.log:

2014-11-27 23:14:34.630 [error] <0.89.0> gen_server disco_config terminated with reason: no match of right hand value any in mochijson2:tokenize/2 line 574
2014-11-27 23:14:34.630 [error] <0.89.0> CRASH REPORT Process disco_config with 0 neighbours exited with reason: no match of right hand value any in mochijson2:tokenize/2 line 574 in gen_server:terminate/6 line 737
2014-11-27 23:14:34.630 [error] <0.81.0> Supervisor {<0.81.0>,disco_main} had child disco_server started with disco_server:start_link() at undefined exit with reason ok in context start_error
2014-11-27 23:14:34.630 [error] <0.81.0> Supervisor {<0.81.0>,disco_main} had child disco_config started with disco_config:start_link() at <0.89.0> exit with reason no match of right hand value any in mochijson2:tokenize/2 line 574 in context shutdown_error
2014-11-27 23:14:34.630 [error] <0.37.0> CRASH REPORT Process <0.37.0> with 0 neighbours exited with reason: {{shutdown,{failed_to_start_child,disco_server,ok}},{disco_main,start,[normal,[]]}} in application_master:init/4 line 133

/usr/var/disco/log/crash.log:

2014-11-27 23:14:34 =ERROR REPORT====
** Generic server disco_config terminating
** Last message in was get_config_table
** When Server state == undefined
** Reason for termination ==
** {{badmatch,any},[{mochijson2,tokenize,2,[{file,"src/mochijson2.erl"},{line,574}]},{mochijson2,decode1,2,[{file,"src/mochijson2.erl"},{line,326}]},{mochijson2,json_decode,2,[{file,"src/mochijson2.erl"},{line,321}]},{disco_config,get_full_config,0,[{file,"src/disco_config.erl"},{line,150}]},{disco_config,do_get_config_table,0,[{file,"src/disco_config.erl"},{line,195}]},{disco_config,handle_call,3,[{file,"src/disco_config.erl"},{line,73}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,580}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2014-11-27 23:14:34 =CRASH REPORT====
  crasher:
    initial call: disco_config:init/1
    pid: <0.89.0>
    registered_name: disco_config
    exception exit: {{{badmatch,any},[{mochijson2,tokenize,2,[{file,"src/mochijson2.erl"},{line,574}]},{mochijson2,decode1,2,[{file,"src/mochijson2.erl"},{line,326}]},{mochijson2,json_decode,2,[{file,"src/mochijson2.erl"},{line,321}]},{disco_config,get_full_config,0,[{file,"src/disco_config.erl"},{line,150}]},{disco_config,do_get_config_table,0,[{file,"src/disco_config.erl"},{line,195}]},{disco_config,handle_call,3,[{file,"src/disco_config.erl"},{line,73}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,580}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]},[{gen_server,terminate,6,[{file,"gen_server.erl"},{line,737}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
    ancestors: [<0.81.0>,<0.38.0>]
    messages: []
    links: [<0.81.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 2995
  neighbours:
2014-11-27 23:14:34 =SUPERVISOR REPORT====
     Supervisor: {<0.81.0>,disco_main}
     Context:    start_error
     Reason:     ok
     Offender:   [{pid,undefined},{name,disco_server},{mfargs,{disco_server,start_link,[]}},{restart_type,permanent},{shutdown,10},{child_type,worker}]

2014-11-27 23:14:34 =SUPERVISOR REPORT====
     Supervisor: {<0.81.0>,disco_main}
     Context:    shutdown_error
     Reason:     {{badmatch,any},[{mochijson2,tokenize,2,[{file,"src/mochijson2.erl"},{line,574}]},{mochijson2,decode1,2,[{file,"src/mochijson2.erl"},{line,326}]},{mochijson2,json_decode,2,[{file,"src/mochijson2.erl"},{line,321}]},{disco_config,get_full_config,0,[{file,"src/disco_config.erl"},{line,150}]},{disco_config,do_get_config_table,0,[{file,"src/disco_config.erl"},{line,195}]},{disco_config,handle_call,3,[{file,"src/disco_config.erl"},{line,73}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,580}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
     Offender:   [{pid,<0.89.0>},{name,disco_config},{mfargs,{disco_config,start_link,[]}},{restart_type,permanent},{shutdown,10},{child_type,worker}]

2014-11-27 23:14:34 =CRASH REPORT====
  crasher:
    initial call: application_master:init/4
    pid: <0.37.0>
    registered_name: []
    exception exit: {{{shutdown,{failed_to_start_child,disco_server,ok}},{disco_main,start,[normal,[]]}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
    ancestors: [<0.36.0>]
    messages: [{'EXIT',<0.38.0>,normal}]
    links: [<0.36.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 114
  neighbours:
PawelTobis commented 9 years ago

I confirm the same behaviour on my CentOS 6.6 with the following software packages:

$ rpm -q erlang
erlang-17.5.3-1.el6.x86_64

$ erl
Erlang/OTP 17 [erts-6.4.1] [source-381fb6c] [64-bit] [smp:24:24] [async-threads:10] [hipe] [kernel-poll:false]

$ cd disco && git describe
0.5-478-g451e96f

I would only like to add that after issuing disco start for the first time, no error log is created and, as described above, only one line of error description is printed to stdout:

$ disco start
zero length field name in format

At this point, a zero-length file /usr/var/disco/disco_8989.config is created.

Further steps and results are the same as described above by @ErikDubbelboer.

Is there any chance to get a fix for this bug?

compevo commented 7 years ago

Same issue on CentOS 6.7. I cannot get Disco to start, and unfortunately there doesn't seem to be much support or documentation for this project on CentOS.

Erlang R14B04 (erts-5.8.5) [source] [64-bit] [smp:12:12] [rq:12] [async-threads:0] [kernel-poll:true]

Eshell V5.8.5 (abort with ^G)
(disco_8989_master@testing)1> 17:19:26.566 [info] Application lager started on node disco_8989_master@testing
17:19:26.603 [info] Application inets started on node disco_8989_master@testing
17:19:26.626 [info] DISCO BOOTS
17:19:26.628 [info] Disco proxy disabled
17:19:26.631 [info] DDFS master starts
17:19:26.634 [info] Event server starts
17:19:26.636 [info] Disco config starts
17:19:26.638 [info] DISCO SERVER STARTS
17:19:26.640 [info] Fair scheduler starts
17:19:26.640 [info] Scheduler uses fair policy
17:19:26.641 [info] Fair scheduler: Fair policy
17:19:26.644 [warning] Disco config dies: {{badmatch,any},[{mochijson2,tokenize,2},{mochijson2,decode1,2},{mochijson2,json_decode,2},{disco_config,get_full_config,0},{disco_config,do_get_config_table,0},{disco_config,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
17:19:26.644 [error] gen_server disco_config terminated with reason: no match of right hand value any in mochijson2:tokenize/2
17:19:26.644 [warning] Parsing config failed: exit:{{{badmatch,any},[{mochijson2,tokenize,2},{mochijson2,decode1,2},{mochijson2,json_decode,2},{disco_config,get_full_config,0},{disco_config,do_get_config_table,0},{disco_config,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]},{gen_server,call,[disco_config,get_config_table]}}
17:19:26.644 [warning] Disco server dies: shutdown
17:19:26.644 [error] CRASH REPORT Process disco_config with 0 neighbours exited with reason: no match of right hand value any in mochijson2:tokenize/2 in gen_server:terminate/6
17:19:26.649 [error] Supervisor {<0.85.0>,disco_main} had child disco_server started with disco_server:start_link() at undefined exit with reason ok in context start_error
17:19:26.650 [error] Supervisor {<0.85.0>,disco_main} had child disco_config started with disco_config:start_link() at <0.93.0> exit with reason no match of right hand value any in mochijson2:tokenize/2 in context shutdown_error
17:19:26.650 [info] Application disco exited with reason: {shutdown,{disco_main,start,[normal,[]]}}