chef-boneyard / opscode-pushy-client

Client API for Pushy
Apache License 2.0

Push-jobs-client v2.5.0.1 cannot start on AIX7 #162

Closed dai624fuji closed 5 years ago

dai624fuji commented 6 years ago

Description

Push-jobs-client v2.5.0.1 cannot start on AIX 7, although push-jobs-client v2.4.8 started normally on AIX 7.

Version

Server

Red Hat Enterprise Linux Server 7
Chef Server Version 12.17.33
Push Jobs Server Version 2.2.8

Client

AIX 7
Chef Client Version 12.22.3
Push Jobs Client Version 2.5.0.1

logs

[root@kkmdcpbt01:/]$ startsrc -s push-jobs-client
0513-059 The push-jobs-client Subsystem has been started. Subsystem PID is 18743552.
[root@kkmdcpbt01:/]$
[root@kkmdcpbt01:/]$ tail -f /var/log/chef/push-jobs-client.log
~~
INFO: [KKMDCPBT01] Forced GC; Stat count changed 1
INFO: [KKMDCPBT01] Starting command / server heartbeat receive thread ...
INFO: received_command: false, @client.legacy_mode:false
INFO: [KKMDCPBT01] Received server heartbeat (sequence #634) logging 0/3
INFO: received_command: false, @client.legacy_mode:false
INFO: received_command: false, @client.legacy_mode:false
INFO: received_command: false, @client.legacy_mode:false
ERROR: [KKMDCPBT01] No messages being received on command port in 4s. Possible encryption problem?
INFO: [KKMDCPBT01] Reconfiguring client / reloading keys ...
INFO: [KKMDCPBT01] Retrieving configuration from https://chef13.stctest/organizations/renewal2020//pushy/config/KKMDCPBT01: ...
INFO: [KKMDCPBT01] Stopping command / server heartbeat receive thread and destroying sockets ...
INFO: [KKMDCPBT01] Resolved chef13.stctest to 'xxx.xxx.xxx.xxx' and 0 others
INFO: [KKMDCPBT01] Starting ZMQ version [4, 2, 2]
INFO: [KKMDCPBT01] Listening for server heartbeat at tcp://chef13.stctest:10000
INFO: [KKMDCPBT01] Connecting to command channel at tcp://chef13.stctest:10002
INFO: [KKMDCPBT01] Stopping heartbeat / offline detection thread ...
INFO: [KKMDCPBT01] Starting command / server heartbeat receive thread ...
INFO: [KKMDCPBT01] Considering server online, and starting to heartbeat
INFO: [KKMDCPBT01] Stopping reconfigure thread ...
INFO: [KKMDCPBT01] Starting heartbeat / offline detection thread on interval 10.0 ...
ERROR: [KKMDCPBT01] Error in heartbeat / offline detection thread: uninitialized constant ZMQ::DONTWAIT
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:504:in `send_signed_message'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:477:in `block in send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `synchronize'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:229:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client.rb:178:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/heartbeater.rb:80:in `block in start'
ERROR: [KKMDCPBT01] Error in heartbeat / offline detection thread: uninitialized constant ZMQ::DONTWAIT
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:504:in `send_signed_message'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:477:in `block in send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `synchronize'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:229:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client.rb:178:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/heartbeater.rb:80:in `block in start'
ERROR: [KKMDCPBT01] Error in heartbeat / offline detection thread: uninitialized constant ZMQ::DONTWAIT
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:504:in `send_signed_message'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:477:in `block in send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `synchronize'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:229:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client.rb:178:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/heartbeater.rb:80:in `block in start'
INFO: received_command: false, @client.legacy_mode:false
ERROR: [KKMDCPBT01] Error in heartbeat / offline detection thread: uninitialized constant ZMQ::DONTWAIT
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:504:in `send_signed_message'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:477:in `block in send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `synchronize'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:472:in `send_signed_json_command'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb:229:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client.rb:178:in `send_heartbeat'
/opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/heartbeater.rb:80:in `block in start'
INFO: [KKMDCPBT01] Reconfigured client.
INFO: [KKMDCPBT01] Starting reconfigure thread. Will reconfigure / reload keys after 3600 seconds, less up to splay 0.1.
INFO: [KKMDCPBT01] Setting reconfigure deadline to 2018-09-16 12:07:14 +0900
INFO: [KKMDCPBT01] Server has missed 4 heartbeats in a row. Considering it offline, and stopping heartbeat.
INFO: [KKMDCPBT01] Considering server online, and starting to heartbeat
INFO: [KKMDCPBT01] Closing and reopening sockets since server is down ...
INFO: [KKMDCPBT01] Stopping command / server heartbeat receive thread and destroying sockets ...
INFO: [KKMDCPBT01] Resolved chef13.stctest to 'xxx.xxx.xxx.xxx' and 0 others
INFO: [KKMDCPBT01] Starting ZMQ version [4, 2, 2]
INFO: [KKMDCPBT01] Listening for server heartbeat at tcp://chef13.stctest:10000
INFO: [KKMDCPBT01] Connecting to command channel at tcp://chef13.stctest:10002
INFO: [KKMDCPBT01] Done closing and reopening sockets.
INFO: [KKMDCPBT01] Starting command / server heartbeat receive thread ...
~~
[root@kkmdcpbt01:/]$

dai624fuji commented 6 years ago

$ diff /opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb.new /opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/opscode-pushy-client-2.5.0/lib/pushy_client/protocol_handler.rb
302a303
> Chef::Log.info "received_command: #{received_command}, @client.legacy_mode:#{@client.legacy_mode}"
490,506c491,497
< # https://github.com/chuckremes/ffi-rzmq/blob/master/lib/ffi-rzmq/socket.rb
< # send_string
< #
< # +flags+ may be ZMQ::DONTWAIT and ZMQ::SNDMORE.
< #
< # Returns 0 when the message was successfully enqueued.
< # Returns -1 under two conditions.
< # 1. The message could not be enqueued
< # 2. When +flags+ is set with ZMQ::DONTWAIT and the socket returned EAGAIN.
< #
< # With a -1 return code, the user must check ZMQ::Util.errno to determine the
< # cause.
<
< socket.send_string(auth, ZMQ::SNDMORE | ZMQ::DONTWAIT)
< rc = socket.send_string(message, ZMQ::DONTWAIT)
< if rc == -1
< Chef::Log.info("[#{client.node_name}] ZMQ socket enqueue error #{ZMQ::Util.errno}. Triggering reconfigure")
---
> begin
> Timeout.timeout(10) do
> socket.send_string(auth, ZMQ::SNDMORE)
> socket.send_string(message)
> end
> rescue Timeout::Error
> Chef::Log.info("[#{client.node_name}] ZMQ socket timed out. Triggering reconfigure")
$
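
The two send strategies in the diff above can be compared in isolation. The following is a minimal sketch, not the client's actual code: `StubSocket`, `send_nonblocking`, and `send_with_timeout` are hypothetical names, and the top-level `SNDMORE` / `DONTWAIT` values are stand-ins copied from libzmq's `zmq.h`. The stub only mimics the return-code convention quoted in the diff comment (0 on success, -1 on failure).

```ruby
require 'timeout'

# Stand-in flag values (as in libzmq's zmq.h); a real binding defines its own.
DONTWAIT = 1
SNDMORE  = 2

# Hypothetical stub socket: returns 0 on success and -1 on failure, following
# the convention described in the quoted ffi-rzmq comment.
class StubSocket
  attr_reader :sent

  def initialize(fail_nonblocking: false)
    @fail_nonblocking = fail_nonblocking
    @sent = []
  end

  def send_string(string, flags = 0)
    # Simulate "could not enqueue" for non-blocking sends only.
    return -1 if @fail_nonblocking && (flags & DONTWAIT) != 0

    @sent << string
    0
  end
end

# Non-blocking variant: relies on a DONTWAIT flag and inspects the return code.
def send_nonblocking(socket, auth, message)
  socket.send_string(auth, SNDMORE | DONTWAIT)
  rc = socket.send_string(message, DONTWAIT)
  rc == -1 ? :reconfigure : :ok
end

# Blocking variant wrapped in a timeout: does not reference DONTWAIT at all.
def send_with_timeout(socket, auth, message, seconds = 10)
  Timeout.timeout(seconds) do
    socket.send_string(auth, SNDMORE)
    socket.send_string(message)
  end
  :ok
rescue Timeout::Error
  :reconfigure
end

socket = StubSocket.new(fail_nonblocking: true)
puts send_nonblocking(socket, 'auth', 'payload')
puts send_with_timeout(socket, 'auth', 'payload')
```

The non-blocking path depends on the binding exposing a `DONTWAIT` constant, which is why it is sensitive to which ZMQ library is loaded; the timeout path only needs `SNDMORE`.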

r-goto commented 6 years ago

@jeremymv2 @markan

ERROR: [KKMDCPBT01] Error in heartbeat / offline detection thread: uninitialized constant ZMQ::DONTWAIT

Looking at the ERROR log above, the issue may be that rbzmq/zmq does not define ZMQ::DONTWAIT(?). Only the AIX build uses rbzmq/zmq, so the issue appears only on AIX.

https://github.com/chef/opscode-pushy-client/pull/149/files

if RUBY_PLATFORM =~ /aix/ 
require 'rbzmq/zmq' 
else 
require 'ffi-rzmq' 
require 'ffi-rzmq-core' 
end 

The change that introduced the problem may have been merged in PR #152, below.

https://github.com/chef/opscode-pushy-client/pull/152
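
The suspected failure mode can be reproduced without any ZMQ library. This is only a sketch under assumptions: `LimitedZMQ` and `FullZMQ` are hypothetical stand-in modules (not the real rbzmq or ffi-rzmq bindings), the constant values are taken from libzmq's `zmq.h`, and `missing_constant_error` and `nonblocking_flag` are illustrative helper names.

```ruby
# Hypothetical stand-in for a binding that defines SNDMORE but not DONTWAIT.
module LimitedZMQ
  SNDMORE = 2
end

# Hypothetical stand-in for a binding that defines both flags.
module FullZMQ
  SNDMORE  = 2
  DONTWAIT = 1
end

# Returns the NameError message raised when DONTWAIT is looked up,
# or nil if the constant exists.
def missing_constant_error(zmq)
  zmq.const_get(:DONTWAIT)
  nil
rescue NameError => e
  e.message
end

# Defensive lookup: use DONTWAIT when the binding has it, otherwise 0
# (a plain blocking send).
def nonblocking_flag(zmq)
  zmq.const_defined?(:DONTWAIT) ? zmq.const_get(:DONTWAIT) : 0
end

puts missing_constant_error(LimitedZMQ)
puts nonblocking_flag(LimitedZMQ)
puts nonblocking_flag(FullZMQ)
```

A guard like `nonblocking_flag` is one way calling code could tolerate either binding; the other option is for the binding itself to define the constant.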

Please have a look at the issue and, if applicable, please release a fix.

Thanks,

dai624fuji commented 6 years ago

I confirmed that the error is solved with the following correction.

1. Edit rbzmq.c.

$ tail -n 5 /opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/rbzmq-3.0.0/ext/zmq/rbzmq.c
    rb_define_const (zmq_module, "CURVE_SERVERKEY", INT2NUM (ZMQ_CURVE_SERVERKEY));
    rb_define_const (zmq_module, "CURVE_PUBLICKEY", INT2NUM (ZMQ_CURVE_PUBLICKEY));
    rb_define_const (zmq_module, "CURVE_SECRETKEY", INT2NUM (ZMQ_CURVE_SECRETKEY));
    rb_define_const (zmq_module, "DONTWAIT", INT2NUM (ZMQ_DONTWAIT));                    ## add this line
}
$

2. Compile and replace zmq.so.

$ cd /opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/rbzmq-3.0.0/ext/zmq/
$ make 
$ mv zmq.so /opt/push-jobs-client/embedded/lib/ruby/gems/2.4.0/gems/rbzmq-3.0.0/lib/.
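
After a rebuild like the one above, a quick check can confirm that the constant is now visible from Ruby. This is a hedged sketch: `zmq_constant_status` is a hypothetical helper, and it assumes the binding is loadable as `rbzmq/zmq` (the require path shown earlier in this thread) and that it is run with the Ruby interpreter that ships with the client.

```ruby
# Reports whether the ZMQ module loaded from +feature+ defines +const+.
# Returns :missing_library if the feature cannot be required,
# :defined if ZMQ has the constant, and :undefined otherwise.
def zmq_constant_status(feature = 'rbzmq/zmq', const = :DONTWAIT)
  require feature
  if defined?(::ZMQ) && ::ZMQ.const_defined?(const)
    :defined
  else
    :undefined
  end
rescue LoadError
  :missing_library
end

puts zmq_constant_status
```

On a machine without the binding this prints `missing_library`; on a patched AIX install the hope is that it prints `defined`.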

I would like to contribute this code fix, but I could not find where this code is hosted. Is Chef Inc. maintaining it? If so, would you please verify the validity of this fix?

README.rdoc

= Background for this fork

Normally building push-jobs-client[https://github.com/chef/opscode-pushy-client] depends on using ffi-rzmq[https://github.com/chuckremes/ffi-rzmq] but to get FFI building on AIX was more work than we could commit to. To meet a customer need we decided to build a custom C extension for the parts of ZeroMQ that we need.

We actually have a fairly small use case for ZMQ in Push Jobs Client. There are two sockets, one subscriber and one dealer. The dealer needs to send encrypted information to the server. That is basically it.

This small footprint led us to creating this repo which is a fork of an old C native extension of LibZMQ https://github.com/jtobin/rbzmq and modifying it to fit our needs. This meant ripping out any unused code and updating only the methods we need (like context.socket) to support later versions of Ruby and LibZMQ.

~~

btm commented 5 years ago

The fork of that library used in our AIX build of the push jobs client is here: https://github.com/chef/rbzmq/blob/master/ext/zmq/rbzmq.c#L1008

markan commented 5 years ago

That ZMQ_DONTWAIT above looks pretty reasonable

btm commented 5 years ago

https://github.com/chef/rbzmq/pull/5 has been merged. Omnibus pulls in the latest master via github for rbzmq so the next build will pull in that change.

https://github.com/chef/omnibus-software/blob/master/config/software/rbzmq.rb#L18

dai624fuji commented 5 years ago

I downloaded push-jobs-client v2.5.6 and confirmed that the problem has been resolved. Download command: "mixlib-install download push-jobs-client -c current -p aix -l 7.1 -a powerpc"

We appreciate your quick response!

dai624fuji commented 5 years ago

@jeremymv2 @markan Hi, Chef team. Can I use the current push-jobs-client-2.5.6-1 in my project? If it is under development, we will wait for a new release.