chef-boneyard / opscode-pushy-server

Chef Push Jobs Server
https://docs.chef.io/push_jobs.html
Apache License 2.0

pushy-server gives 400 errors with no further information #121

Closed andyrepton closed 7 years ago

andyrepton commented 8 years ago

Hey guys,

I've been trying to debug this for over a day now, so am hoping someone can point me in the right direction.

Scenario: New Chef Delivery installation, set up using rake setup:delivery_cluster.

Expected behavior: pushy-client can talk to pushy-server, and running /opt/opscode-push-jobs-client/bin/pushy-client connects to the server.

Current behavior: Running /opt/opscode-push-jobs-client/bin/pushy-client gives the following error:

[root@hmfchef-build01 ~]# /opt/opscode-push-jobs-client/bin/pushy-client
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using node name: build-node-test-1
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using node name: build-node-test-1
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using Chef server: https://85.222.236.184/organizations/test
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using Chef server: https://85.222.236.184/organizations/test
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using private key: /etc/chef/client.pem
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using private key: /etc/chef/client.pem
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using org name: test
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Using org name: test
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Incarnation ID: 132c4c74-fb23-4092-9229-a828ce3eae28
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Incarnation ID: 132c4c74-fb23-4092-9229-a828ce3eae28
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Starting client ...
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Starting client ...
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Retrieving configuration from https://85.222.236.184/organizations/test/pushy/config/build-node-test-1 ...
[2015-12-18T19:11:02+01:00] INFO: [build-node-test-1] Retrieving configuration from https://85.222.236.184/organizations/test/pushy/config/build-node-test-1 ...
[2015-12-18T19:11:02+01:00] INFO: HTTP Request Returned 400 Bad Request: error
[2015-12-18T19:11:02+01:00] INFO: HTTP Request Returned 400 Bad Request: error
/opt/opscode-push-jobs-client/embedded/lib/ruby/1.9.1/net/http.rb:2633:in `error!': 400 "Bad Request" (Net::HTTPServerException)
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.8/lib/chef/http.rb:143:in `request'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.8/lib/chef/rest.rb:115:in `get'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/opscode-pushy-client-1.1.3/lib/pushy_client.rb:172:in `get_config'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/opscode-pushy-client-1.1.3/lib/pushy_client.rb:72:in `start'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/opscode-pushy-client-1.1.3/lib/pushy_client/cli.rb:114:in `run_application'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/chef-11.12.8/lib/chef/application.rb:67:in `run'
    from /opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/opscode-pushy-client-1.1.3/bin/pushy-client:8:in `<top (required)>'
    from /opt/opscode-push-jobs-client/bin/pushy-client:23:in `load'
    from /opt/opscode-push-jobs-client/bin/pushy-client:23:in `<main>'

On the chef server, I only get the following error:

10.0.0.1 - - [18/Dec/2015:19:14:12 +0100]  "GET /organizations/test/pushy/config/build-node-test-3 HTTP/1.1" 400 "0.002" 26 "-" "Chef Client/11.12.8 (ruby-1.9.3-p547; ohai-7.0.4; x86_64-linux; +http://opscode.com)" "85.222.236.184:10003" "400" "0.002" "11.12.8" "algorithm=sha1;version=1.0;" "build-node-test-3" "2015-12-18T18:14:14Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 1035
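The access-log entry above carries more than the status code. Below is a minimal Python sketch that splits it into named fields. It assumes the quoted fields follow the order in the hypothetical `QUOTED_FIELDS` list, which is our reading of the Chef server nginx log format and is not confirmed anywhere in this thread. If that reading is right, the upstream on port 10003 returned the 400 itself, so nginx is only passing it through. The X-Ops content hash also matches the base64 SHA-1 of an empty body, which is what you would expect for a body-less GET.

```python
import base64
import hashlib
import re

# Request line and status at the start of the access-log entry.
REQUEST_RE = re.compile(
    r'^(?P<remote>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)
# Every double-quoted field in the entry, in order.
QUOTED_RE = re.compile(r'"([^"]*)"')

# ASSUMPTION: order of the quoted fields. This is our reading of the Chef
# server nginx log format, not something stated in this issue.
QUOTED_FIELDS = [
    "request", "request_time", "referrer", "user_agent",
    "upstream_addr", "upstream_status", "upstream_response_time",
    "chef_version", "x_ops_sign", "x_ops_userid", "x_ops_timestamp",
    "x_ops_content_hash",
]


def parse_access_line(line):
    """Split one access-log entry into a dict of named fields."""
    match = REQUEST_RE.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    fields = match.groupdict()
    fields.update(zip(QUOTED_FIELDS, QUOTED_RE.findall(line)))
    return fields


def empty_body_hash():
    """Base64-encoded SHA-1 of an empty request body."""
    return base64.b64encode(hashlib.sha1(b"").digest()).decode("ascii")


# The entry from this issue, verbatim.
LINE = ('10.0.0.1 - - [18/Dec/2015:19:14:12 +0100]  '
        '"GET /organizations/test/pushy/config/build-node-test-3 HTTP/1.1" 400 "0.002" 26 "-" '
        '"Chef Client/11.12.8 (ruby-1.9.3-p547; ohai-7.0.4; x86_64-linux; +http://opscode.com)" '
        '"85.222.236.184:10003" "400" "0.002" "11.12.8" "algorithm=sha1;version=1.0;" '
        '"build-node-test-3" "2015-12-18T18:14:14Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 1035')

if __name__ == "__main__":
    entry = parse_access_line(LINE)
    print("nginx status:", entry["status"])
    print("upstream:", entry["upstream_addr"], entry["upstream_status"])
    print("empty-body hash:", entry["x_ops_content_hash"] == empty_body_hash())
```

Treat the field names as labels for reading the log, not as an authoritative description of the server's configuration.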

The only errors in /var/log/opscode/opscode-pushy-server/console.log are these:

2015-12-18 18:41:40.110 [error] gen_event log_mf_h installed in error_logger terminated with reason: no function clause matching log_mf_h:handle_call(get_loglevel, {state,"/var/log/opscode/opscode-pushy-server/sasl",104857600,5,4892,2,{file_descriptor,prim_file,...},...}) line 164
2015-12-18 18:41:40.115 [error] gen_event sasl_report_file_h installed in error_logger terminated with reason: {error,bad_query}

And in crash.log:

2015-12-18 18:41:40 =ERROR REPORT====
** gen_event handler log_mf_h crashed.
** Was installed in error_logger
** Last event was: get_loglevel
** When handler state == {state,"/var/log/opscode/opscode-pushy-server/sasl",104857600,5,4892,2,{file_descriptor,prim_file,{#Port<0.635>,12}},[],#Fun<sasl.0.89613933>}
** Reason == {function_clause,[{log_mf_h,handle_call,[get_loglevel,{state,"/var/log/opscode/opscode-pushy-server/sasl",104857600,5,4892,2,{file_descriptor,prim_file,{#Port<0.635>,12}},[],#Fun<sasl.0.89613933>}],[{file,"log_mf_h.erl"},{line,164}]},{gen_event,server_call_update,3,[{file,"gen_event.erl"},{line,638}]},{gen_event,server_call,4,[{file,"gen_event.erl"},{line,587}]},{gen_event,handle_msg,5,[{file,"gen_event.erl"},{line,258}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
2015-12-18 18:41:40 =ERROR REPORT====
** gen_event handler sasl_report_file_h crashed.
** Was installed in error_logger
** Last event was: get_loglevel
** When handler state == {<0.38.0>,"/var/log/opscode/opscode-pushy-server/sasl-error.log",error}
** Reason == {error,bad_query}

Things I have tried: rebuilding all nodes; a manual wget request (the 400 error persists on both ports 443 and 10003 on the Chef server); restarting services; confirming that firewalls are not blocking the ports; checking the nginx configuration for obvious errors; and lots and lots of googling and searching through GitHub issues.
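The manual wget check can also be scripted so that the response body gets printed. The access log shows 26 bytes of body on the 400, which may say more than the client's one-word "error". This is a minimal sketch. `pushy_config_url` and `probe` are hypothetical helpers, and the URL layout is inferred from the client's "Retrieving configuration from ..." log line, not taken from the pushy-client source. Like wget, the probe sends no X-Ops signing headers, so its status may not match what a real pushy-client gets. Certificate checks are off by default because the server is addressed by IP.

```python
import ssl
import urllib.error
import urllib.request


def pushy_config_url(chef_server_url, node_name):
    """Compose the config URL seen in the client log (layout inferred from that log)."""
    return "{}/pushy/config/{}".format(chef_server_url.rstrip("/"), node_name)


def probe(url, verify_tls=False, timeout=10):
    """Send an unsigned GET, like the manual wget check, and return (status, body)."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Hostname checking must be disabled before certificate verification is.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as HTTPError; the body may carry the error detail.
        return err.code, err.read().decode("utf-8", "replace")
```

For example, `probe(pushy_config_url("https://85.222.236.184/organizations/test", "build-node-test-1"))` targets the same URL the client logged.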

Any ideas? Thanks in advance. I couldn't find a forum or IRC channel that looked right.

jonsmorrow commented 8 years ago

Hi Seth!

Thank you for the thorough logs and description of your problem. We'll get your install sorted out, but we will likely need some more information and possibly a live session with you.

I haven't been able to recreate your problem locally, but one thing that stands out to me is that your build nodes still seem to have 'opscode-push-jobs-client' version 1.1.3 installed. This package was recently renamed to just 'push-jobs-client' and is currently at version 1.3.4. It's a bit of a guess, but I suspect the older client is sending a request the server doesn't recognize, possibly because of a header mismatch.
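The installed client version can be read straight off the gem paths in the traceback. Here is a minimal sketch that extracts it and compares it with the 1.3.4 mentioned above. The helper names are hypothetical, and the comparison handles only purely numeric dotted versions. It converts each part to an integer because plain string comparison gets cases like "1.10.0" vs "1.9.9" wrong.

```python
import re

# The gem directory name in the traceback embeds the client version.
GEM_DIR_RE = re.compile(r"opscode-pushy-client-(\d+(?:\.\d+)*)/")


def version_from_gem_path(path):
    """Extract the pushy-client version from a gem path, or None if absent."""
    match = GEM_DIR_RE.search(path)
    return match.group(1) if match else None


def parse_version(version):
    """Turn a numeric dotted version such as '1.1.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def predates(installed, reference):
    """True if `installed` is older than `reference` (numeric versions only)."""
    return parse_version(installed) < parse_version(reference)


# One of the paths from the traceback in this issue.
TRACE_PATH = ("/opt/opscode-push-jobs-client/embedded/lib/ruby/gems/1.9.1/gems/"
              "opscode-pushy-client-1.1.3/lib/pushy_client.rb")
```

This is only a quick check. The package manager on the node remains the authority on which package and version are actually installed.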

Can you please describe how your nodes were created? Are you using an internal package repository? Did you use delivery-cluster? If so, are you specifying a particular push client package in the env file?

One thing to try is installing the newer version of push-jobs-client and seeing whether the error goes away. We'll still need to figure out why your install is pulling in the older version, but this might get you unblocked. If it doesn't fix the problem, we will have to dig a little deeper. You can find the newest version of push-jobs-client in our packagecloud repository here: https://packagecloud.io/chef/stable. I couldn't determine your OS from the output you posted, so I couldn't provide a direct link.

Thanks!

andyrepton commented 8 years ago

Hi Jon!

The issue seems to occur only on CentOS 7; 6.7 seems to work fine. The whole setup was created using the delivery-cluster tutorial here: https://learn.chef.io/build-a-delivery-pipeline/ I'm going to try to get the fog provisioner running in it, and then I'll rebuild the nodes with CentOS 7 (doing it via SSH is a pain) so we can debug further.

On CentOS 7, I had to manually install the CentOS 6 package, as one for 7 doesn't exist yet. I posted an issue about it here: https://github.com/chef/chef-push/issues/12 . I installed the version found here: https://downloads.chef.io/push-jobs-server/redhat/ . Let me rebuild the CentOS 7 nodes and try the newer client.
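Because the behavior appears to differ between CentOS 6.7 and 7, and the OS couldn't be read from the original logs, it helps to capture the exact distribution and version when reporting results. This is a minimal sketch with hypothetical helper names. It reads /etc/os-release when present and otherwise falls back to /etc/redhat-release, since older releases such as CentOS 6 may lack /etc/os-release.

```python
import os
import re


def parse_os_release(text):
    """Parse KEY=value lines in the /etc/os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        info[key] = value.strip().strip('"').strip("'")
    return info


def parse_redhat_release(text):
    """Pull the version out of a line such as 'CentOS release 6.7 (Final)'."""
    match = re.search(r"release (\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


def describe_os():
    """Return (distribution id, version), or 'unknown' where it can't be determined."""
    if os.path.exists("/etc/os-release"):
        with open("/etc/os-release") as handle:
            info = parse_os_release(handle.read())
        return info.get("ID", "unknown"), info.get("VERSION_ID", "unknown")
    if os.path.exists("/etc/redhat-release"):
        with open("/etc/redhat-release") as handle:
            return "redhat-family", parse_redhat_release(handle.read()) or "unknown"
    return "unknown", "unknown"
```

This does not handle the shell-style escaping that /etc/os-release technically permits. For a bug report, the distribution ID and version are usually enough.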

Thanks for your help!

Andy

charlesjohnson commented 7 years ago

As there have been no updates to this issue in 220 days, and there have since been new releases of the push-jobs server and client, I'm going to close this ticket as stale.

@Seth-Karlo if this issue is still valid, please reopen it; we'll be happy to assist further.