yinym opened this issue 9 years ago
Just a quick update, we are now on RHEL 7.1 and still seeing this issue.
Have you had any resolution to this? I am in the process of tracking down similar issues with knife upload to Chef. We resolved our issues by moving from standard EBS volumes to gp2 volumes. One thing I did notice is that oc_bifrost's requests.log isn't being written to.
I saw this issue recently. There were a bunch of errors across the Chef services that all point to timeouts.
Note: I've removed parts of these errors for legibility.
# In opscode-erchef/current
[error] Error setting ACE {authz_ace,[...]} for method delete on object ... for requestor ...: req_timedout
# in nginx/error.log
[error] [lua] config.lua:60: connect_redis(): failed to authenticate to redis: timeout, ...
# in oc_bifrost/crash.log
{<<"method=GET; path=...; status=500; ">>,{error,{error,{case_clause,{error,timeout}},[{bifrost_wm_acl_member_resource,to_json ...
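The same timeout surfaces in three different services' logs. A quick way to correlate them is to grep the whole log tree for the signatures above; this is a sketch (the helper name is mine, and the default path assumes a standard omnibus install):

```shell
# find_timeouts: grep a log tree for the timeout signatures seen above
# (req_timedout from erchef, plus the generic "timeout" variants from
# the nginx/lua and oc_bifrost errors). Defaults to the standard
# Chef server omnibus log location; pass a path to override.
find_timeouts() {
  grep -r -E 'req_timedout|timeout' "${1:-/var/log/opscode}" 2>/dev/null
}
```

Running it around the failure window makes it easy to see that all three errors share one root cause.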
It turns out our Chef servers (running in a tiered setup) were on highly contended hypervisors.
The redis-cli tool helped observe this problem.
# /opt/opscode/embedded/bin/redis-cli --intrinsic-latency 200
...
Max latency so far: 2450172 microseconds
2.4 seconds, yikes!
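A stall like that will blow past any timeout measured in low seconds. As a sanity check, you can compare redis-cli's reported max intrinsic latency against a configured timeout; a small sketch (the function name is mine, and note the unit mismatch: redis-cli reports microseconds, the chef-server.rb settings are milliseconds):

```shell
# latency_exceeds_timeout: true if a max intrinsic latency (microseconds,
# as reported by redis-cli --intrinsic-latency) is larger than a timeout
# (milliseconds, as configured in chef-server.rb). Illustrative helper.
latency_exceeds_timeout() {
  max_latency_us="$1"
  timeout_ms="$2"
  [ "$max_latency_us" -gt $((timeout_ms * 1000)) ]
}
# e.g. latency_exceeds_timeout 2450172 2000  -> true (2.45 s > 2 s)
```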
Upping various timeout options resolved the problem for us until we could move to a less contended environment.
# /etc/opscode/chef-server.rb
opscode_erchef['authz_timeout'] = 5000
oc_chef_authz['ibrowse_options'] = '[{connect_timeout, 5000}]'
lb['redis_connection_timeout'] = 5000
lb['redis_keepalive_timeout'] = 5000
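For the settings above to take effect, the server has to be reconfigured; this is the standard omnibus workflow:

```shell
# Apply changes made in /etc/opscode/chef-server.rb
chef-server-ctl reconfigure
# Optionally restart the services whose timeouts changed
chef-server-ctl restart opscode-erchef
chef-server-ctl restart nginx
```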
I've been running into this relatively frequently as part of an everything-in-CI project. It seems like the fix should be that internal timeouts are reported to the user as 500 errors, not 403 errors, which are a confusing lie.
I know this is old but we're seeing the same thing.
TODO: Start by looking at the sandbox error-handling code. Make sure the expected error response is sent to the user.
Version
chef-server-core-12.0.5-1.el6.x86_64
chef-12.0.3-1.el6.x86_64
Environment
RHEL 6.6 x86_64
Summary
After I successfully installed and configured the new Chef Server 12, I tried to upload cookbooks to it in a batch, but hit intermittent upload failures.
The failure is clearly intermittent: several cookbooks upload successfully before one fails, sometimes the third and sometimes the last, and after retrying a few times all the cookbooks go through.
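Since a plain retry eventually succeeds, a small retry wrapper is a reasonable stopgap while the root cause is investigated. This is only a sketch; the function name, retry count, and sleep interval are all mine:

```shell
# upload_with_retry: run a command up to N times, pausing between
# attempts. Intended as a stopgap for intermittent upload failures,
# e.g.: upload_with_retry "knife cookbook upload apache2" 5
upload_with_retry() {
  cmd="$1"
  tries="${2:-5}"
  n=1
  while [ "$n" -le "$tries" ]; do
    if $cmd; then
      return 0
    fi
    echo "attempt $n of $tries failed, retrying..." >&2
    n=$((n + 1))
    sleep 1
  done
  return 1
}
```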
Details
I tried to dig into the log files and noticed the following. In /var/log/opscode/nginx/access.log, errors were reported at the failing time. At the same time, in /var/log/opscode/opscode-erchef/current, the Error setting ACE ... entries show the request actually timed out: req_timedout.
Checking the oc_bifrost log /var/log/opscode/oc_bifrost/requests.log.1 for the ACL processing, an interesting entry showed up. Notice the req_time and rdbms.bifrost_db.update_acl_time values in the second line of the log: they are thousands of times larger than the others. This is probably the cause of the issue; the DB operation takes too long and erchef treats the request as timed out.
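To quantify how much longer those requests take, the req_time values can be pulled out of the log directly. A sketch, assuming the "key=value; " field format visible in the crash.log excerpt above (the exact requests.log field names may differ):

```shell
# extract_req_times: print every req_time value (in the log's own units)
# found in the given file, assuming "req_time=<number>" fields.
extract_req_times() {
  grep -o 'req_time=[0-9][0-9]*' "$1" | cut -d= -f2
}
# Sorting the output makes the outliers obvious:
#   extract_req_times requests.log.1 | sort -n | tail -5
```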
This issue does not happen on every system; I'm not sure whether it is related to environment configuration.