kdaily closed this issue 8 years ago.
What scheduler is being used? The age of RHEL/CentOS 6 is becoming a problem in general.
SGE.
I can probably move to CentOS 7, but a colleague noted that some dependencies install differently than on CentOS 6, so I will have to debug that!
I just launched a cluster and am not seeing any errors reported.
[root@ip-192-168-1-28 ~]# cat /etc/system-release
CentOS release 6.7 (Final)
[root@ip-192-168-1-28 ~]# cat /opt/cfncluster/.bootstrapped
cfncluster-1.2.1
[root@ip-192-168-1-28 ~]# mailx
No mail for root
[root@ip-192-168-1-28 ~]# tail /var/log/cron
Mar 30 17:38:01 ip-192-168-0-202 CROND[7997]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:39:01 ip-192-168-0-202 CROND[8075]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:40:01 ip-192-168-0-202 CROND[8154]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:40:01 ip-192-168-0-202 CROND[8155]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Mar 30 17:41:01 ip-192-168-0-202 CROND[8235]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:42:01 ip-192-168-0-202 CROND[8331]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:43:01 ip-192-168-0-202 CROND[8409]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:44:01 ip-192-168-0-202 CROND[8675]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:45:01 ip-192-168-0-202 CROND[8754]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
Mar 30 17:46:01 ip-192-168-0-202 CROND[8835]: (root) CMD (/opt/cfncluster/scripts/publish_pending)
[root@ip-192-168-1-28 ~]# qhost
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-192-168-1-204        lx-amd64        2    1    1    2  0.01    3.7G  128.5M     0.0     0.0
[root@ip-192-168-1-28 ~]#
Can you please post a copy of /opt/cfncluster/scripts/publish_pending,
and also the output of the following commands, run as root on the MasterServer:
cat /opt/cfncluster/.bootstrapped
which aws
aws --version
which python
python -V
Contents of /opt/cfncluster/scripts/publish_pending:
#!/bin/bash
# Copyright 2013-2016 Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Amazon Software License (the "License"). You may not use this file except in compliance with the
# License. A copy of the License is located at
#
# http://aws.amazon.com/asl/
#
# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
# limitations under the License.
PATH=/bin:/usr/bin:/usr/local/bin
export PATH
. /etc/cfncluster/cfnconfig
. /opt/sge/default/common/settings.sh
pending=$(qstat -g d -s p -u '*' | tail -n+3 | awk '{total = total+ $8}END{print total}')
if [ "${pending}x" == "x" ]; then
pending=0
fi
aws --region ${cfn_region} cloudwatch put-metric-data --namespace cfncluster --metric-name pending --unit Count --value ${pending} --dimensions Stack=${stack_name}
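The pending count the script publishes is just a sum over one column of qstat output. The sketch below isolates that arithmetic so it can be tried without SGE or AWS. It assumes, as the script does, that for a pending job the queue column is blank, so the slot count lands in whitespace-separated field 8; the function name sum_pending_slots and the sample qstat text are hypothetical, made up for illustration.

```shell
#!/bin/bash
# Sketch of the pending-slot arithmetic from publish_pending, isolated so it
# runs without SGE or AWS. The sample qstat output below is hypothetical.

# Read `qstat -g d -s p -u '*'` output on stdin and print the total pending slots.
# The first two lines are the column header and the dashed rule, so skip them.
# Assumption: for pending jobs the queue column is empty, so slots is field 8.
sum_pending_slots() {
  local pending
  pending=$(tail -n +3 | awk '{ total += $8 } END { print total }')
  # With no pending jobs awk prints an empty string; fall back to 0 like the script.
  if [ -z "${pending}" ]; then
    pending=0
  fi
  echo "${pending}"
}

sample_qstat='job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.00000 sleep.sh   centos       qw    03/30/2016 17:40:01                                    4
      3 0.00000 sleep.sh   centos       qw    03/30/2016 17:41:12                                    2'

printf '%s\n' "${sample_qstat}" | sum_pending_slots   # two pending jobs: prints 6
printf '%s\n' "header" "------" | sum_pending_slots   # no pending jobs: prints 0
```

The -z test behaves the same as the script's "${pending}x" == "x" check; it is used here only because it reads more directly.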
# cat /opt/cfncluster/.bootstrapped
cfncluster-1.1.0
# which aws
/usr/bin/aws
# aws --version
/usr/lib64/python2.6/site-packages/cryptography/__init__.py:26: DeprecationWarning: Python 2.6 is no longer supported by the Python core team, please upgrade your Python. A future version of cryptography will drop support for Python 2.6
DeprecationWarning
aws-cli/1.10.6 Python/2.6.6 Linux/2.6.32-573.18.1.el6.x86_64 botocore/1.3.28
# which python
/usr/bin/python
# python -V
Python 2.6.6
On a test instance, that Python package (cryptography) is not installed. Have you added any Python packages to the install? I suspect that is what is causing the warning. Also, the warning is coming from the AWS CLI, so once you have found the offending Python package, you might want to report it here: https://github.com/aws/aws-cli
[centos@ip-192-168-1-28 ~]$ file /usr/lib64/python2.6/site-packages/cryptography/__init__.py
/usr/lib64/python2.6/site-packages/cryptography/__init__.py: cannot open `/usr/lib64/python2.6/site-packages/cryptography/__init__.py' (No such file or directory)
[centos@ip-192-168-1-28 ~]$
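The file check above only shows whether the module is present. A small helper that also asks RPM which package, if any, owns the file could help tell a yum-installed copy apart from one added some other way. This is a sketch: the function name check_module_origin is hypothetical, and an "unowned" result only suggests a non-RPM install (pip, easy_install, a manual copy); it does not prove it.

```shell
#!/bin/bash
# Hypothetical helper: report whether a module file exists and, if so, whether
# an installed RPM package owns it. "unowned" is a hint, not proof, that the
# file came from pip, easy_install, or a manual copy.
check_module_origin() {
  local path="$1"
  if [ ! -e "${path}" ]; then
    echo "absent: ${path}"
    return 0
  fi
  local owner
  # `rpm -qf` exits non-zero when no installed package owns the file.
  if command -v rpm >/dev/null 2>&1 && owner=$(rpm -qf "${path}" 2>/dev/null); then
    echo "rpm: ${owner}"
  else
    echo "unowned: ${path}"
  fi
}

check_module_origin /usr/lib64/python2.6/site-packages/cryptography/__init__.py
```

On the affected master, an "rpm:" result would lend weight to the yum update theory, while "unowned:" would point toward a manually added package.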
I have indeed. I'm rebuilding now with some dependencies removed. It could also have come in inadvertently from a yum update -y.
I'm getting mail sent to the root account on the master server (truncated):
I get a lot of these mails, since the script is run every minute by cron.
I'm using the CentOS 6 image.