Closed: miker2746 closed this issue 6 years ago
Michael - Given the two symptoms, I wonder if the cfncluster-cookbooks failed to run successfully. Can you share your cfncluster config file and the /var/log/cloud-init.log and /var/log/cfn-init.log files?
Hi Rajachan,
Here is the /var/log/cloud-init.log file of my master node.
2018-01-08 07:47:05,599 - util.py[DEBUG]: Cloud-init v. 17.1 running 'init-local' at Mon, 08 Jan 2018 07:47:05 +0000. Up 27.54 seconds.
2018-01-08 07:47:05,599 - main.py[DEBUG]: No kernel command line url found.
2018-01-08 07:47:05,599 - main.py[DEBUG]: Closing stdin.
2018-01-08 07:47:05,601 - util.py[DEBUG]: Writing to /var/log/cloud-init.log - ab: [644] 0 bytes
2018-01-08 07:47:05,602 - util.py[DEBUG]: Changing the ownership of /var/log/cloud-init.log to 104:4
2018-01-08 07:47:05,602 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance/boot-finished
2018-01-08 07:47:05,602 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/data/no-net
2018-01-08 07:47:05,603 - handlers.py[DEBUG]: start: init-local/check-cache: attempting to read from cache [check]
2018-01-08 07:47:05,603 - util.py[DEBUG]: Reading from /var/lib/cloud/instance/obj.pkl (quiet=False)
2018-01-08 07:47:05,603 - stages.py[DEBUG]: no cache found
2018-01-08 07:47:05,603 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: no cache found
2018-01-08 07:47:05,603 - util.py[DEBUG]: Attempting to remove /var/lib/cloud/instance
2018-01-08 07:47:05,606 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.ubuntu.Distro'>
I can't open the /var/log/cfn-init.log file. Every time I open it, the instance freezes and I have to reconnect to it.
The config file is as follows. I only set key_name and vpc_settings; the rest are the default settings.
[cluster default]
# Name of an existing EC2 KeyPair to enable SSH access to the instances.
key_name = cfncluster-keypair1
# Override path to cloudformation in S3
# (defaults to https://s3.amazonaws.com/cfncluster-<aws_region_name>/templates/cfncluster-<version>.cfn.json)
#template_url = https://s3.amazonaws.com/cfncluster-us-east-1/templates/cfncluster.cfn.json
# Cluster Server EC2 instance type
# (defaults to t2.micro for default template)
#compute_instance_type = t2.micro
# Master Server EC2 instance type
# (defaults to t2.micro for default template)
#master_instance_type = t2.micro
# Initial number of EC2 instances to launch as compute nodes in the cluster.
# (defaults to 2 for default template)
#initial_queue_size = 0
# Maximum number of EC2 instances that can be launched in the cluster.
# (defaults to 10 for the default template)
#max_queue_size = 1
# Boolean flag to set autoscaling group to maintain initial size and scale back
# (defaults to false for the default template)
#maintain_initial_size = false
# Cluster scheduler
# (defaults to sge for the default template)
#scheduler = sge
# Type of cluster to launch i.e. ondemand or spot
# (defaults to ondemand for the default template)
#cluster_type = ondemand
# Spot price for the ComputeFleet
#spot_price = 0.00
# ID of a Custom AMI, to use instead of published AMI's
#custom_ami = ami-9802b1e1
#custom_ami = ami-ff8d1886
#custom_ami = ami-62fa6e1b
#custom_ami = ami-898b1ff0
vpc_settings = mycluster1-vpc
And my vpc_settings section is as follows.
[vpc mycluster1-vpc]
master_subnet_id = subnet-4ba6f52c
vpc_id = vpc-e5446782
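Since the cluster section refers to the VPC section by name, a quick consistency check can rule out a mismatch between the two. The following is only a sketch: `check_vpc_reference` is a hypothetical helper (not part of cfncluster), and it assumes the INI-style layout shown above, with the config file typically at `~/.cfncluster/config`.

```shell
#!/bin/bash
# check_vpc_reference CONFIG_FILE
# Hypothetical helper: reads the uncommented vpc_settings value and confirms
# that a matching "[vpc NAME]" section header exists in the same file.
check_vpc_reference() {
    local config=$1
    local name
    # Take the first uncommented "vpc_settings = NAME" line and keep NAME only.
    # Lines starting with "#" do not match because of the ^ anchor.
    name=$(sed -n 's/^[[:space:]]*vpc_settings[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$config" | head -n 1)
    if [ -z "$name" ]; then
        echo "vpc_settings is not set"
        return 1
    fi
    # -F treats the header as a fixed string, so "[" and "]" need no escaping.
    if grep -qF "[vpc ${name}]" "$config"; then
        echo "vpc_settings -> [vpc ${name}] found"
    else
        echo "vpc_settings -> [vpc ${name}] missing"
        return 1
    fi
}
```

Usage, assuming the default config location: `check_vpc_reference ~/.cfncluster/config`. This only checks that the names line up; it does not verify that the subnet or VPC IDs exist in the account.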
Hi,
I updated cfncluster and found out what I was doing wrong. I should use the private IP addresses with ssh to connect to the worker nodes, or I should write them to the /etc/hosts file with some custom host names.
Thank you for answering my question.
best regards, Michael
Michael - You need not configure anything manually to SSH from the master into the compute nodes. The Chef cookbook already does the heavy lifting for you. It is hard to say exactly what happened without looking at the cfn-init log. I don't see a correlation between opening the log file and your instance freezing up; it might have been something transient. See if you can at least get the last 100 lines using tail (tail -n 100 /var/log/cfn-init.log); that will be really useful in understanding the problem.
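If even the tail is too noisy to read, it can be filtered down to likely failures without ever opening the file in an editor. This is a rough sketch: `log_tail` and `log_problems` are hypothetical helper names, and the grep patterns are only a guess at what failure lines look like in cfn-init and Chef output, so adjust them as needed.

```shell
#!/bin/bash
# log_tail FILE [N]
# Hypothetical helper: print the last N lines of FILE (default 100)
# without loading the whole file.
log_tail() {
    local file=$1
    local count=${2:-100}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    tail -n "$count" "$file"
}

# log_problems
# Hypothetical filter: keep only lines from stdin that look like failures.
# "[[]" and "[]]" are bracket expressions matching literal "[" and "]".
# The patterns are assumptions, not an official list of cfn-init messages.
log_problems() {
    grep -E '[[]ERROR[]]|ERROR:|FATAL:|failed' || true
}
```

For example: `log_tail /var/log/cfn-init.log 100 | log_problems`. An empty result only means none of the guessed patterns matched in that window, not that the run succeeded.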
Hi rajachan,
I launched a new cluster, but NFS still failed to set up. Here is the /var/log/cfn-init.log file of the new cluster.
`2018-01-09 23:08:18,966 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-west-1.amazonaws.com
2018-01-09 23:08:18,966 [DEBUG] Describing resource MasterServer in stack cfncluster-mycluster2
2018-01-09 23:08:19,081 [INFO] -----------------------Starting build-----------------------
2018-01-09 23:08:19,082 [DEBUG] Not setting a reboot trigger as scheduling support is not available
2018-01-09 23:08:19,083 [INFO] Running configSets: default
2018-01-09 23:08:19,084 [INFO] Running configSet default
2018-01-09 23:08:19,085 [INFO] Running config deployConfigFiles
2018-01-09 23:08:19,086 [DEBUG] No packages specified
2018-01-09 23:08:19,086 [DEBUG] No groups specified
2018-01-09 23:08:19,086 [DEBUG] No users specified
2018-01-09 23:08:19,086 [DEBUG] No sources specified
2018-01-09 23:08:19,086 [DEBUG] Writing content to /etc/chef/client.rb
2018-01-09 23:08:19,086 [DEBUG] Setting mode for /etc/chef/client.rb to 000644
2018-01-09 23:08:19,087 [DEBUG] Setting owner 0 and group 0 for /etc/chef/client.rb
2018-01-09 23:08:19,087 [DEBUG] Writing content to /tmp/dna.json
2018-01-09 23:08:19,087 [DEBUG] Content will be serialized as a JSON structure
2018-01-09 23:08:19,087 [DEBUG] Setting mode for /tmp/dna.json to 000644
2018-01-09 23:08:19,087 [DEBUG] Setting owner 0 and group 0 for /tmp/dna.json
2018-01-09 23:08:19,087 [DEBUG] Writing content to /tmp/extra.json
2018-01-09 23:08:19,087 [DEBUG] Setting mode for /tmp/extra.json to 000644
2018-01-09 23:08:19,088 [DEBUG] Setting owner 0 and group 0 for /tmp/extra.json
2018-01-09 23:08:19,088 [DEBUG] Running command jq
2018-01-09 23:08:19,088 [DEBUG] No test for command jq
2018-01-09 23:08:19,096 [INFO] Command jq succeeded
2018-01-09 23:08:19,096 [DEBUG] Command jq output:
2018-01-09 23:08:19,096 [DEBUG] Running command mkdir
2018-01-09 23:08:19,097 [DEBUG] No test for command mkdir
2018-01-09 23:08:19,099 [INFO] Command mkdir succeeded
2018-01-09 23:08:19,099 [DEBUG] Command mkdir output:
2018-01-09 23:08:19,100 [DEBUG] Running command touch
2018-01-09 23:08:19,100 [DEBUG] No test for command touch
2018-01-09 23:08:19,102 [INFO] Command touch succeeded
2018-01-09 23:08:19,102 [DEBUG] Command touch output:
2018-01-09 23:08:19,102 [DEBUG] No services specified
2018-01-09 23:08:19,104 [INFO] Running config getCookbooks
2018-01-09 23:08:19,105 [DEBUG] No packages specified
2018-01-09 23:08:19,105 [DEBUG] No groups specified
2018-01-09 23:08:19,105 [DEBUG] No users specified
2018-01-09 23:08:19,105 [DEBUG] No sources specified
2018-01-09 23:08:19,105 [DEBUG] No files specified
2018-01-09 23:08:19,105 [DEBUG] Running command berk
2018-01-09 23:08:19,105 [DEBUG] No test for command berk
2018-01-09 23:08:52,045 [INFO] Command berk succeeded
2018-01-09 23:08:52,045 [DEBUG] Command berk output: Resolving cookbook dependencies...
Fetching 'cfncluster' from source at .
Fetching cookbook index from https://supermarket.getchef.com...
Installing apt (6.1.4) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing build-essential (8.0.4) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Using cfncluster (1.4.0) from source at .
Installing compat_resource (12.19.0) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing hostname (0.4.2) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing hostsfile (3.0.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing iptables (4.3.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing line (0.6.3) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing mingw (2.0.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing ohai (5.2.0) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing openssh (2.4.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing poise (2.8.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing poise-archive (1.5.0) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing poise-languages (2.1.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing poise-python (1.6.0) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing seven_zip (2.0.2) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing sysctl (0.10.2) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing tar (2.0.0) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing windows (3.4.3) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing yum (5.0.1) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Installing yum-epel (2.1.2) from https://supermarket.getchef.com/ ([opscode] https://supermarket.chef.io:443/api/v1)
Vendoring apt (6.1.4) to /etc/chef/cookbooks/apt
Vendoring build-essential (8.0.4) to /etc/chef/cookbooks/build-essential
Vendoring cfncluster (1.4.0) to /etc/chef/cookbooks/cfncluster
Vendoring compat_resource (12.19.0) to /etc/chef/cookbooks/compat_resource
Vendoring hostname (0.4.2) to /etc/chef/cookbooks/hostname
Vendoring hostsfile (3.0.1) to /etc/chef/cookbooks/hostsfile
Vendoring iptables (4.3.1) to /etc/chef/cookbooks/iptables
Vendoring line (0.6.3) to /etc/chef/cookbooks/line
Vendoring mingw (2.0.1) to /etc/chef/cookbooks/mingw
Vendoring ohai (5.2.0) to /etc/chef/cookbooks/ohai
Vendoring openssh (2.4.1) to /etc/chef/cookbooks/openssh
Vendoring poise (2.8.1) to /etc/chef/cookbooks/poise
Vendoring poise-archive (1.5.0) to /etc/chef/cookbooks/poise-archive
Vendoring poise-languages (2.1.1) to /etc/chef/cookbooks/poise-languages
Vendoring poise-python (1.6.0) to /etc/chef/cookbooks/poise-python
Vendoring seven_zip (2.0.2) to /etc/chef/cookbooks/seven_zip
Vendoring sysctl (0.10.2) to /etc/chef/cookbooks/sysctl
Vendoring tar (2.0.0) to /etc/chef/cookbooks/tar
Vendoring windows (3.4.3) to /etc/chef/cookbooks/windows
Vendoring yum (5.0.1) to /etc/chef/cookbooks/yum
Vendoring yum-epel (2.1.2) to /etc/chef/cookbooks/yum-epel
2018-01-09 23:08:52,046 [DEBUG] No services specified 2018-01-09 23:08:52,048 [INFO] Running config chefPrepEnv 2018-01-09 23:08:52,048 [DEBUG] No packages specified 2018-01-09 23:08:52,048 [DEBUG] No groups specified 2018-01-09 23:08:52,048 [DEBUG] No users specified 2018-01-09 23:08:52,048 [DEBUG] No sources specified 2018-01-09 23:08:52,048 [DEBUG] No files specified 2018-01-09 23:08:52,048 [DEBUG] Running command chef 2018-01-09 23:08:52,048 [DEBUG] No test for command chef 2018-01-09 23:08:58,022 [INFO] Command chef succeeded 2018-01-09 23:08:58,022 [DEBUG] Command chef output: [2018-01-09T23:08:53+00:00] INFO: Forking chef instance to converge... Starting Chef Client, version 12.19.36 [2018-01-09T23:08:53+00:00] INFO: Chef 12.19.36 [2018-01-09T23:08:53+00:00] INFO: Platform: x86_64-linux [2018-01-09T23:08:53+00:00] INFO: Chef-client pid: 2046 [2018-01-09T23:08:55+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found: chefzero://localhost:8889/nodes/ip-10-0-0-68.eu-west-1.compute.internal [2018-01-09T23:08:55+00:00] INFO: Setting the run_list to recipe[cfncluster::sge_config] from CLI options [2018-01-09T23:08:55+00:00] WARN: Run List override has been provided. [2018-01-09T23:08:55+00:00] WARN: Original Run List: [recipe[cfncluster::sge_config]] [2018-01-09T23:08:55+00:00] WARN: Overridden Run List: [recipe[cfncluster::_prep_env]] [2018-01-09T23:08:55+00:00] INFO: Run List is [recipe[cfncluster::_prep_env]] [2018-01-09T23:08:55+00:00] INFO: Run List expands to [cfncluster::_prep_env] [2018-01-09T23:08:55+00:00] INFO: Starting Chef Run for ip-10-0-0-68.eu-west-1.compute.internal [2018-01-09T23:08:55+00:00] INFO: Running start handlers [2018-01-09T23:08:55+00:00] INFO: Start handlers complete. 
[2018-01-09T23:08:55+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found: resolving cookbooks for run list: ["cfncluster::_prep_env"] [2018-01-09T23:08:56+00:00] INFO: Loading cookbooks [cfncluster@1.4.0, build-essential@8.0.4, poise-python@1.6.0, tar@2.0.0, selinux@2.0.3, nfs@2.4.1, sysctl@0.10.2, yum@5.0.1, yum-epel@2.1.2, openssh@2.4.1, apt@6.1.4, hostname@0.4.2, line@0.6.3, seven_zip@2.0.2, mingw@2.0.1, poise@2.8.1, poise-languages@2.1.1, ohai@5.2.0, compat_resource@12.19.0, iptables@4.3.1, hostsfile@3.0.1, windows@3.4.3, poise-archive@1.5.0] [2018-01-09T23:08:56+00:00] INFO: Skipping removal of obsoleted cookbooks from the cache Synchronizing Cookbooks: [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_compute_base_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_compute_custom_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_compute_sge_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_compute_slurm_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_compute_torque_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_ganglia_install.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_master_base_config.rb in the cache. [2018-01-09T23:08:56+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_master_custom_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_master_slurm_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_master_torque_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_nvidia_install.rb in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_setup_python.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_undo_base_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_undo_master_base_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_update_packages.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/base_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_ec2_udev_rules.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/base_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_master_sge_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/custom_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/image_prep.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/sge_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/sge_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/_prep_env.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/slurm_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/torque_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/libraries/helpers.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/custom_config.rb in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/amazon/supervisord-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/centos-7/ganglia-webfrontend.conf in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/ami_cleanup.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/slurm_config.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/attachVolume.py in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/munge_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/cfncluster-ebsnvme-id in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/recipes/torque_install.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/compute_ready in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/CfnCluster-License-README.txt in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/ec2-volid.rules in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/ec2_dev_2_volid.py in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/blacklist-nouveau.conf in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/setup-ephemeral-drives.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/sge_inst.conf in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/slurm-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/munge-init in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/slurmctld.service in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/supervisord-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/ganglia-webfrontend.conf in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/supervisord.conf in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/torque.setup in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/slurmd.service in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/attributes/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/ubuntu-14.04/ec2blkdev-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/ubuntu-14.04/slurm-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/fetch_and_run in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/configure-pat.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/ubuntu-16.04/supervisord-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/slurm.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/99-cfncluster-user-tty.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/cfncluster_supervisord.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/ec2blkdev-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/ubuntu-16.04/ec2blkdev-init in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/gmond.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/jq-1.4 in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/slurm.csh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/publish_pending.sge.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/nodewatcher.cfg.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/default/torque.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/publish_pending.torque.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/munge.key.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/slurm.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/amazon/cfncluster_supervisord.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/gmetad.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/torque.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/publish_pending.pbspro.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/torque.config.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/files/ubuntu-14.04/supervisord-init in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/sqswatcher.cfg.erb in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/torque.setup.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/ubuntu/gmond.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_update_centos_base.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/lsb.hosts.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/ubuntu/cfncluster_supervisord.conf.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_variables.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/publish_pending.slurm.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/torque.server_name.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/LICENSE.txt in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/templates/default/cfnconfig.erb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/README.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/build_env_setup.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/centos-upgrade-second-stage.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/Gemfile in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/centos6.elrepo.repo in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_centos7.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/chefignore in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/metadata.json in the cache. 
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_ubuntu1604.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/centos-upgrade-first-stage.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_ubuntu1404.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/build_ami.sh in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/.rubocop.yml in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_centos6.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/NOTICE.txt in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/.kitchen.yml in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/packer_alinux.json in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/Rakefile in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/resources/build_essential.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/resources/xcode_command_line_tools.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/.kitchen.cloud.yml in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/recipes/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/README.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/CONTRIBUTING.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/cfncluster/CHANGELOG.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/metadata.json in the cache.
[2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/CHANGELOG.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/recipes/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/libraries/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/attributes/default.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/cheftie.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/error.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/MAINTAINERS.md in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/build-essential/.foodcritic in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/python_command_mixin.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/python_providers/dummy.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/python_providers/msi.rb in the cache. [2018-01-09T23:08:57+00:00] INFO: Storing updated cookbooks/poise-python/files/halite_gem/poise_python/python_providers/portable_pypy.rb in the cache.
template[/etc/cfncluster/cfnconfig] action create[2018-01-09T23:08:57+00:00] INFO: Processing template[/etc/cfncluster/cfnconfig] action create (cfncluster::_prep_env line 29) [2018-01-09T23:08:57+00:00] INFO: template[/etc/cfncluster/cfnconfig] created file /etc/cfncluster/cfnconfig
create new file /etc/cfncluster/cfnconfig[2018-01-09T23:08:57+00:00] INFO: template[/etc/cfncluster/cfnconfig] updated file contents /etc/cfncluster/cfnconfig
update content in file /etc/cfncluster/cfnconfig from none to 315634 --- /etc/cfncluster/cfnconfig 2018-01-09 23:08:57.904582332 +0000 +++ /etc/cfncluster/.chef-cfnconfig20180109-2046-x1f55g 2018-01-09 23:08:57.904582332 +0000 @@ -1 +1,17 @@ +stack_name=cfncluster-mycluster2 +cfn_preinstall=NONE +cfn_preinstall_args=NONE +cfn_postinstall=NONE +cfn_postinstall_args="NONE" +cfn_region=eu-west-1 +cfn_scheduler=sge +cfn_scheduler_slots=vcpus +cfn_instance_slots=1 +cfn_encrypted_ephemeral=false +cfn_ephemeral_dir=/scratch +cfn_shared_dir=/home/ebs +cfn_proxy=NONE +cfn_node_type=MasterServer +cfn_cluster_user=ec2-user +cfn_volume=vol-0e2d129c481b9fe77[2018-01-09T23:08:57+00:00] INFO: template[/etc/cfncluster/cfnconfig] mode changed to 644
change mode from '' to '0644'
link[/opt/cfncluster/cfnconfig] action create[2018-01-09T23:08:57+00:00] INFO: Processing link[/opt/cfncluster/cfnconfig] action create (cfncluster::_prep_env line 34) [2018-01-09T23:08:57+00:00] INFO: link[/opt/cfncluster/cfnconfig] created
cookbook_file[fetch_and_run] action create[2018-01-09T23:08:57+00:00] INFO: Processing cookbook_file[fetch_and_run] action create (cfncluster::_prep_env line 38) [2018-01-09T23:08:57+00:00] INFO: cookbook_file[fetch_and_run] created file /opt/cfncluster/scripts/fetch_and_run
create new file /opt/cfncluster/scripts/fetch_and_run[2018-01-09T23:08:57+00:00] INFO: cookbook_file[fetch_and_run] updated file contents /opt/cfncluster/scripts/fetch_and_run
update content in file /opt/cfncluster/scripts/fetch_and_run from none to c2c6b0 --- /opt/cfncluster/scripts/fetch_and_run 2018-01-09 23:08:57.916581772 +0000 +++ /opt/cfncluster/scripts/.chef-fetch_and_run20180109-2046-jnkq8f 2018-01-09 23:08:57.916581772 +0000 @@ -1 +1,64 @@ +#!/bin/bash
+. /etc/cfncluster/cfnconfig
+# Error exit function +function error_exit () {
script=`basename $0`
echo "cfncluster: $script - $1"
logger -t cfncluster "$script - $1"
exit 1 +}
+function download_run (){
url=$1
scheme=$(echo "${url}"| cut -d: -f1)
tmpfile=$(mktemp)
trap "/bin/rm $tmpfile" RETURN
if [ "${scheme}" == "s3" ]; then
aws --region ${cfn_region} s3 cp ${url} - > $tmpfile || return 1
else
wget -qO- ${url} > $tmpfile || return 1
fi
chmod +x $tmpfile || return 1
$tmpfile $@ || return 1 +}
+function run_preinstall () {
if [ "${cfn_preinstall}" != "NONE" ]; then
if [ "${cfn_preinstall_args}" != "NONE" ]; then
download_run ${cfn_preinstall} ${cfn_preinstall_args}
else
download_run ${cfn_preinstall}
fi
fi || error_exit "Failed to run boot_as_master preinstall" +}
+function run_postinstall () {
RC=0
if [ "${cfn_postinstall}" != "NONE" ]; then
if [ "${cfn_postinstall_args}" != "NONE" ]; then
download_run ${cfn_postinstall} ${cfn_postinstall_args}
else
download_run ${cfn_postinstall}
fi
fi || error_exit "Failed to run boot_as_master postinstall" +}
+ACTION=${1#?}
+case $ACTION in
preinstall)
run_preinstall
;;
postinstall)
run_postinstall
;;
*)
echo "Unknown action. Exit gracefully"
exit 0
+esac[2018-01-09T23:08:57+00:00] INFO: cookbook_file[fetch_and_run] owner changed to 0 [2018-01-09T23:08:57+00:00] INFO: cookbook_file[fetch_and_run] group changed to 0 [2018-01-09T23:08:57+00:00] INFO: cookbook_file[fetch_and_run] mode changed to 755
change mode from '' to '0755'
change owner from '' to 'root'
change group from '' to 'root'
cookbook_file[compute_ready] action create[2018-01-09T23:08:57+00:00] INFO: Processing cookbook_file[compute_ready] action create (cfncluster::_prep_env line 45) [2018-01-09T23:08:57+00:00] INFO: cookbook_file[compute_ready] created file /opt/cfncluster/scripts/compute_ready
create new file /opt/cfncluster/scripts/compute_ready[2018-01-09T23:08:57+00:00] INFO: cookbook_file[compute_ready] updated file contents /opt/cfncluster/scripts/compute_ready
update content in file /opt/cfncluster/scripts/compute_ready from none to 3273c9
--- /opt/cfncluster/scripts/compute_ready 2018-01-09 23:08:57.920581586 +0000
+++ /opt/cfncluster/scripts/.chef-compute_ready20180109-2046-fdzrkt 2018-01-09 23:08:57.920581586 +0000
@@ -1 +1,9 @@
+#!/bin/bash
+. /etc/cfncluster/cfnconfig
+# Notify compute is ready
+instance_id_url="http://169.254.169.254/latest/meta-data/instance-id"
+instance_id=$(curl --retry 3 --retry-delay 0 --silent --fail ${instance_id_url})
+aws --region ${cfn_region} sqs send-message --queue-url ${cfn_sqs_queue} --message-body '{"Type" : "Notification", "Message" : "{\"StatusCode\":\"Complete\",\"Description\":\"Succesfully launched '${instance_id}'\",\"Event\":\"cfncluster:COMPUTE_READY\",\"EC2InstanceId\":\"'${instance_id}'\",\"Slots\":\"'${cfn_instance_slots}'\"}"}'
[2018-01-09T23:08:57+00:00] INFO: cookbook_file[compute_ready] owner changed to 0
[2018-01-09T23:08:57+00:00] INFO: cookbook_file[compute_ready] group changed to 0
[2018-01-09T23:08:57+00:00] INFO: cookbook_file[compute_ready] mode changed to 755
change mode from '' to '0755'
change owner from '' to 'root'
change group from '' to 'root'
[2018-01-09T23:08:57+00:00] WARN: Skipping final node save because override_runlist was given
[2018-01-09T23:08:57+00:00] INFO: Chef Run complete in 2.138475739 seconds
[2018-01-09T23:08:57+00:00] INFO: Skipping removal of unused files from the cache
Running handlers:
[2018-01-09T23:08:57+00:00] INFO: Running report handlers
Running handlers complete
[2018-01-09T23:08:57+00:00] INFO: Report handlers complete
Chef Client finished, 4/7 resources updated in 04 seconds
2018-01-09 23:08:58,024 [DEBUG] No services specified
2018-01-09 23:08:58,025 [INFO] Running config shellRunPreInstall
2018-01-09 23:08:58,026 [DEBUG] No packages specified
2018-01-09 23:08:58,026 [DEBUG] No groups specified
2018-01-09 23:08:58,026 [DEBUG] No users specified
2018-01-09 23:08:58,026 [DEBUG] No sources specified
2018-01-09 23:08:58,026 [DEBUG] No files specified
2018-01-09 23:08:58,026 [DEBUG] Running command runpreinstall
2018-01-09 23:08:58,026 [DEBUG] No test for command runpreinstall
2018-01-09 23:08:58,047 [INFO] Command runpreinstall succeeded
2018-01-09 23:08:58,047 [DEBUG] Command runpreinstall output:
2018-01-09 23:08:58,047 [DEBUG] No services specified
2018-01-09 23:08:58,048 [INFO] Running config chefConfig
2018-01-09 23:08:58,049 [DEBUG] No packages specified
2018-01-09 23:08:58,049 [DEBUG] No groups specified
2018-01-09 23:08:58,049 [DEBUG] No users specified
2018-01-09 23:08:58,049 [DEBUG] No sources specified
2018-01-09 23:08:58,049 [DEBUG] No files specified
2018-01-09 23:08:58,049 [DEBUG] Running command chef
2018-01-09 23:08:58,049 [DEBUG] No test for command chef
2018-01-09 23:09:37,678 [INFO] Command chef succeeded
2018-01-09 23:09:37,679 [DEBUG] Command chef output: [2018-01-09T23:08:59+00:00] INFO: Forking chef instance to converge...
Starting Chef Client, version 12.19.36
[2018-01-09T23:08:59+00:00] INFO: Chef 12.19.36
[2018-01-09T23:08:59+00:00] INFO: Platform: x86_64-linux
[2018-01-09T23:08:59+00:00] INFO: Chef-client pid: 2372
[2018-01-09T23:09:00+00:00] INFO: Setting the run_list to recipe[cfncluster::sge_config] from CLI options
[2018-01-09T23:09:00+00:00] INFO: Run List is [recipe[cfncluster::sge_config]]
[2018-01-09T23:09:00+00:00] INFO: Run List expands to [cfncluster::sge_config]
[2018-01-09T23:09:00+00:00] INFO: Starting Chef Run for ip-10-0-0-68.eu-west-1.compute.internal
[2018-01-09T23:09:00+00:00] INFO: Running start handlers
[2018-01-09T23:09:00+00:00] INFO: Start handlers complete.
[2018-01-09T23:09:00+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found: resolving cookbooks for run list: ["cfncluster::sge_config"]
[2018-01-09T23:09:01+00:00] INFO: Loading cookbooks [cfncluster@1.4.0, build-essential@8.0.4, poise-python@1.6.0, tar@2.0.0, selinux@2.0.3, nfs@2.4.1, sysctl@0.10.2, yum@5.0.1, yum-epel@2.1.2, openssh@2.4.1, apt@6.1.4, hostname@0.4.2, line@0.6.3, seven_zip@2.0.2, mingw@2.0.1, poise@2.8.1, poise-languages@2.1.1, ohai@5.2.0, compat_resource@12.19.0, iptables@4.3.1, hostsfile@3.0.1, windows@3.4.3, poise-archive@1.5.0]
Synchronizing Cookbooks:
build_essential[install_packages] action install[2018-01-09T23:09:03+00:00] INFO: Processing build_essential[install_packages] action install (build-essential::default line 22)
python_runtime[2] action install[2018-01-09T23:09:04+00:00] INFO: Processing python_runtime[2] action install (cfncluster::_setup_python line 34)
poise_languages_system[python2.7] action install[2018-01-09T23:09:04+00:00] INFO: Processing poise_languages_system[python2.7] action install (/etc/chef/local-mode-cache/cache/cookbooks/poise-languages/files/halite_gem/poise_languages/system/mixin.rb line 32)
(up to date)
template[/etc/default/nfs-common] action create[2018-01-09T23:09:13+00:00] INFO: Processing template[/etc/default/nfs-common] action create (nfs::_common line 36) [2018-01-09T23:09:13+00:00] INFO: template[/etc/default/nfs-common] backed up to /etc/chef/local-mode-cache/backup/etc/default/nfs-common.chef-20180109230913.009845 [2018-01-09T23:09:13+00:00] INFO: template[/etc/default/nfs-common] updated file contents /etc/default/nfs-common
service[portmap] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[portmap] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[portmap] restarted
service[lock] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[lock] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[lock] restarted
service[nfs-config.service] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[nfs-config.service] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[nfs-config.service] restarted
template[/etc/modprobe.d/lockd.conf] action create[2018-01-09T23:09:13+00:00] INFO: Processing template[/etc/modprobe.d/lockd.conf] action create (nfs::_common line 36) [2018-01-09T23:09:13+00:00] INFO: template[/etc/modprobe.d/lockd.conf] backed up to /etc/chef/local-mode-cache/backup/etc/modprobe.d/lockd.conf.chef-20180109230913.369153 [2018-01-09T23:09:13+00:00] INFO: template[/etc/modprobe.d/lockd.conf] updated file contents /etc/modprobe.d/lockd.conf
options lockd nlm_udpport=32768 nlm_tcpport=32768 [2018-01-09T23:09:13+00:00] INFO: template[/etc/modprobe.d/lockd.conf] sending restart action to service[portmap] (immediate)
service[portmap] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[portmap] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[portmap] restarted
service[lock] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[lock] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[lock] restarted
service[nfs-config.service] action restart[2018-01-09T23:09:13+00:00] INFO: Processing service[nfs-config.service] action restart (nfs::_common line 46) [2018-01-09T23:09:13+00:00] INFO: service[nfs-config.service] restarted
template[/etc/default/nfs-kernel-server] action create[2018-01-09T23:09:14+00:00] INFO: Processing template[/etc/default/nfs-kernel-server] action create (nfs::server line 30) [2018-01-09T23:09:14+00:00] INFO: template[/etc/default/nfs-kernel-server] backed up to /etc/chef/local-mode-cache/backup/etc/default/nfs-kernel-server.chef-20180109230914.253053 [2018-01-09T23:09:14+00:00] INFO: template[/etc/default/nfs-kernel-server] updated file contents /etc/default/nfs-kernel-server
RPCMOUNTDOPTS="-p 32767" RPCNFSDCOUNT="8"
service[nfs-kernel-server] action start[2018-01-09T23:09:14+00:00] INFO: Processing service[nfs-kernel-server] action start (nfs::server line 59) [2018-01-09T23:09:14+00:00] INFO: service[nfs-kernel-server] started
template[/etc/idmapd.conf] action create[2018-01-09T23:09:14+00:00] INFO: Processing template[/etc/idmapd.conf] action create (nfs::_idmap line 23) [2018-01-09T23:09:14+00:00] INFO: template[/etc/idmapd.conf] backed up to /etc/chef/local-mode-cache/backup/etc/idmapd.conf.chef-20180109230914.441045 [2018-01-09T23:09:14+00:00] INFO: template[/etc/idmapd.conf] updated file contents /etc/idmapd.conf
update content in file /etc/idmapd.conf from ca812b to 6f78af
--- /etc/idmapd.conf 2017-11-28 18:49:33.064692436 +0000
+++ /etc/.chef-idmapd20180109-2372-xpykyp.conf 2018-01-09 23:09:14.435960774 +0000
@@ -5,7 +5,7 @@
-Domain = ec2.internal
+Domain = eu-west-1.compute.internal
[2018-01-09T23:09:14+00:00] INFO: template[/etc/idmapd.conf] sending restart action to service[idmap] (immediate)
service[idmap] action restart[2018-01-09T23:09:14+00:00] INFO: Processing service[idmap] action restart (nfs::_idmap line 29) [2018-01-09T23:09:14+00:00] INFO: service[idmap] restarted
service[ec2blkdev] action start[2018-01-09T23:09:15+00:00] INFO: Processing service[ec2blkdev] action start (cfncluster::_ec2_udev_rules line 51) [2018-01-09T23:09:15+00:00] INFO: service[ec2blkdev] started
hostsfile_entry[localhost] action append[2018-01-09T23:09:17+00:00] INFO: Processing hostsfile_entry[localhost] action append (hostname::default line 115)
Recipe:
file[/etc/hosts] action create[2018-01-09T23:09:17+00:00] INFO: Processing file[/etc/hosts] action create (dynamically defined) [2018-01-09T23:09:17+00:00] INFO: file[/etc/hosts] backed up to /etc/chef/local-mode-cache/backup/etc/hosts.chef-20180109230917.953856 [2018-01-09T23:09:17+00:00] INFO: file[/etc/hosts] updated file contents /etc/hosts
-# The following lines are desirable for IPv6 capable hosts
-::1 ip6-localhost ip6-loopback
-fe00::0 ip6-localnet
-ff00::0 ip6-mcastprefix
-ff02::1 ip6-allnodes
-ff02::2 ip6-allrouters
-ff02::3 ip6-allhosts
+127.0.0.1 localhost
+::1 ip6-localhost ip6-loopback
+ff02::3 ip6-allhosts
+ff02::1 ip6-allnodes
+ff02::2 ip6-allrouters
+fe00:: ip6-localnet
+ff00:: ip6-mcastprefix
hostsfile_entry[set hostname] action create[2018-01-09T23:09:17+00:00] INFO: Processing hostsfile_entry[set hostname] action create (hostname::default line 121)
Recipe:
file[/etc/hosts] action create[2018-01-09T23:09:17+00:00] INFO: Processing file[/etc/hosts] action create (dynamically defined) [2018-01-09T23:09:17+00:00] INFO: file[/etc/hosts] backed up to /etc/chef/local-mode-cache/backup/etc/hosts.chef-20180109230917.960799 [2018-01-09T23:09:17+00:00] INFO: file[/etc/hosts] updated file contents /etc/hosts
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
+10.0.0.68 ip-10-0-0-68.eu-west-1.compute.internal ip-10-0-0-68
ff02::3 ip6-allhosts
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ohai[reload_hostname] action reload[2018-01-09T23:09:17+00:00] INFO: Processing ohai[reload_hostname] action reload (hostname::default line 131) [2018-01-09T23:09:18+00:00] INFO: ohai[reload_hostname] reloaded
execute[setup ephemeral] action run[2018-01-09T23:09:18+00:00] INFO: Processing execute[setup ephemeral] action run (cfncluster::base_config line 24)
[execute] + . /etc/cfncluster/cfnconfig
++ stack_name=cfncluster-mycluster2
++ cfn_preinstall=NONE
++ cfn_preinstall_args=NONE
++ cfn_postinstall=NONE
++ cfn_postinstall_args=NONE
++ cfn_region=eu-west-1
++ cfn_scheduler=sge
++ cfn_scheduler_slots=vcpus
++ cfn_instance_slots=1
++ cfn_encrypted_ephemeral=false
++ cfn_ephemeral_dir=/scratch
++ cfn_shared_dir=/home/ebs
++ cfn_proxy=NONE
++ cfn_node_type=MasterServer
++ cfn_cluster_user=ec2-user
++ cfn_volume=vol-0e2d129c481b9fe77
execute[run_configure-pat] action run[2018-01-09T23:09:18+00:00] INFO: Processing execute[run_configure-pat] action run (cfncluster::_master_base_config line 17)
[execute] + echo 'Determining the MAC address on eth0'
Determining the MAC address on eth0
++ grep -Po 'link/ether \K[\w:]+'
++ ip addr show eth0
execute[add_configure-pat] action run[2018-01-09T23:09:19+00:00] INFO: Processing execute[add_configure-pat] action run (cfncluster::_master_base_config line 23) [2018-01-09T23:09:19+00:00] INFO: Processing execute[Guard resource] action run (dynamically defined) [2018-01-09T23:09:19+00:00] INFO: execute[add_configure-pat] ran successfully
execute[attach_volume] action run[2018-01-09T23:09:19+00:00] INFO: Processing execute[attach_volume] action run (cfncluster::_master_base_config line 31) [2018-01-09T23:09:19+00:00] INFO: execute[attach_volume] ran successfully
ruby_block[sleeping_for_volume] action run[2018-01-09T23:09:19+00:00] INFO: Processing ruby_block[sleeping_for_volume] action run (cfncluster::_master_base_config line 37) [2018-01-09T23:09:24+00:00] INFO: ruby_block[sleeping_for_volume] called
ruby_block[setup_disk] action run[2018-01-09T23:09:24+00:00] INFO: Processing ruby_block[setup_disk] action run (cfncluster::_master_base_config line 45) [2018-01-09T23:09:26+00:00] INFO: ruby_block[setup_disk] called
ruby_block[sleeping_for_volume] action run[2018-01-09T23:09:26+00:00] INFO: Processing ruby_block[sleeping_for_volume] action run (cfncluster::_master_base_config line 37) [2018-01-09T23:09:26+00:00] INFO: ruby_block[sleeping_for_volume] called
ruby_block[setup_disk] action run[2018-01-09T23:09:26+00:00] INFO: Processing ruby_block[setup_disk] action run (cfncluster::_master_base_config line 45) [2018-01-09T23:09:26+00:00] INFO: ruby_block[setup_disk] called
ruby_block[setup_disk] action run[2018-01-09T23:09:26+00:00] INFO: Processing ruby_block[setup_disk] action run (cfncluster::_master_base_config line 45) [2018-01-09T23:09:26+00:00] INFO: ruby_block[setup_disk] called
directory[/home/ebs] action create[2018-01-09T23:09:26+00:00] INFO: Processing directory[/home/ebs] action create (cfncluster::_master_base_config line 54) [2018-01-09T23:09:26+00:00] INFO: directory[/home/ebs] created directory /home/ebs
create new directory /home/ebs[2018-01-09T23:09:26+00:00] INFO: directory[/home/ebs] owner changed to 0 [2018-01-09T23:09:26+00:00] INFO: directory[/home/ebs] group changed to 0 [2018-01-09T23:09:26+00:00] INFO: directory[/home/ebs] mode changed to 1777
change mode from '' to '01777'
change owner from '' to 'root'
change group from '' to 'root'
mount[/home/ebs] action mount[2018-01-09T23:09:26+00:00] INFO: Processing mount[/home/ebs] action mount (cfncluster::_master_base_config line 63) [2018-01-09T23:09:26+00:00] INFO: mount[/home/ebs] mounted
mount[/home/ebs] action enable[2018-01-09T23:09:26+00:00] INFO: Processing mount[/home/ebs] action enable (cfncluster::_master_base_config line 63) [2018-01-09T23:09:26+00:00] INFO: mount[/home/ebs] enabled
directory[/home/ebs] action create[2018-01-09T23:09:26+00:00] INFO: Processing directory[/home/ebs] action create (cfncluster::_master_base_config line 72) [2018-01-09T23:09:26+00:00] INFO: directory[/home/ebs] mode changed to 1777
nfs_export[/home/ebs] action create[2018-01-09T23:09:26+00:00] INFO: Processing nfs_export[/home/ebs] action create (cfncluster::_master_base_config line 82)
[2018-01-09T23:09:26+00:00] INFO: append_if_no_line[export /home/ebs] sending run action to execute[exportfs] (immediate)
execute[exportfs] action run[2018-01-09T23:09:26+00:00] INFO: Processing execute[exportfs] action run (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43)
[execute] exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/home/ebs". Assuming default behaviour ('no_subtree_check'). NOTE: this default has changed since nfs-utils version 1.0.x
[2018-01-09T23:09:26+00:00] INFO: execute[exportfs] ran successfully
execute exportfs -ar
nfs_export[/home] action create[2018-01-09T23:09:26+00:00] INFO: Processing nfs_export[/home] action create (cfncluster::_master_base_config line 89)
execute[exportfs] action nothing[2018-01-09T23:09:26+00:00] INFO: Processing execute[exportfs] action nothing (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43) (skipped due to action :nothing)
append_if_no_line[export /home] action edit[2018-01-09T23:09:26+00:00] INFO: Processing append_if_no_line[export /home] action edit (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 61)
[2018-01-09T23:09:26+00:00] INFO: append_if_no_line[export /home] sending run action to execute[exportfs] (immediate)
execute[exportfs] action run[2018-01-09T23:09:26+00:00] INFO: Processing execute[exportfs] action run (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43)
[execute] exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/home/ebs". Assuming default behaviour ('no_subtree_check'). NOTE: this default has changed since nfs-utils version 1.0.x
exportfs: /etc/exports [2]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/home".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
[2018-01-09T23:09:26+00:00] INFO: execute[exportfs] ran successfully
execute exportfs -ar
template[/etc/ganglia/gmetad.conf] action create[2018-01-09T23:09:26+00:00] INFO: Processing template[/etc/ganglia/gmetad.conf] action create (cfncluster::_master_base_config line 96) [2018-01-09T23:09:26+00:00] INFO: template[/etc/ganglia/gmetad.conf] backed up to /etc/chef/local-mode-cache/backup/etc/ganglia/gmetad.conf.chef-20180109230926.408830 [2018-01-09T23:09:26+00:00] INFO: template[/etc/ganglia/gmetad.conf] updated file contents /etc/ganglia/gmetad.conf
update content in file /etc/ganglia/gmetad.conf from b8f766 to 3b9d9b
--- /etc/ganglia/gmetad.conf 2016-02-10 16:11:16.000000000 +0000
+++ /etc/ganglia/.chef-gmetad20180109-2372-8gwi6.conf 2018-01-09 23:09:26.399642553 +0000
@@ -41,7 +41,7 @@
-data_source "my cluster" localhost
+data_source "cfncluster-mycluster2" localhost
#
@@ -69,7 +69,7 @@
-# gridname "MyGrid"
+#gridname ""
#
@@ -150,39 +150,18 @@
#
-# The port and protocol on which Graphite is listening
+# The port on which Graphite is listening
#
-# default: tcp
-# carbon_protocol udp
-#
-# Deprecated in favor of graphite_path A prefix to prepend to the
-# metric names exported by gmetad. Graphite uses dot-
+# A prefix to prepend to the metric names exported by gmetad. Graphite uses dot-
-
-#-------------------------------------------------------------------------------
-# Memcached configuration (if it has been compiled in)
-# Format documentation at http://docs.libmemcached.org/libmemcached_configuration.html
-# default: ""
-# memcached_parameters "--SERVER=127.0.0.1"
#
template[/etc/ganglia/gmond.conf] action create[2018-01-09T23:09:26+00:00] INFO: Processing template[/etc/ganglia/gmond.conf] action create (cfncluster::_master_base_config line 103) [2018-01-09T23:09:26+00:00] INFO: template[/etc/ganglia/gmond.conf] backed up to /etc/chef/local-mode-cache/backup/etc/ganglia/gmond.conf.chef-20180109230926.422080 [2018-01-09T23:09:26+00:00] INFO: template[/etc/ganglia/gmond.conf] updated file contents /etc/ganglia/gmond.conf
update content in file /etc/ganglia/gmond.conf from 556740 to 141f61
--- /etc/ganglia/gmond.conf 2016-02-10 16:11:16.000000000 +0000
+++ /etc/ganglia/.chef-gmond20180109-2372-18eag0w.conf 2018-01-09 23:09:26.411642275 +0000
@@ -1,338 +1,361 @@
-/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
-globals {
daemonize = yes
setuid = yes
user = ganglia
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
host_dmax = 0 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 0
-}
+/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
+globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 3600 /*secs. Expires (removes from web interface) hosts in 1 hour */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 0 /*secs */
+}
-/* If a cluster attribute is specified, then all gmond hosts are wrapped inside
-/* The host section describes attributes of the host, like the location */
-host {
-/* Feel free to specify as many udp_send_channels as you like. Gmond
-/* You can specify as many udp_recv_channels as you like as well. */
-udp_recv_channel {
-/* You can specify as many tcp_accept_channels as you like to share
-/* Each metrics module that is referenced by gmond must be specified and
-include ('/etc/ganglia/conf.d/*.conf')
+/* Optional sFlow settings */
+#sflow {
+# udp_port = 6343
+# accept_vm_metrics = yes
+# accept_jvm_metrics = yes
+# multiple_jvm_instances = no
+# accept_http_metrics = yes
+# multiple_http_instances = no
+# accept_memcache_metrics = yes
+# multiple_memcache_instances = no
+#}
+/* Each metrics module that is referenced by gmond must be specified and
-/* The old internal 2.5.x metric array has been replaced by the following
-/* This collection group will cause a heartbeat (or beacon) to be sent every
-/* This collection group will send general info about this host every 1200 secs.
-/* This collection group will send the status of gexecd for this host every 300 secs */
-/* Unlike 2.5.x the default behavior is to report gexecd OFF. */
-collection_group {
-/* This collection group will collect the CPU status info every 20 secs.
-collection_group {
-/* This group collects the number of running and total processes */
-collection_group {
-/* This collection group grabs the volatile memory metrics every 40 secs and
-collection_group {
-/* Different than 2.5.x default since the old config made no sense */
-collection_group {
-collection_group {
+include ("/etc/ganglia/conf.d/*.conf")
service[gmetad] action enable[2018-01-09T23:09:26+00:00] INFO: Processing service[gmetad] action enable (cfncluster::_master_base_config line 110) (up to date)
service[gmetad] action restart[2018-01-09T23:09:26+00:00] INFO: Processing service[gmetad] action restart (cfncluster::_master_base_config line 110) [2018-01-09T23:09:26+00:00] INFO: service[gmetad] restarted
restart service service[gmetad]
service[ganglia-monitor] action enable[2018-01-09T23:09:26+00:00] INFO: Processing service[ganglia-monitor] action enable (cfncluster::_master_base_config line 115) (up to date)
service[ganglia-monitor] action restart[2018-01-09T23:09:26+00:00] INFO: Processing service[ganglia-monitor] action restart (cfncluster::_master_base_config line 115) [2018-01-09T23:09:27+00:00] INFO: service[ganglia-monitor] restarted
restart service service[ganglia-monitor]
service[apache2] action enable[2018-01-09T23:09:27+00:00] INFO: Processing service[apache2] action enable (cfncluster::_master_base_config line 120) (up to date)
service[apache2] action start[2018-01-09T23:09:27+00:00] INFO: Processing service[apache2] action start (cfncluster::_master_base_config line 120) (up to date)
linux_user[ec2-user] action create[2018-01-09T23:09:27+00:00] INFO: Processing linux_user[ec2-user] action create (cfncluster::_master_base_config line 126) [2018-01-09T23:09:27+00:00] INFO: linux_user[ec2-user] created
create user ec2-user
bash[ssh-keygen] action run[2018-01-09T23:09:27+00:00] INFO: Processing bash[ssh-keygen] action run (cfncluster::_master_base_config line 134) [2018-01-09T23:09:28+00:00] INFO: bash[ssh-keygen] ran successfully
execute "bash" "/tmp/chef-script20180109-2372-9vkl3q"
bash[copy_and_perms] action run[2018-01-09T23:09:28+00:00] INFO: Processing bash[copy_and_perms] action run (cfncluster::_master_base_config line 142) [2018-01-09T23:09:28+00:00] INFO: bash[copy_and_perms] ran successfully
execute "bash" "/tmp/chef-script20180109-2372-qg17az"
bash[ssh-keyscan] action run[2018-01-09T23:09:28+00:00] INFO: Processing bash[ssh-keyscan] action run (cfncluster::_master_base_config line 150)
[execute] # ip-10-0-0-68:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.2
# ip-10-0-0-68:22 SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.2
[2018-01-09T23:09:28+00:00] INFO: bash[ssh-keyscan] ran successfully
execute "bash" "/tmp/chef-script20180109-2372-9uit41"
template[/etc/sqswatcher.cfg] action create[2018-01-09T23:09:28+00:00] INFO: Processing template[/etc/sqswatcher.cfg] action create (cfncluster::_master_base_config line 159) [2018-01-09T23:09:28+00:00] INFO: template[/etc/sqswatcher.cfg] created file /etc/sqswatcher.cfg
create new file /etc/sqswatcher.cfg[2018-01-09T23:09:28+00:00] INFO: template[/etc/sqswatcher.cfg] updated file contents /etc/sqswatcher.cfg
update content in file /etc/sqswatcher.cfg from none to 7e0a9b
--- /etc/sqswatcher.cfg 2018-01-09 23:09:28.323598787 +0000
+++ /etc/.chef-sqswatcher20180109-2372-11tj2ok.cfg 2018-01-09 23:09:28.323598787 +0000
@@ -1 +1,7 @@
+[sqswatcher]
+region = eu-west-1
+sqsqueue = cfncluster-mycluster2-SQS-1SIOFPCOZ42WZ
+table_name = cfncluster-mycluster2-DynamoDBTable-1PEXHHYV91DEF
+scheduler = sge
+cluster_user = ec2-user
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sqswatcher.cfg] owner changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sqswatcher.cfg] group changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sqswatcher.cfg] mode changed to 644
change mode from '' to '0644'
change owner from '' to 'root'
change group from '' to 'root' Recipe: cfncluster::base_config
template[/etc/sudoers.d/99-cfncluster-user-tty] action create[2018-01-09T23:09:28+00:00] INFO: Processing template[/etc/sudoers.d/99-cfncluster-user-tty] action create (cfncluster::base_config line 40) [2018-01-09T23:09:28+00:00] INFO: template[/etc/sudoers.d/99-cfncluster-user-tty] created file /etc/sudoers.d/99-cfncluster-user-tty
create new file /etc/sudoers.d/99-cfncluster-user-tty[2018-01-09T23:09:28+00:00] INFO: template[/etc/sudoers.d/99-cfncluster-user-tty] updated file contents /etc/sudoers.d/99-cfncluster-user-tty
update content in file /etc/sudoers.d/99-cfncluster-user-tty from none to 584e08
--- /etc/sudoers.d/99-cfncluster-user-tty 2018-01-09 23:09:28.343598344 +0000
+++ /etc/sudoers.d/.chef-99-cfncluster-user-tty20180109-2372-hf0zg2 2018-01-09 23:09:28.343598344 +0000
@@ -1 +1,2 @@
+Defaults:ec2-user !requiretty
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sudoers.d/99-cfncluster-user-tty] owner changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sudoers.d/99-cfncluster-user-tty] group changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/sudoers.d/99-cfncluster-user-tty] mode changed to 600
change mode from '' to '0600'
change owner from '' to 'root'
change group from '' to 'root'
template[/etc/cfncluster/cfncluster_supervisord.conf] action create[2018-01-09T23:09:28+00:00] INFO: Processing template[/etc/cfncluster/cfncluster_supervisord.conf] action create (cfncluster::base_config line 48) [2018-01-09T23:09:28+00:00] INFO: template[/etc/cfncluster/cfncluster_supervisord.conf] created file /etc/cfncluster/cfncluster_supervisord.conf
create new file /etc/cfncluster/cfncluster_supervisord.conf[2018-01-09T23:09:28+00:00] INFO: template[/etc/cfncluster/cfncluster_supervisord.conf] updated file contents /etc/cfncluster/cfncluster_supervisord.conf
update content in file /etc/cfncluster/cfncluster_supervisord.conf from none to 7d6ae9
--- /etc/cfncluster/cfncluster_supervisord.conf 2018-01-09 23:09:28.359597988 +0000
+++ /etc/cfncluster/.chef-cfncluster_supervisord20180109-2372-ti8gdu.conf 2018-01-09 23:09:28.359597988 +0000
@@ -1 +1,7 @@
+# Generated by Chef for cfncluster MasterServer# Local modifications could be be overwritten.
+[program:sqswatcher]
+command = /usr/local/bin/sqswatcher
+redirect_stderr = true
+stdout_logfile = /var/log/sqswatcher
+
[2018-01-09T23:09:28+00:00] INFO: template[/etc/cfncluster/cfncluster_supervisord.conf] owner changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/cfncluster/cfncluster_supervisord.conf] group changed to 0
[2018-01-09T23:09:28+00:00] INFO: template[/etc/cfncluster/cfncluster_supervisord.conf] mode changed to 644
change mode from '' to '0644'
change owner from '' to 'root'
change group from '' to 'root'
service[supervisord] action enable[2018-01-09T23:09:28+00:00] INFO: Processing service[supervisord] action enable (cfncluster::base_config line 56) [2018-01-09T23:09:28+00:00] INFO: service[supervisord] enabled
enable service service[supervisord]
service[supervisord] action start[2018-01-09T23:09:28+00:00] INFO: Processing service[supervisord] action start (cfncluster::base_config line 56) [2018-01-09T23:09:30+00:00] INFO: service[supervisord] started
start service service[supervisord] Recipe: cfncluster::sge_install
remote_file[/opt/cfncluster/sources/sge-8.1.9.tar.gz] action create[2018-01-09T23:09:30+00:00] INFO: Processing remote_file[/opt/cfncluster/sources/sge-8.1.9.tar.gz] action create (cfncluster::sge_install line 21) (skipped due to not_if)
bash[make install] action run[2018-01-09T23:09:30+00:00] INFO: Processing bash[make install] action run (cfncluster::sge_install line 29) (up to date)
replace_or_add[AddQueue] action edit[2018-01-09T23:09:30+00:00] INFO: Processing replace_or_add[AddQueue] action edit (cfncluster::sge_install line 52) (up to date)
linux_user[sgeadmin] action create[2018-01-09T23:09:30+00:00] INFO: Processing linux_user[sgeadmin] action create (cfncluster::sge_install line 69) (up to date)
directory[/opt/cfncluster/licenses/sge] action create[2018-01-09T23:09:30+00:00] INFO: Processing directory[/opt/cfncluster/licenses/sge] action create (cfncluster::sge_install line 78) (up to date)
bash[copy license stuff] action run[2018-01-09T23:09:30+00:00] INFO: Processing bash[copy license stuff] action run (cfncluster::sge_install line 80) (up to date) Recipe: cfncluster::_master_sge_config
nfs_export[/opt/sge] action create[2018-01-09T23:09:30+00:00] INFO: Processing nfs_export[/opt/sge] action create (cfncluster::_master_sge_config line 17)
execute[exportfs] action nothing[2018-01-09T23:09:30+00:00] INFO: Processing execute[exportfs] action nothing (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43) (skipped due to action :nothing)
append_if_no_line[export /opt/sge] action edit[2018-01-09T23:09:30+00:00] INFO: Processing append_if_no_line[export /opt/sge] action edit (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 61)
[2018-01-09T23:09:30+00:00] INFO: append_if_no_line[export /opt/sge] sending run action to execute[exportfs] (immediate)
execute[exportfs] action run[2018-01-09T23:09:30+00:00] INFO: Processing execute[exportfs] action run (/etc/chef/local-mode-cache/cache/cookbooks/nfs/providers/export.rb line 43)
[execute] exportfs: /etc/exports [1]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/home/ebs". Assuming default behaviour ('no_subtree_check'). NOTE: this default has changed since nfs-utils version 1.0.x
exportfs: /etc/exports [2]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/home".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
exportfs: /etc/exports [3]: Neither 'subtree_check' or 'no_subtree_check' specified for export "10.0.0.0/16:/opt/sge".
Assuming default behaviour ('no_subtree_check').
NOTE: this default has changed since nfs-utils version 1.0.x
[2018-01-09T23:09:30+00:00] INFO: execute[exportfs] ran successfully
execute exportfs -ar
cookbook_file[sge_inst.conf] action create[2018-01-09T23:09:30+00:00] INFO: Processing cookbook_file[sge_inst.conf] action create (cfncluster::_master_sge_config line 24) [2018-01-09T23:09:30+00:00] INFO: cookbook_file[sge_inst.conf] created file /opt/sge/sge_inst.conf
create new file /opt/sge/sge_inst.conf[2018-01-09T23:09:30+00:00] INFO: cookbook_file[sge_inst.conf] updated file contents /opt/sge/sge_inst.conf
update content in file /opt/sge/sge_inst.conf from none to 864fe3
--- /opt/sge/sge_inst.conf 2018-01-09 23:09:30.959541576 +0000
+++ /opt/sge/.chef-sge_inst20180109-2372-1gtqrzt.conf 2018-01-09 23:09:30.959541576 +0000
@@ -1 +1,262 @@
+#------------------------------------------------- -- sh --
+# SGE default configuration file
+#-------------------------------------------------
+# Use always fully qualified pathnames, please
+# This file is sourced by a Bourne shell script, so the assignments
+# must be in sh syntax, i.e.
+#
+# SGE_ROOT Path, this is basic information
+#(mandatory for qmaster and execd installation)
+SGE_ROOT="/opt/sge"
+# SGE_QMASTER_PORT is used by qmaster for communication
+# Please enter the port in this way: 1300
+# not like this: 1300/tcp
+# (mandatory for qmaster installation)
+SGE_QMASTER_PORT=6444
+# SGE_EXECD_PORT is used by execd for communication
+# Please enter the port in this way: 1300
+# Not like this: 1300/tcp
+# (mandatory for qmaster installation)
+SGE_EXECD_PORT=6445
+# SGE_ENABLE_SMF
+# if set to false SMF will not control SGE services
+SGE_ENABLE_SMF="false"
+# SGE_CLUSTER_NAME
+# Name of this cluster (used by SMF as a service instance name)
+SGE_CLUSTER_NAME=p6444
+# SGE_JMX_PORT is used by qmaster's JMX MBean server
+# mandatory if install_qmaster -jmx -auto
+# SGE_JMX_SSL is used by qmaster's JMX MBean server +# if SGE_JMX_SSL=true, the mbean server connection uses +# SSL authentication +SGE_JMX_SSL="false"
+# SGE_JMX_SSL_CLIENT is used by qmaster's JMX MBean server +# if SGE_JMX_SSL_CLIENT=true, the mbean server connection uses +# SSL authentication of the client in addition +SGE_JMX_SSL_CLIENT="false"
+# SGE_JMX_SSL_KEYSTORE is used by qmaster's JMX MBean server
+# if SGE_JMX_SSL=true the server keystore found here is used
+# e.g. /var/lib/sgeCA/port
+# SGE_JMX_SSL_KEYSTORE_PW is used by qmaster's JMX MBean server +# password for the SGE_JMX_SSL_KEYSTORE file +SGE_JMX_SSL_KEYSTORE_PW="Please enter the server keystore password"
+# SGE_JVM_LIB_PATH is used by qmaster's jvm thread +# path to libjvm.so +# If value is missing or set to "none", the JMX thread will not be +# installed. When the value is empty, or the path does not exist on +# the system, Grid Engine will try to find a correct value. If it +# cannot do so, the value is set to "jvmlib_missing" and the JMX +# thread will be configured but will fail to start +SGE_JVM_LIB_PATH="Please enter absolute path of libjvm.so"
+# SGE_ADDITIONAL_JVM_ARGS is used by qmaster's jvm thread +# jvm specific arguments as -verbose:jni etc. +# optional, can be empty +SGE_ADDITIONAL_JVM_ARGS="-Xmx256m"
+# CELL_NAME, will be a directory in SGE_ROOT, contains the common dir +# Please enter only the name of the cell. No path please +# (mandatory for qmaster and execd installation) +CELL_NAME="default"
+# ADMIN_USER, if you want to use a different admin user than the owner +# of SGE_ROOT, you have to enter the user name here +# Leaving this blank, the owner of the SGE_ROOT dir will be used as admin user +ADMIN_USER=sgeadmin
+# The directory where qmaster spools the parts which are not spooled by DB +# (mandatory for qmaster installation) +QMASTER_SPOOL_DIR=$SGE_ROOT/$CELL_NAME/spool/qmaster
+# The directory where the execd spools (active jobs)
+# This entry is needed even if you are going to use
+# berkeley db spooling. Only cluster configuration and jobs will
+# be spooled in the database. The execution daemon still needs a spool
+# directory
+# (mandatory for qmaster installation)
+EXECD_SPOOL_DIR=$SGE_ROOT/$CELL_NAME/spool
+# For monitoring and accounting of jobs, every job will get +# unique GID. So you have to enter a free GID Range, which +# is assigned to each job running on a machine. +# If you want to run 100 Jobs at the same time on one host you +# have to enter a GID-Range like that: 16000-16100 +# (mandatory for qmaster installation) +GID_RANGE="20000-21000"
+# If SGE is compiled with -spool-dynamic, you have to enter here which +# spooling method should be used. (classic or berkeleydb) +# (mandatory for qmaster installation) +SPOOLING_METHOD="classic"
+# The directory where the DB spools +# If berkeley db spooling is used, it must contain the path to +# the spooling db. Please enter the full path. (eg. /tmp/data/spooldb) +# Remember, this directory must normally be local on the qmaster host. +# An NFS4 mount is supposed to be safe, and NFS2/3, or other remote +# filesystems can be used if the "private" bootstrap option is given. +DB_SPOOLING_DIR="spooldb"
+# This parameter sets the number of parallel installation processes. +# To prevents a system overload, or exceeding the number of open file +# descriptors, the user can limit the number of parallel install processes. +# e.g. set PAR_EXECD_INST_COUNT="20", maximum 20 execds are installed in +# parallel. +PAR_EXECD_INST_COUNT="20"
+# A list of hosts which should become admin hosts +# If you do not enter any host here, you have to add all of your hosts +# by hand after the installation. The example works without any entry +ADMIN_HOST_LIST=""
+# A list of hosts which should become submit hosts +# If you do not enter any host here, you have to add all of your hosts +# by hand after the installation. The example works without any entry +SUBMIT_HOST_LIST=""
+# A list of hosts which should become exec hosts
+# If you do not enter any host here, you have to add all of your hosts
+# by hand after the installation. The example works without any entry
+# (mandatory for execution host installation)
+EXEC_HOST_LIST=hostname
+# The directory where the execd spools (local configuration) +# If you want configure your execution daemons to spool in +# a local directory, you have to enter that directory here. +# If you do not want to configure a local execution host spool directory +# please leave this empty +EXECD_SPOOL_DIR_LOCAL="/var/spool/sge"
+# If true, the domainnames will be ignored during the hostname resolving +# if false, the fully qualified domain name will be used for name resolving +HOSTNAME_RESOLVING="true"
+# Shell which should be used for remote installation (rsh/ssh) +# This is only supported if your hosts and rshd/sshd are configured +# not to ask for a password, or prompting with any message. +SHELL_NAME="ssh"
+# This remote copy command is used for CSP installation. +# The script needs the remote copy command for distributing +# the CSP certificates. Using SSH the command scp has to be entered, +# using the not so secure rsh, the command rcp has to be entered. +# Both need a passwordless ssh/rsh connection to the hosts which +# should be connected to. (Mandatory for CSP installation mode) +COPY_COMMAND="scp"
+# Enter your default domain, if you are using /etc/hosts or NIS configuration +DEFAULT_DOMAIN="none"
+# If a job stops, fails, or finishes, you can send mail to this address +ADMIN_MAIL="root"
+# If true, the rc scripts (sgemaster, sgeexecd) will be added +# to start automatically during boottime +ADD_TO_RC="true"
+# If this is "true" the file permissions of executables will be set to 755 +# and of ordinary files to 644. +SET_FILE_PERMS="true"
+# This option is not implemented, yet. +# When a exechost should be uninstalled, the running jobs will be rescheduled +RESCHEDULE_JOBS="wait"
+# Enter one of the three distributed scheduler tuning configuration sets +# (1=normal, 2=high, 3=max) +SCHEDD_CONF="1"
+# The name of the shadow host. This host must have read/write permission +# to the qmaster spool directory +# If you want to setup a shadow host, you must enter the servername +# (mandatory for shadow host installation) +SHADOW_HOST=""
+# Remove these execution hosts in automatic mode +# (mandatory for uninstallation of execution hosts) +EXEC_HOST_LIST_RM=""
+# This option is used for startup script removing. +# If true, all rc startup scripts will be removed during +# automatic deinstallation. If false, the scripts won't +# be touched. +# (mandatory for uninstallation of execution/qmaster hosts) +REMOVE_RC="true"
+# This is a Windows specific part of the auto isntallation template +# If you going to install windows executions hosts, you have to enable the +# windows support. To do this, please set the WINDOWS_SUPPORT variable +# to "true". ("false" is disabled) +# (Mandatory for qmaster installation. By default WINDOWS_SUPPORT is +# disabled) +WINDOWS_SUPPORT="false"
+# Enabling the WINDOWS_SUPPORT recommends the following parameter. +# The WIN_ADMIN_NAME will be added to the list of SGE managers. +# Without adding the WIN_ADMIN_NAME, the execution host installation +# won't work correctly. +# WIN_ADMIN_NAME is set to "Administrator", which is default on most +# Windows systems. In some cases the WIN_ADMIN_NAME can be prefixed with +# the windows domain name (eg. DOMAIN+Administrator) +# (Mandatory for qmaster installation if windows hosts should be installed) +WIN_ADMIN_NAME="Administrator"
+# This parameter is used to switch between local ADMINUSER and Windows +# Domain Adminuser. Setting the WIN_DOMAIN_ACCESS variable to true, the +# Adminuser will be a Windows Domain User. It is recommended that +# a Windows Domain Server is configured and the Windows Domain User is +# created. Setting this variable to false, the local Adminuser will be +# used as ADMINUSER. The install script tries to create this user account +# but we recommend, because it will be safer, to create this user +# before running the installation. +# (Mandatory for qmaster installation if windows hosts should be installed) +WIN_DOMAIN_ACCESS="false"
+# This section is used for CSP installation mode. +# CSP_RECREATE recreates the certs on each installation, if true. +# If false, the certs will be created if they don't exist. +# Existing certs won't be overwritten. (Mandatory for CSP install) +CSP_RECREATE="true"
+# The created certs won't be copied if this option is set to false +# If true, the script tries to copy the generated certs. This +# requires passwordless ssh/rsh access for user root to the +# execution hosts +CSP_COPY_CERTS="false"
+# csp information, your country code (only 2 characters) +# (mandatory for csp install) +CSP_COUNTRY_CODE="DE"
+# your state (mandatory for csp install) +CSP_STATE="Germany"
+# your location, e.g. the building (mandatory for csp install) +CSP_LOCATION="Building"
+# your arganisation (mandatory for csp install) +CSP_ORGA="Organisation"
+# your organisation unit (mandatory for csp install) +CSP_ORGA_UNIT="Organisation_unit"
+# your email (mandatory for csp install) +CSP_MAIL_ADDRESS="name@yourdomain.com"[2018-01-09T23:09:30+00:00] INFO: cookbook_file[sge_inst.conf] owner changed to 0 [2018-01-09T23:09:30+00:00] INFO: cookbook_file[sge_inst.conf] group changed to 0 [2018-01-09T23:09:30+00:00] INFO: cookbook_file[sge_inst.conf] mode changed to 644
change mode from '' to '0644'
change owner from '' to 'root'
change group from '' to 'root'
execute[inst_sge] action run[2018-01-09T23:09:30+00:00] INFO: Processing execute[inst_sge] action run (cfncluster::_master_sge_config line 32)
[execute] Reading configuration from file ./sge_inst.conf Install log can be found in: /opt/sge/default/common/install_logs/qmaster_install_ip-10-0-0-68_2018-01-09_23:09:31.log [2018-01-09T23:09:36+00:00] INFO: execute[inst_sge] ran successfully
execute ./inst_sge -noremote -m -auto ./sge_inst.conf
link[/etc/profile.d/sge.sh] action create[2018-01-09T23:09:36+00:00] INFO: Processing link[/etc/profile.d/sge.sh] action create (cfncluster::_master_sge_config line 38) [2018-01-09T23:09:36+00:00] INFO: link[/etc/profile.d/sge.sh] created
create symlink at /etc/profile.d/sge.sh to /opt/sge/default/common/settings.sh
link[/etc/profile.d/sge.csh] action create[2018-01-09T23:09:36+00:00] INFO: Processing link[/etc/profile.d/sge.csh] action create (cfncluster::_master_sge_config line 42) [2018-01-09T23:09:36+00:00] INFO: link[/etc/profile.d/sge.csh] created
create symlink at /etc/profile.d/sge.csh to /opt/sge/default/common/settings.csh
service[sgemaster.p6444] action enable[2018-01-09T23:09:36+00:00] INFO: Processing service[sgemaster.p6444] action enable (cfncluster::_master_sge_config line 46) (up to date)
service[sgemaster.p6444] action start[2018-01-09T23:09:36+00:00] INFO: Processing service[sgemaster.p6444] action start (cfncluster::_master_sge_config line 46) [2018-01-09T23:09:36+00:00] INFO: service[sgemaster.p6444] started
start service service[sgemaster.p6444]
bash[add_host_as_master] action run[2018-01-09T23:09:36+00:00] INFO: Processing bash[add_host_as_master] action run (cfncluster::_master_sge_config line 51)
[execute] ip-10-0-0-68.eu-west-1.compute.internal added to submit host list [2018-01-09T23:09:36+00:00] INFO: bash[add_host_as_master] ran successfully
execute "bash" "/tmp/chef-script20180109-2372-17l6qzu"
template[/opt/cfncluster/scripts/publish_pending] action create[2018-01-09T23:09:36+00:00] INFO: Processing template[/opt/cfncluster/scripts/publish_pending] action create (cfncluster::_master_sge_config line 58) [2018-01-09T23:09:36+00:00] INFO: template[/opt/cfncluster/scripts/publish_pending] created file /opt/cfncluster/scripts/publish_pending
create new file /opt/cfncluster/scripts/publish_pending[2018-01-09T23:09:36+00:00] INFO: template[/opt/cfncluster/scripts/publish_pending] updated file contents /opt/cfncluster/scripts/publish_pending
update content in file /opt/cfncluster/scripts/publish_pending from none to 1139f6 --- /opt/cfncluster/scripts/publish_pending 2018-01-09 23:09:36.907423159 +0000 +++ /opt/cfncluster/scripts/.chef-publish_pending20180109-2372-nepbg5 2018-01-09 23:09:36.907423159 +0000 @@ -1 +1,33 @@ +#!/bin/bash
+# Copyright 2013-2016 Amazon.com, Inc. or its affiliates. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except in compliance with the +# License. A copy of the License is located at +# +# http://aws.amazon.com/apache2.0/ +# +# or in the "LICENSE.txt" file accompanying this file. This file is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES +# OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and +# limitations under the License.
+PATH=/bin:/usr/bin:/usr/local/bin +export PATH
+. /etc/cfncluster/cfnconfig
+if [ "$cfn_proxy" != "NONE" ]; then
export http_proxy=$cfn_proxy; export https_proxy=$cfn_proxy
export HTTP_PROXY=$cfn_proxy; export HTTPS_PROXY=$cfn_proxy
export no_proxy=169.254.169.254; export NO_PROXY=169.254.169.254 +fi
+. /opt/sge/default/common/settings.sh +pending=$(qstat -g d -s p -u '*' | tail -n+3 | awk '$5 == "qw" {total = total+ $8} END {print total}')
+if [ "${pending}x" == "x" ]; then +pending=0 +fi
+aws --region ${cfn_region} cloudwatch put-metric-data --namespace cfncluster --metric-name pending --unit Count --value ${pending} --dimensions Stack=${stack_name}[2018-01-09T23:09:36+00:00] INFO: template[/opt/cfncluster/scripts/publish_pending] owner changed to 0 [2018-01-09T23:09:36+00:00] INFO: template[/opt/cfncluster/scripts/publish_pending] group changed to 0 [2018-01-09T23:09:36+00:00] INFO: template[/opt/cfncluster/scripts/publish_pending] mode changed to 744
change mode from '' to '0744'
change owner from '' to 'root'
change group from '' to 'root'
cron[publish_pending] action create[2018-01-09T23:09:36+00:00] INFO: Processing cron[publish_pending] action create (cfncluster::_master_sge_config line 65) [2018-01-09T23:09:37+00:00] INFO: cron[publish_pending] added crontab entry
add crontab entry for cron[publish_pending] [2018-01-09T23:09:37+00:00] INFO: template[/etc/default/nfs-kernel-server] sending restart action to service[nfs-kernel-server] (delayed) Recipe: nfs::server
service[nfs-kernel-server] action restart[2018-01-09T23:09:37+00:00] INFO: Processing service[nfs-kernel-server] action restart (nfs::server line 59) [2018-01-09T23:09:37+00:00] INFO: service[nfs-kernel-server] restarted
restart service service[nfs-kernel-server] [2018-01-09T23:09:37+00:00] INFO: Chef Run complete in 36.896199959 seconds
Running handlers: [2018-01-09T23:09:37+00:00] INFO: Running report handlers Running handlers complete [2018-01-09T23:09:37+00:00] INFO: Report handlers complete
Deprecated features used!
Cloning resource attributes for directory[/home/ebs] from prior resource
Previous directory[/home/ebs]: /etc/chef/local-mode-cache/cache/cookbooks/cfncluster/recipes/_master_base_config.rb:54:in `from_file'
Current directory[/home/ebs]: /etc/chef/local-mode-cache/cache/cookbooks/cfncluster/recipes/_master_base_config.rb:72:in `from_file' at 1 location:
Chef Client finished, 62/190 resources updated in 38 seconds
2018-01-09 23:09:37,681 [DEBUG] No services specified
2018-01-09 23:09:37,682 [INFO] Running config shellRunPostInstall
2018-01-09 23:09:37,682 [DEBUG] No packages specified
2018-01-09 23:09:37,682 [DEBUG] No groups specified
2018-01-09 23:09:37,682 [DEBUG] No users specified
2018-01-09 23:09:37,682 [DEBUG] No sources specified
2018-01-09 23:09:37,683 [DEBUG] No files specified
2018-01-09 23:09:37,683 [DEBUG] Running command runpostinstall
2018-01-09 23:09:37,683 [DEBUG] No test for command runpostinstall
2018-01-09 23:09:37,688 [INFO] Command runpostinstall succeeded
2018-01-09 23:09:37,688 [DEBUG] Command runpostinstall output:
2018-01-09 23:09:37,688 [DEBUG] No services specified
2018-01-09 23:09:37,689 [INFO] Running config shellForkClusterReadyInstall
2018-01-09 23:09:37,690 [DEBUG] No packages specified
2018-01-09 23:09:37,690 [DEBUG] No groups specified
2018-01-09 23:09:37,690 [DEBUG] No users specified
2018-01-09 23:09:37,690 [DEBUG] No sources specified
2018-01-09 23:09:37,690 [DEBUG] No files specified
2018-01-09 23:09:37,690 [DEBUG] Running command clusterreadyinstall
2018-01-09 23:09:37,690 [DEBUG] No test for command clusterreadyinstall
2018-01-09 23:09:37,695 [INFO] Command clusterreadyinstall succeeded
2018-01-09 23:09:37,695 [DEBUG] Command clusterreadyinstall output: Unknown action. Exit gracefully
2018-01-09 23:09:37,696 [DEBUG] No services specified
2018-01-09 23:09:37,696 [INFO] ConfigSets completed
2018-01-09 23:09:37,696 [DEBUG] Not clearing reboot trigger as scheduling support is not available
2018-01-09 23:09:37,696 [INFO] -----------------------Build complete-----------------------
2018-01-09 23:09:37,872 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.eu-west-1.amazonaws.com
2018-01-09 23:09:37,872 [DEBUG] Signaling resource MasterServer in stack cfncluster-mycluster2 with unique ID i-0383682101f9a0ed3 and status SUCCESS
Also, every time I set shared_dir = /home in the config file, I can't SSH to the master node from my computer. I get this failure:
totoro@TOTORO:~$ ssh -i ~/aws/key-pair/cfncluster-keypair1.pem ubuntu@34.243.79.255
The authenticity of host '34.243.79.255 (34.243.79.255)' can't be established.
ECDSA key fingerprint is SHA256:ZRZJSLAX39zWddllC9mqW+gN5sDXfQD66eWlTCGswKM.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '34.243.79.255' (ECDSA) to the list of known hosts.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
The config file was as follows:
[cluster testcluster1]
# Name of an existing EC2 KeyPair to enable SSH access to the instances.
key_name = cfncluster-keypair1
# Override path to cloudformation in S3
# (defaults to https://s3.amazonaws.com/cfncluster-<aws_region_name>/templates/cfncluster-<version>.cfn.json)
#template_url = https://s3.amazonaws.com/cfncluster-us-east-1/templates/cfncluster.cfn.json
# Cluster Server EC2 instance type
# (defaults to t2.micro for default template)
#compute_instance_type = t2.micro
# Master Server EC2 instance type
# (defaults to t2.micro for default template)
#master_instance_type = t2.micro
# Initial number of EC2 instances to launch as compute nodes in the cluster.
# (defaults to 2 for default template)
initial_queue_size = 1
# Maximum number of EC2 instances that can be launched in the cluster.
# (defaults to 10 for the default template)
max_queue_size = 2
# Boolean flag to set autoscaling group to maintain initial size and scale back
# (defaults to false for the default template)
#maintain_initial_size = false
# Cluster scheduler
# (defaults to sge for the default template)
#scheduler = sge
#scheduler = sge
# Type of cluster to launch i.e. ondemand or spot
# (defaults to ondemand for the default template)
#cluster_type = ondemand
# Spot price for the ComputeFleet
#spot_price = 0.00
# ID of a Custom AMI, to use instead of published AMI's
# must find the available AMI
# AMI Name: cfncluster-1.3.0-ubuntu-1604-lts-hvm-201608251414
#custom_ami = ami-406e1f33
#custom_ami = ami-ff8d1886
#custom_ami = ami-96b025ef
#custom_ami = ami-62fa6e1b
# cfncluster fds-image, no NFS
custom_ami = ami-898b1ff0
# cfncluster default ubuntu1604 image in eu-west-1
#custom_ami = ami-9802b1e1
# Specify S3 resource which cfncluster nodes will be granted read-only access
# (defaults to NONE for the default template)
#s3_read_resource = arn:aws:s3:::cfncluster1-s3
# Specify S3 resource which cfncluster nodes will be granted read-write access
# (defaults to NONE for the default template)
#s3_read_write_resource = arn:aws:s3:::cfncluster1-s3
# URL to a preinstall script. This is executed before any of the boot_as_* scripts are run
# (defaults to NONE for the default template)
#pre_install = NONE
# Arguments to be passed to preinstall script
# (defaults to NONE for the default template)
#pre_install_args = NONE
# URL to a postinstall script. This is executed after any of the boot_as_* scripts are run
# (defaults to NONE for the default template)
#post_install = NONE
# Arguments to be passed to postinstall script
# (defaults to NONE for the default template)
#post_install_args = NONE
# HTTP(S) proxy server, typically http://x.x.x.x:8080
# (defaults to NONE for the default template)
#proxy_server = NONE
# Cluster placement group. This placement group must already exist.
# (defaults to NONE for the default template)
#placement_group = NONE
# Cluster placement logic. This enables the whole cluster or only compute to use the placement group
# (defaults to cluster in the default template)
#placement = cluster
# Path/mountpoint for ephemeral drives
# (defaults to /scratch in the default template)
#ephemeral_dir = /scratch
# Path/mountpoint for shared EBS volume
# (defaults to /shared in the default template)
#### if i set this to /home, then all nodes' home directories from computer fleet
#### will be shared through NFS system. Not that AWS EFS but the original NFS system
#shared_dir = /home/ubuntu/ebs
shared_dir = /home
# Encrypted ephemeral drives. In-memory keys, non-recoverable.
# (defaults to false in default template)
#encrypted_ephemeral = false
# MasterServer root volume size in GB. (AMI must support growroot)
# (defaults to 10 in default template)
#master_root_volume_size = 10
# ComputeFleet root volume size in GB. (AMI must support growroot)
# (defaults to 10 in default template)
#compute_root_volume_size = 10
# OS type used in the cluster
# (defaults to alinux in the default template)
#base_os = Ubuntu
# CloudWatch Logs region
# (defaults to NONE in the default template)
#cwl_region = NONE
# CloudWatch Logs Log Group name
# (defaults to NONE in the default template)
#cwl_log_group = NONE
# Existing EC2 IAM role to be associated with the EC2 instances
# (defaults to NONE in the default template)
#ec2_iam_role = NONE
# Extra Json to be merged with the dna.json used by Chef
# (defaults to {} in the default template)
#extra_json = {}
# Additional CloudFormation template to launch with the cluster
#additional_cfn_template = NONE
# Settings section relating to VPC to be used
#vpc_settings = cfncluster-vpc-test1
#vpc_settings = mycluster1-vpc
#test vpc_settings
vpc_settings = mycluster2-vpc
# Settings section relating to EBS volume
#ebs_settings = fds-test-volume-2
# Settings section relation to scaling
#scaling_settings = custom
I have no idea what went wrong.
I hope you can help me out.
Best regards, Michael
Michael - Sorry I did not respond to this sooner. This is related to the issue in https://github.com/awslabs/cfncluster/issues/322. By mounting the shared directory on /home via the shared_dir config, you effectively make the master node's original /home, which lives on the Master Server's primary EBS volume, inaccessible. As a result, the key installed under /home for the keypair you intended to use to SSH into the master is no longer available for SSH authentication. I am going to close this issue, since it is related to https://github.com/awslabs/cfncluster/issues/322, to avoid having two separate threads discussing the same problem.
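As a rough illustration of the point above (a sketch, not a tested configuration): keeping shared_dir on a mount point outside /home should leave the master's original /home in place. The section name and key name below are copied from the config posted in this thread, and /shared is the default mentioned in that config's own comments. That the hidden file is the login user's ~/.ssh/authorized_keys is an assumption based on the ubuntu user in the ssh command above.

```ini
[cluster testcluster1]
key_name = cfncluster-keypair1
# Path/mountpoint for shared EBS volume.
# Assumption: any path outside /home avoids mounting over the login
# user's ~/.ssh directory; /shared is the default named in the template comments.
shared_dir = /shared
```

With this setting the compute nodes would see the shared volume at /shared rather than sharing home directories, so anything that relied on a shared /home would need to be moved there.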
Hello,
I have started learning how to use cfncluster to set up an HPC cluster in AWS. After I configured the cfncluster config file, I used it to build the cluster, and it was set up successfully. But after I connected to the master node, I found a few problems with my cluster.
Could someone please tell me how to solve these two problems? Thank you very much.
Michael