everpeace / vagrant-mesos

Spin up your Mesos Cluster with Vagrant! (VirtualBox and AWS)
https://github.com/everpeace/vagrant-mesos
MIT License

No such security group, and vagrant-mesos can't set `hostname` to its public DNS name. #67

Open johnykov opened 9 years ago

johnykov commented 9 years ago

Hi, I am a very happy user of this stack (I've tested it locally on Vagrant) and I'd like to set it up on AWS to show it to some of my colleagues.

I've created a dedicated IAM user for the access_key and secret_access_key, and a dedicated group with the "AmazonEC2FullAccess" policy, but I still get an error saying "there is no such security group". I think this is caused by something wrong in my AWS configuration; I would appreciate any hint ;)

==> default: will be silently ignored.
==> default: Launching an instance with the following settings...
==> default:  -- Type: t2.medium
==> default:  -- AMI: ami-51bb4d3a
==> default:  -- Region: us-east-1
==> default:  -- Keypair: mesos-vagrant
==> default:  -- Security Groups: ["sg-81eaf2e5"]
==> default:  -- Block Device Mapping: []
==> default:  -- Terminate On Shutdown: false
==> default:  -- Monitoring: false
==> default:  -- EBS optimized: false
==> default:  -- Assigning a public IP address in a VPC: false
==> default: Warning! Vagrant might not be able to SSH into the instance.
==> default: Please check your security groups settings.
/home/ubuntu/.vagrant.d/gems/gems/excon-0.45.3/lib/excon/middlewares/expects.rb:6:in `response_call': The security group 'sg-81eaf2e5' does not exist (Fog::Compute::AWS::NotFound)
    from /home/ubuntu/.vagrant.d/gems/gems/excon-0.45.3/lib/excon/middlewares/response_parser.rb:8:in `response_call'

cheers

everpeace commented 9 years ago

@hanskoff Hi! Thank you for trying this project :-)

Do you use a VPC? If so, did you confirm that sg-81eaf2e5 belongs to the VPC ID you are trying to use?
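One way to check this from the command line is the shell sketch below. It assumes the AWS CLI is installed and configured with the same credentials and region as the Vagrantfile; `sg_vpc_id` and `sg_in_vpc` are hypothetical helper names, not part of vagrant-mesos or vagrant-aws.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI for the region used in the Vagrantfile).
# Prints the VPC ID a security group belongs to ("None" if it has no VPC).
sg_vpc_id() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[0].VpcId' \
    --output text
}

# Succeeds only if security group $1 belongs to VPC $2; explains a mismatch.
sg_in_vpc() {
  local sg="$1" expected_vpc="$2" actual_vpc
  # A separate assignment keeps the CLI's exit status (e.g. group not found).
  actual_vpc="$(sg_vpc_id "$sg")" || return 1
  if [ "$actual_vpc" = "$expected_vpc" ]; then
    echo "$sg is in $expected_vpc"
  else
    echo "$sg is in $actual_vpc, not $expected_vpc" >&2
    return 1
  fi
}
```

For example, `sg_in_vpc sg-81eaf2e5 vpc-xxxxxxxx` (with your real VPC ID in place of the placeholder). If the group was created in a different region, the CLI itself reports that it does not exist, which is also worth ruling out.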

johnykov commented 9 years ago

Hi, thanks for the response ;)

Yeah, I definitely didn't configure a VPC. I will read up on it and work on it ;) Thanks for the hint.

johnykov commented 9 years ago

Hello again, I've confirmed that sg-81eaf2e5 belongs to the VPC ID I want to use, but the problem remains ;(

Btw, when I try to run my configuration, the first message I get is:

There are errors in the configuration of this machine. Please fix
the following errors and try again:

AWS Provider:
* vagrant_aws.config.ami_required

but I managed to fix this one with https://github.com/mitchellh/vagrant-aws/issues/275

everpeace commented 9 years ago

@hanskoff Oh, I recently updated the AMIs. Could you try updating your vagrant-mesos box image before running vagrant up?

$ vagrant box update
$ vagrant up --provider=aws

In the default configuration, Vagrant pulls the AMI metadata from Vagrant Cloud and launches VM instances with the correct AMIs.

johnykov commented 9 years ago

I'll do that. Do you have any other idea about the security group not being found?

everpeace commented 9 years ago

@hanskoff Hmm, I'm not sure about that. Perhaps using the security group's name instead of its ID might work...
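If you want to try the name, the sketch below looks up a group's name from its ID and lists every group in the current region together with its VPC ID. It assumes a configured AWS CLI; the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): find the name that belongs to a
# security group ID, so the name can be tried in the Vagrantfile instead.
sg_name_for_id() {
  aws ec2 describe-security-groups \
    --group-ids "$1" \
    --query 'SecurityGroups[0].GroupName' \
    --output text
}

# List "ID  NAME  VPC-ID" for every group in the region; with text output a
# group that has no VPC shows "None" in the last column.
list_security_groups() {
  aws ec2 describe-security-groups \
    --query 'SecurityGroups[].[GroupId,GroupName,VpcId]' \
    --output text
}
```

Comparing the VPC column with the VPC you launch into is a quick way to see which IDs can be used at all.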

johnykov commented 9 years ago

Re 1: `vagrant box update` didn't resolve the AMI problem ;) Re 2: providing the group name instead of the group ID helped... strange. New problem 3: Chef doesn't complete successfully. I see:

==> default: Ran apt-cache policy update-notifier-common returned 100
==> default: [2015-06-17T06:54:18+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Chef never successfully completed! Any errors should be visible in the
output above. Please fix your recipes so that they properly complete.

everpeace commented 9 years ago

@hanskoff Yay! You successfully SSHed into the VM! Could you kindly paste the full vagrant output?

johnykov commented 9 years ago

I tried `vagrant provision` this time and got the same error message:

Generating chef JSON and uploading...
==> default: Running chef-solo...
==> default: stdin: is not a tty
==> default: [2015-06-17T06:54:14+00:00] INFO: Forking chef instance to converge...
==> default: [2015-06-17T06:54:14+00:00] INFO: *** Chef 12.3.0 ***
==> default: [2015-06-17T06:54:14+00:00] INFO: Chef-client pid: 2524
==> default: [2015-06-17T06:54:16+00:00] INFO: Setting the run_list to ["recipe[apt]", "recipe[mesos::master]", "recipe[mesos::slave]"] from CLI options
==> default: [2015-06-17T06:54:16+00:00] INFO: Run List is [recipe[apt], recipe[mesos::master], recipe[mesos::slave]]
==> default: [2015-06-17T06:54:16+00:00] INFO: Run List expands to [apt, mesos::master, mesos::slave]
==> default: [2015-06-17T06:54:16+00:00] INFO: Starting Chef Run for ip-10-157-159-177.ec2.internal
==> default: [2015-06-17T06:54:16+00:00] INFO: Running start handlers
==> default: [2015-06-17T06:54:16+00:00] INFO: Start handlers complete.
==> default: [2015-06-17T06:54:17+00:00] INFO: node[:mesos][:prefix] is ignored. prefix will be set with /usr/local .
==> default: [2015-06-17T06:54:17+00:00] WARN: Cloning resource attributes for template[/etc/init/mesos-master.conf] from prior resource (CHEF-3694)
==> default: [2015-06-17T06:54:17+00:00] WARN: Previous template[/etc/init/mesos-master.conf]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/libraries/helpers.rb:177:in `deploy_service_scripts'
==> default: [2015-06-17T06:54:17+00:00] WARN: Current  template[/etc/init/mesos-master.conf]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/libraries/helpers.rb:200:in `activate_master_service_scripts'
==> default: [2015-06-17T06:54:17+00:00] INFO: node[:mesos][:prefix] is ignored. prefix will be set with /usr/local .
==> default: [2015-06-17T06:54:17+00:00] WARN: Cloning resource attributes for directory[/usr/local/var/mesos/deploy] from prior resource (CHEF-3694)
==> default: [2015-06-17T06:54:17+00:00] WARN: Previous directory[/usr/local/var/mesos/deploy]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/master.rb:21:in `from_file'
==> default: [2015-06-17T06:54:17+00:00] WARN: Current  directory[/usr/local/var/mesos/deploy]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/slave.rb:21:in `from_file'
==> default: [2015-06-17T06:54:17+00:00] WARN: Cloning resource attributes for template[/etc/init/mesos-slave.conf] from prior resource (CHEF-3694)
==> default: [2015-06-17T06:54:17+00:00] WARN: Previous template[/etc/init/mesos-slave.conf]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/libraries/helpers.rb:186:in `deploy_service_scripts'
==> default: [2015-06-17T06:54:17+00:00] WARN: Current  template[/etc/init/mesos-slave.conf]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/libraries/helpers.rb:213:in `activate_slave_service_scripts'
==> default: [2015-06-17T06:54:17+00:00] WARN: Cloning resource attributes for template[/etc/mesos/zk] from prior resource (CHEF-3694)
==> default: [2015-06-17T06:54:17+00:00] WARN: Previous template[/etc/mesos/zk]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/master.rb:87:in `from_file'
==> default: [2015-06-17T06:54:17+00:00] WARN: Current  template[/etc/mesos/zk]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/slave.rb:59:in `from_file'
==> default: [2015-06-17T06:54:17+00:00] WARN: Cloning resource attributes for template[/etc/default/mesos] from prior resource (CHEF-3694)
==> default: [2015-06-17T06:54:17+00:00] WARN: Previous template[/etc/default/mesos]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/master.rb:97:in `from_file'
==> default: [2015-06-17T06:54:17+00:00] WARN: Current  template[/etc/default/mesos]: /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/mesos/recipes/slave.rb:69:in `from_file'
==> default:
==> default: ================================================================================
==> default: Error executing action `install` on resource 'apt_package[update-notifier-common]'
==> default: ================================================================================
==> default:
==> default:
==> default: Mixlib::ShellOut::ShellCommandFailed
==> default: ------------------------------------
==> default: Expected process to exit with [0], but received '100'
==> default:
==> default: ---- Begin output of apt-cache policy update-notifier-common ----
==> default:
==> default: STDOUT:
==> default:
==> default: STDERR: E: Encountered a section with no Package: header
==> default:
==> default: E: Problem with MergeList /var/lib/apt/lists/us-east-1.ec2.archive.ubuntu.com_ubuntu_dists_trusty_main_i18n_Translation-en
==> default:
==> default: E: The package lists or status file could not be parsed or opened.
==> default:
==> default: ---- End output of apt-cache policy update-notifier-common ----
==> default:
==> default: Ran apt-cache policy update-notifier-common returned 100
==> default:
==> default:
==> default: Resource Declaration:
==> default: ---------------------
==> default: # In /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/apt/recipes/default.rb
==> default:
==> default:
==> default:
==> default:  75: package 'update-notifier-common' do
==> default:
==> default:  76:   notifies :run, 'execute[apt-get-update]', :immediately
==> default:
==> default:  77:   only_if { apt_installed? }
==> default:
==> default:  78: end
==> default:
==> default:  79:
==> default:
==> default:
==> default:
==> default: Compiled Resource:
==> default: ------------------
==> default: # Declared in /tmp/vagrant-chef/600f65399f86813f58e0a48b48726273/cookbooks/apt/recipes/default.rb:75:in `from_file'
==> default:
==> default:
==> default:
==> default: apt_package("update-notifier-common") do
==> default:
==> default:   action :install
==> default:
==> default:   retries 0
==> default:
==> default:   retry_delay 2
==> default:
==> default:   default_guard_interpreter :default
==> default:
==> default:   package_name "update-notifier-common"
==> default:
==> default:   timeout 900
==> default:
==> default:   declared_type :package
==> default:
==> default:   cookbook_name :apt
==> default:
==> default:   recipe_name "default"
==> default:
==> default:   only_if { #code block }
==> default:
==> default: end
==> default:
==> default:
==> default:
==> default: [2015-06-17T06:54:18+00:00] INFO: Running queued delayed notifications before re-raising exception
==> default: [2015-06-17T06:54:18+00:00] ERROR: Running exception handlers
==> default: [2015-06-17T06:54:18+00:00] ERROR: Exception handlers complete
==> default: [2015-06-17T06:54:18+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
==> default: [2015-06-17T06:54:18+00:00] ERROR: apt_package[update-notifier-common] (apt::default line 75) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '100'
==> default: ---- Begin output of apt-cache policy update-notifier-common ----
==> default: STDOUT:
==> default: STDERR: E: Encountered a section with no Package: header
==> default: E: Problem with MergeList /var/lib/apt/lists/us-east-1.ec2.archive.ubuntu.com_ubuntu_dists_trusty_main_i18n_Translation-en
==> default: E: The package lists or status file could not be parsed or opened.
==> default: ---- End output of apt-cache policy update-notifier-common ----
==> default: Ran apt-cache policy update-notifier-common returned 100
==> default: [2015-06-17T06:54:18+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Chef never successfully completed! Any errors should be visible in the
output above. Please fix your recipes so that they properly complete.

everpeace commented 9 years ago

@hanskoff Hmm... the log says apt could not parse its package lists (`Problem with MergeList`), so it looks like `apt-get update` failed.

johnykov commented 9 years ago

This answer helped me clean up and provision again: http://askubuntu.com/a/30199 ;)
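The usual fix for this MergeList error (and, as far as I can tell, what the linked answer suggests) is to delete the downloaded package lists and fetch them again. A sketch of that as shell functions; the function names are hypothetical, and the lists directory is a parameter so it can be tried on a scratch directory first.

```shell
#!/usr/bin/env bash
# Sketch: clear possibly corrupted apt package lists (e.g. a truncated
# ..._i18n_Translation-en file) so the next update downloads fresh copies.
# The directory defaults to apt's standard location.
reset_apt_lists() {
  local lists_dir="${1:-/var/lib/apt/lists}"
  if [ ! -d "$lists_dir" ]; then
    echo "no such directory: $lists_dir" >&2
    return 1
  fi
  # Delete only top-level regular files (the list files themselves);
  # keep the partial/ subdirectory and apt's lock file.
  find "$lists_dir" -maxdepth 1 -type f ! -name lock -delete
}

# Must run as root on the VM: clear the lists, then download them again.
refresh_apt_lists() {
  reset_apt_lists "$@" && apt-get update
}
```

On the VM, from a root shell (`sudo -i` after `vagrant ssh`), define the functions and run `refresh_apt_lists`; then `vagrant provision` on the host re-runs Chef.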

Yep, I've added the aws.ami="" line.

Only Mesos is up and running; Marathon and Chronos are down :( Why is that? When I use --provider=virtualbox, Marathon and Chronos are down as well :(

johnykov commented 9 years ago

When I tried to start Marathon by hand, it printed:

/opt/marathon$ ./bin/start
MESOS_NATIVE_JAVA_LIBRARY is not set. Searching in /usr/lib /usr/local/lib.
MESOS_NATIVE_LIBRARY, MESOS_NATIVE_JAVA_LIBRARY set to '/usr/lib/libmesos.so'
Error: Unable to access jarfile ./bin/../target/scala-2.11/marathon-assembly-0.8.2.jar

johnykov commented 9 years ago

I made it work ;) After `sudo chmod 777 target/scala-2.11/marathon-assembly-0.8.2.jar`, all 3 services are accessible via ip:port.
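Since chmod fixed it, the "Unable to access jarfile" error was presumably a missing read permission; 777 additionally makes the jar writable and executable by everyone. A narrower variant is sketched below. `/opt/marathon` is the path from this thread, while the function name and the `marathon-assembly-*.jar` pattern are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: make Marathon's assembly jar(s) readable by all users instead of
# chmod 777. The root defaults to /opt/marathon (the path in this thread);
# the name pattern is assumed from marathon-assembly-0.8.2.jar.
fix_marathon_jar_perms() {
  local root="${1:-/opt/marathon}" jars
  if [ ! -d "$root/target" ]; then
    echo "no target directory under $root" >&2
    return 1
  fi
  jars="$(find "$root/target" -type f -name 'marathon-assembly-*.jar')"
  if [ -z "$jars" ]; then
    echo "no marathon-assembly jar found under $root/target" >&2
    return 1
  fi
  # a+r only adds read permission for user, group and others.
  find "$root/target" -type f -name 'marathon-assembly-*.jar' -exec chmod a+r {} +
  # Report which jars were changed.
  printf '%s\n' "$jars"
}
```

Run it as root on the VM, then start Marathon again with `./bin/start`.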

everpeace commented 9 years ago

@hanskoff Great to hear that!!

By the way, in my environment I can successfully spin up the Mesos VM and access all three web UIs. Something seems to have gone wrong in your environment, though I'm not sure what.

For now, I'll close this issue. If you run into further problems with it, please feel free to reopen it.

johnykov commented 9 years ago

OK, one last thing: when I deploy an example Play app, Mesos can't start it... I've noticed that Mesos sets the hostname to ip-10-170-85-220.ec2.internal, which is not accessible from outside. Isn't this a common issue? How do I change the Mesos config?

When I SSH into the EC2 instance and type `hostname`, it's set to ip-10-170-85-220.

johnykov commented 9 years ago

I can't reopen this ticket

everpeace commented 9 years ago

@hanskoff Could you tell me in a bit more detail what your core problem is?

By the way, it is difficult for Vagrant to configure a VM's hostname before provisioning it.

johnykov commented 9 years ago

With the VirtualBox configuration I can use Marathon to deploy an app, and it works. With the AWS configuration, the Chef provisioner sets up Mesos and Marathon with the hostname resolved from the EC2 private DNS name, which is not accessible from outside; because of that, in the end no app can be deployed successfully with Marathon... I'd like to pass a hostname parameter set to the public DNS name of the AWS instance... Did you manage to deploy an app with Marathon successfully on the AWS config?

everpeace commented 9 years ago

@hanskoff oh, ok. I now see what you meant.

> Did you manage to deploy an app with Marathon successfully on the AWS config?

I always get the public DNS name via this small script:

$ vagrant ssh -- 'echo http://`curl --silent http://169.254.169.254/latest/meta-data/public-hostname`:8080'

The deployment itself finished successfully when I used the public DNS name obtained above. To access the deployed apps, I always manually convert the internal hostname displayed in the Marathon UI to its public DNS name....
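The manual conversion can be scripted with the AWS CLI, which can filter instances by their private DNS name. A sketch, assuming a configured CLI for the instance's region; `public_dns_for` and `public_endpoint` are hypothetical names, and the `.ec2.internal` suffix only applies to us-east-1 (other regions use `<region>.compute.internal`).

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): translate an internal EC2 hostname,
# as shown in the Marathon UI, into the instance's public DNS name.
public_dns_for() {
  local private_name="$1"
  # A short name like ip-10-170-85-220 lacks the domain; add the us-east-1 one.
  case "$private_name" in
    *.*) ;;
    *) private_name="$private_name.ec2.internal" ;;
  esac
  aws ec2 describe-instances \
    --filters "Name=private-dns-name,Values=$private_name" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Rewrite a "host:port" task address to "public-host:port".
public_endpoint() {
  local host="${1%%:*}" port="${1##*:}" pub
  pub="$(public_dns_for "$host")" || return 1
  echo "$pub:$port"
}
```

For example, `public_endpoint ip-10-170-85-220.ec2.internal:PORT` with the port Marathon shows for the task.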

As I mentioned, it is hard to set the VM's hostname to its public DNS name from the Vagrant provisioner... If you find a nice way to do it, please give me feedback; I'll be waiting for your PR!!

everpeace commented 9 years ago

Hi @hanskoff, I have looked into how to set the hostname to the public DNS name. Unfortunately, I have some bad news...

According to the EC2 documentation page "Changing the Hostname of Your Linux Instance", to change the hostname we have to reboot the VM after changing /etc/hostname. However, as you know, the default public DNS name can change when the VM is (re)started, because it contains the public IP.....

Thus, I think we could have two options:

johnykov commented 9 years ago

You are right: even when I run `sudo hostname newname` and restart the network service, it gets the internal domain name from somewhere. I think this is not the way to do it. The real problem is this message, which I found in /var/log/mesos/mesos-slave: "Checkpointing ACK for status update TASK_FAILED". It doesn't appear on VirtualBox.
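The slave log only records that a task failed; the reason is often in the task's sandbox, i.e. its stdout/stderr files under the slave's work directory. A sketch of two helpers for finding both; `/var/log/mesos` comes from this thread, while the default work directory `/tmp/mesos` and the helper names are assumptions, so check the `--work_dir` your slave actually uses.

```shell
#!/usr/bin/env bash
# Sketch: print TASK_FAILED status updates from the Mesos slave logs.
# /var/log/mesos is from this thread; the mesos-slave* file glob is assumed.
failed_task_updates() {
  local log_dir="${1:-/var/log/mesos}"
  grep -h 'TASK_FAILED' "$log_dir"/mesos-slave* 2>/dev/null
}

# List task sandbox stderr files, laid out as
# <work_dir>/slaves/<id>/frameworks/<id>/executors/<id>/runs/<id>/stderr.
# /tmp/mesos is assumed as the slave's work directory.
task_stderr_files() {
  local work_dir="${1:-/tmp/mesos}"
  find "$work_dir/slaves" -path '*/runs/*' -name stderr -type f 2>/dev/null
}
```

Running `tail` on the files printed by `task_stderr_files` usually shows why a task such as the Play example exited.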

I've figured out how to set up mesos-[master | slave] and I'm experimenting with the ip and hostname settings, trying to set the public IP or the public DNS name... no progress so far.
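Instead of changing the OS hostname, Mesos master and slave accept a `--hostname` flag, and Mesos reads its flags from `MESOS_`-prefixed environment variables as well. A sketch that fetches the public hostname from the same metadata endpoint as the one-liner above and writes `MESOS_HOSTNAME=...` into an env file; which file the cookbook's init scripts actually source is an assumption you would need to check, and the function names are hypothetical. If the slave advertises its public name, Marathon should presumably show task hosts under that name too.

```shell
#!/usr/bin/env bash
# Sketch: give Mesos the instance's public DNS name without changing the OS
# hostname. MESOS_HOSTNAME=<name> acts like --hostname=<name>.
# The env file path is an assumption: use whichever file your init scripts source.
METADATA_URL="${METADATA_URL:-http://169.254.169.254/latest/meta-data}"

# Same metadata endpoint as the one-liner earlier in this thread.
public_hostname() {
  curl --silent --fail "$METADATA_URL/public-hostname"
}

# Replace (or add) the MESOS_HOSTNAME line in the given env file; print the name.
set_mesos_hostname() {
  local env_file="$1" name tmp
  name="$(public_hostname)" || return 1
  if [ -z "$name" ]; then
    echo "instance has no public hostname" >&2
    return 1
  fi
  tmp="$(mktemp)"
  # Keep all other lines; drop any stale MESOS_HOSTNAME entry.
  grep -v '^MESOS_HOSTNAME=' "$env_file" > "$tmp" 2>/dev/null || true
  echo "MESOS_HOSTNAME=$name" >> "$tmp"
  # Copy back with cat so the env file keeps its original permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
  echo "$name"
}
```

Because the public name can change whenever the instance is started, this would have to run at boot before mesos-master and mesos-slave start (or be followed by a restart of both).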

johnykov commented 9 years ago

Strange: I can successfully deploy this one: https://github.com/mesosphere/marathon/blob/master/examples/PythonSimpleHTTPServer.json but not this one: https://github.com/mesosphere/marathon/blob/master/examples/Play.json

johnykov commented 9 years ago

I think I found the culprit: "There is insufficient memory for the Java Runtime Environment to continue." I should switch from t1.micro to t2.medium... but changing it brought me back to the "VPC security groups may not be used for a non-VPC launch" problem. After configuring that VPC, I see I need to provide a subnet ID (which I did) and a virtualization type, and that's where I stopped, because I don't see this parameter in the vagrant-aws plugin...
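t2 instance types can only be launched into a VPC, so moving to t2.medium requires a subnet, and every security group ID passed to vagrant-aws must then belong to that subnet's VPC. The sketch below checks this with the AWS CLI before `vagrant up`; it assumes a configured CLI, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): check that a subnet and the given
# security group IDs all belong to the same VPC before a VPC launch.
subnet_vpc_id() {
  aws ec2 describe-subnets --subnet-ids "$1" \
    --query 'Subnets[0].VpcId' --output text
}

sg_vpc_id() {
  aws ec2 describe-security-groups --group-ids "$1" \
    --query 'SecurityGroups[0].VpcId' --output text
}

# Usage: check_vpc_launch SUBNET_ID SG_ID [SG_ID...]; prints the shared VPC ID.
check_vpc_launch() {
  local subnet="$1"; shift
  local vpc sg sg_vpc
  vpc="$(subnet_vpc_id "$subnet")" || return 1
  for sg in "$@"; do
    sg_vpc="$(sg_vpc_id "$sg")" || return 1
    if [ "$sg_vpc" != "$vpc" ]; then
      echo "$sg is in $sg_vpc but subnet $subnet is in $vpc" >&2
      return 1
    fi
  done
  echo "$vpc"
}
```

For example, `check_vpc_launch subnet-xxxxxxxx sg-81eaf2e5` with your real subnet ID.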

Looks like m3.medium works.
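As far as I know, the virtualization type (paravirtual or hvm) is a property of the AMI rather than a launch parameter, which would explain why vagrant-aws has no setting for it. t2 instances run only HVM AMIs while m3 instances accept both, so if the AMI in use is paravirtual, that may be why m3.medium works where t2.medium does not. A sketch for checking, assuming a configured AWS CLI; the helper names are hypothetical and only the t2 family is checked.

```shell
#!/usr/bin/env bash
# Sketch (assumes a configured AWS CLI): print an AMI's virtualization type,
# "hvm" or "paravirtual".
ami_virtualization_type() {
  aws ec2 describe-images --image-ids "$1" \
    --query 'Images[0].VirtualizationType' --output text
}

# Fail when a non-HVM AMI is paired with a t2 instance type.
# Only the t2 family is checked here; other families are not covered.
check_ami_for_type() {
  local ami="$1" instance_type="$2" vtype
  vtype="$(ami_virtualization_type "$ami")" || return 1
  case "$instance_type" in
    t2.*)
      if [ "$vtype" != "hvm" ]; then
        echo "$ami is $vtype, but $instance_type needs an hvm AMI" >&2
        return 1
      fi
      ;;
  esac
  echo "$ami ($vtype) looks usable with $instance_type"
}
```

For example, `check_ami_for_type ami-51bb4d3a t2.medium` for the AMI from the launch log at the top of this issue.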