ConPaaS-team / conpaas

ConPaaS: integrated runtime environment for elastic cloud applications
http://www.conpaas.eu
BSD 3-Clause "New" or "Revised" License

The WordPress manifest doesn't work #71

Closed gpierre42 closed 9 years ago

gpierre42 commented 9 years ago

I tried starting the WordPress manifest using the ConPaaS installation provided by Teodor (with button "deploy ready-made application"). It starts well, but stops after creating the XtreemFS service. The service never gets started, although I believe a start request must have been issued since I got charged two credits in total.

The other services (MySQL/Galera and PHP) never get created.

I can imagine two issues here, but I'll let the specialists make up their own mind:

FrancoCaffarraAndEsterDiBello commented 9 years ago

Good morning @gpierre42, we modified the manifest in commit db53252c5472f0d905f12122e83cb31f08c86642, replacing MySQL with Galera. The RC5 tarballs provided by Teodor probably do not include this latest update, so the manifest still specifies MySQL while the image does not have MySQL enabled.

tcrivat commented 9 years ago

Hi @FrancoCaffarraAndEsterDiBello,

So, after you updated the manifest, WordPress was working fine with the new Galera service, right?

I will have a look in a moment to see which manifest is included in the release candidate.

FrancoCaffarraAndEsterDiBello commented 9 years ago

Right. We have tested it and it works fine with the Galera service in the latest manifest.

tcrivat commented 9 years ago

The manifest file is the latest version, however it seems that the XtreemFS service doesn't start properly. When trying to start a stand-alone XtreemFS service, the manager starts successfully but starting the agent fails.

The ConPaaS services image in use for this Amazon installation was generated on September 18. I think some commits may have changed it since then, so I will try again with a new image once I finish generating it.

gpierre42 commented 9 years ago

I just tried to create an XtreemFS by hand and to start it. The starting failed (as previously reported by Teodor), with the following messages in the manager log:

2014-09-25 11:22:28,181 DEBUG conpaas.core.manager Using libcloud version 0.12.4
2014-09-25 11:22:28,823 DEBUG ReservationTimer RTIMER Creating timer for ['manager']
2014-09-25 11:22:29,068 INFO conpaas.core.clouds.base EC2 cloud ready. REGION=ec2.us-west-2.amazonaws.com
2014-09-25 11:22:29,068 INFO conpaas.core.ipop Not starting a VPN: IPOP_IP_ADDRESS not found
2014-09-25 11:22:35,010 INFO conpaas.core.manager Ganglia started successfully
2014-09-25 11:25:12,642 INFO conpaas.core.manager Manager starting up
2014-09-25 11:25:12,643 DEBUG conpaas.core.controller [create_nodes]
2014-09-25 11:25:27,796 DEBUG conpaas.core.controller [create_nodes] iter 1: creating 1 nodes on cloud iaas
2014-09-25 11:25:27,796 DEBUG conpaas.core.controller [create_nodes]: cloud.new_instances(1, conpaas-agent-xtreemfs-u2-s1, None)
2014-09-25 11:25:28,468 DEBUG conpaas.core.controller [create_nodes]: cloud.new_instances returned [ServiceNode(id=iaasi-79fef974, ip=)]
2014-09-25 11:25:28,468 DEBUG conpaas.core.controller [wait_for_nodes]: going to start polling for 1 nodes
2014-09-25 11:25:28,468 DEBUG conpaas.core.controller [check_node]: node.ip = , node.private_ip = : return False
2014-09-25 11:25:28,468 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:25:38,478 DEBUG conpaas.core.controller [wait_for_nodes]: refreshing 1 nodes
2014-09-25 11:25:38,479 DEBUG conpaas.core.clouds.base list_vms(has_private_ip=True)
2014-09-25 11:25:38,806 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:25:59,805 DEBUG conpaas.core.controller [check_node]: [Errno 110] Connection timed out
2014-09-25 11:25:59,805 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:26:09,816 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:26:30,813 DEBUG conpaas.core.controller [check_node]: [Errno 110] Connection timed out
2014-09-25 11:26:30,813 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:26:40,824 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:27:01,821 DEBUG conpaas.core.controller [__check_node]: [Errno 110] Connection timed out
2014-09-25 11:27:01,822 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:27:11,832 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:27:14,836 DEBUG conpaas.core.controller [check_node]: [Errno 113] No route to host
2014-09-25 11:27:14,836 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:27:24,847 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:27:24,848 DEBUG conpaas.core.controller [check_node]: [Errno 111] Connection refused
2014-09-25 11:27:24,848 DEBUG conpaas.core.controller [wait_for_nodes]: waiting 10 secs for 1 nodes
2014-09-25 11:27:34,858 DEBUG conpaas.core.controller [check_node]: test_agent(54.190.45.155, 5555)
2014-09-25 11:27:34,881 DEBUG conpaas.core.controller [__check_node]: node = ServiceNode(id=iaasi-79fef974, ip=54.190.45.155)
2014-09-25 11:27:34,881 DEBUG conpaas.core.controller [wait_for_nodes]: All nodes are ready [ServiceNode(id=iaasi-79fef974, ip=54.190.45.155)]
2014-09-25 11:27:34,882 DEBUG ReservationTimer RTIMER Creating timer for ['iaasi-79fef974']
2014-09-25 11:27:36,555 DEBUG conpaas.core.manager _create_certs: stderr
Owner: OID.2.5.4.72=manager, EMAILADDRESS=info@conpaas.eu, O=Contrail, UID=2, CN=ConPaaS, OID.1.3.6.1.5.5.7.48.1.7=1
Issuer: O=ConPaaS, CN=CA, EMAILADDRESS=info@conpaas.eu
Serial number: 3d
Valid from: Thu Sep 25 11:20:01 UTC 2014 until: Fri Sep 25 11:20:01 UTC 2015
Certificate fingerprints:
  MD5: 94:21:7B:84:3C:85:4C:DD:F4:1F:BC:44:44:9C:D6:9C
  SHA1: 2A:D9:6E:1C:DE:6F:D8:D3:4B:DE:D9:E0:80:CC:0E:58:07:F3:14:67
  Signature algorithm name: SHA1withRSA
  Version: 1
SUCCESS: All certificates created

2014-09-25 11:27:36,555 DEBUG conpaas.core.manager _create_certs: stdout
Generating a 1024 bit RSA private key
..............................................................++++++
.........................................++++++
writing new private key to 'dir.key'
Signature ok
subject=/C=DE/ST=Berlin/L=Berlin/O=Contrail/CN=host/dir/emailAddress=info@conpaas.eu
Getting CA Private Key
Generating a 1024 bit RSA private key
.......++++++
.++++++
writing new private key to 'mrc.key'
Signature ok
subject=/C=DE/ST=Berlin/L=Berlin/O=Contrail/CN=host/mrc/emailAddress=info@conpaas.eu
Getting CA Private Key
Generating a 1024 bit RSA private key
..............++++++
..............++++++
writing new private key to 'osd.key'
Signature ok
subject=/C=DE/ST=Berlin/L=Berlin/O=Contrail/CN=host/osd/emailAddress=info@conpaas.eu
Getting CA Private Key
Enter keystore password: Re-enter new password: Trust this certificate? [no]: Certificate was added to keystore

2014-09-25 11:27:36,580 DEBUG conpaas.core.manager _create_client_cert: creating tmp dir
2014-09-25 11:27:36,580 DEBUG conpaas.core.manager _create_client_cert: created tmp dir
2014-09-25 11:27:36,580 DEBUG conpaas.core.manager _create_client_cert: executing script
2014-09-25 11:27:36,639 DEBUG conpaas.core.manager _create_client_cert: stderr
SUCCESS: All certificates created
2014-09-25 11:27:36,640 DEBUG conpaas.core.manager _create_client_cert: stdout
Generating a 1024 bit RSA private key
...........++++++
..++++++
writing new private key to 'client.key'
Signature ok
subject=/C=DE/ST=Berlin/L=Berlin/O=Contrail/CN=xtreemfs-service/client/emailAddress=info@conpaas.eu
Getting CA Private Key

2014-09-25 11:27:36,640 DEBUG conpaas.core.manager _start_dir([ServiceNode(id=iaasi-79fef974, ip=54.190.45.155)])
2014-09-25 11:27:36,669 DEBUG conpaas.core.manager New uuid for iaasi-79fef974 (dir) -> f364cd68-44a6-11e4-9481-22000afb2fa5
2014-09-25 11:27:37,887 DEBUG conpaas.core.manager New uuid for iaasi-79fef974 (mrc) -> f41eaf08-44a6-11e4-9481-22000afb2fa5
2014-09-25 11:27:39,125 DEBUG conpaas.core.manager New uuid for iaasi-79fef974 (osd) -> f4db8de4-44a6-11e4-9481-22000afb2fa5
2014-09-25 11:27:39,126 INFO conpaas.core.manager Creating a volume named osd-f4db8de4-44a6-11e4-9481-22000afb2fa5 (1024 MBs)
2014-09-25 11:27:39,126 DEBUG conpaas.core.controller create_volume(cloud=iaas, size=1024, name='osd-f4db8de4-44a6-11e4-9481-22000afb2fa5')
2014-09-25 11:27:39,467 DEBUG conpaas.core.controller [delete_nodes]: killing iaasi-79fef974
2014-09-25 11:27:39,467 DEBUG ReservationTimer RTIMER removed node iaasi-79fef974, updated list []
2014-09-25 11:27:39,467 DEBUG ReservationTimer RTIMER Stopping timer for []
2014-09-25 11:27:39,468 DEBUG conpaas.core.clouds.base kill_instance(node=ServiceNode(id=iaasi-79fef974, ip=54.190.45.155))
2014-09-25 11:27:39,748 ERROR conpaas.core.manager do_startup: Failed to request a new node
Traceback (most recent call last):
  File "/root/ConPaaS/src/conpaas/services/xtreemfs/manager/manager.py", line 291, in _do_startup
    self._start_osd(self.osdNodes, startCloud)
  File "/root/ConPaaS/src/conpaas/services/xtreemfs/manager/manager.py", line 211, in _start_osd
    node.id, cloud)
  File "/root/ConPaaS/src/conpaas/core/manager.py", line 212, in create_volume
    volume = self.controller.create_volume(size, name, vm_id, cloud)
  File "/root/ConPaaS/src/conpaas/core/controller.py", line 498, in create_volume
    return cloud.create_volume(size, name, vm_id)
  File "/root/ConPaaS/src/conpaas/core/clouds/ec2.py", line 114, in create_volume
    if node.id == vm_id ][0]
IndexError: list index out of range

tcrivat commented 9 years ago

Hi @noma,

Maybe you can help us troubleshoot this issue? It seems to be specific to XtreemFS on Amazon EC2.

To give you some context: we are trying to run the latest version of ConPaaS on Amazon EC2 and the XtreemFS service does not start correctly on this setup. The XtreemFS manager starts fine, but starting an agent fails with the error shown by @gpierre42 above.

You can try to reproduce the issue yourself by creating and starting an XtreemFS service using the installation at the following link:

http://conpaas-online.ddns.net/

You may create another user or use the one with credentials: X (removed).

Thanks a lot!

noma commented 9 years ago

@tcrivat Please test with: c5588c5

If this works, the problem was using "node.id" instead of "node.vmid", which might make no difference for non-EC2 cloud back-ends.

@FrancoCaffarraAndEsterDiBello Please check whether you can reproduce the error for your service and, depending on the result above, change your code accordingly.

Call Stack:
https://github.com/ConPaaS-team/conpaas/blob/dev/conpaas-services/src/conpaas/services/xtreemfs/manager/manager.py#L211
https://github.com/ConPaaS-team/conpaas/blob/dev/conpaas-services/src/conpaas/core/manager.py#L212
https://github.com/ConPaaS-team/conpaas/blob/dev/conpaas-services/src/conpaas/core/controller.py#L498
https://github.com/ConPaaS-team/conpaas/blob/dev/conpaas-services/src/conpaas/core/clouds/ec2.py#L114
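To illustrate the failure mode described above, here is a minimal Python sketch. It is not the code from commit c5588c5: the `ServiceNode` and `CloudVM` classes and the `find_vm` helper are hypothetical stand-ins. The two ids are taken from the later log in this thread, where node `iaasi-d13534dc` corresponds to VM `i-d13534dc`. Filtering the cloud's VM list by the ConPaaS-level `node.id` yields an empty list, and indexing that list with `[0]` produces the `IndexError` from the traceback.

```python
from dataclasses import dataclass


@dataclass
class ServiceNode:
    """Hypothetical stand-in for a ConPaaS service node."""
    id: str    # ConPaaS-level id, e.g. "iaasi-d13534dc"
    vmid: str  # id known to the cloud back-end, e.g. "i-d13534dc"


@dataclass
class CloudVM:
    """Hypothetical stand-in for a VM object returned by the cloud driver."""
    id: str


def find_vm(vms, vm_id):
    """Return the VM whose id equals vm_id, or raise a descriptive error."""
    matches = [vm for vm in vms if vm.id == vm_id]
    if not matches:
        # A bare `matches[0]` here would raise
        # "IndexError: list index out of range", as in the traceback.
        raise LookupError(
            "no VM with id %r among %r" % (vm_id, [vm.id for vm in vms])
        )
    return matches[0]


node = ServiceNode(id="iaasi-d13534dc", vmid="i-d13534dc")
vms = [CloudVM(id="i-d13534dc")]

found = find_vm(vms, node.vmid)  # lookup by the cloud-level id succeeds
```

Calling `find_vm(vms, node.id)` instead raises `LookupError` with both ids in the message, which would have made the mismatch easier to spot than a bare `IndexError`.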

tcrivat commented 9 years ago

@noma: Here is what happens now:

2014-09-26 15:44:27,344 DEBUG conpaas.core.manager _start_dir([ServiceNode(id=iaasi-d13534dc, ip=54.212.198.189)])
2014-09-26 15:44:27,353 DEBUG conpaas.core.manager New uuid for iaasi-d13534dc (dir) -> ff4a63c0-4593-11e4-aa15-22000ae2a921
2014-09-26 15:44:28,549 DEBUG conpaas.core.manager New uuid for iaasi-d13534dc (mrc) -> 0000d7c2-4594-11e4-aa15-22000ae2a921
2014-09-26 15:44:29,760 DEBUG conpaas.core.manager New uuid for iaasi-d13534dc (osd) -> 00b9994c-4594-11e4-aa15-22000ae2a921
2014-09-26 15:44:29,761 INFO conpaas.core.manager Creating a volume named osd-00b9994c-4594-11e4-aa15-22000ae2a921 (1024 MBs)
2014-09-26 15:44:29,761 DEBUG conpaas.core.controller create_volume(cloud=iaas, size=1024, name='osd-00b9994c-4594-11e4-aa15-22000ae2a921')
2014-09-26 15:44:30,133 DEBUG conpaas.core.clouds.base self.driver.create_volume(1, osd-00b9994c-4594-11e4-aa15-22000ae2a921, driver=Amazon EC2 (us-west-2)>)
2014-09-26 15:44:30,479 INFO conpaas.core.manager Attaching volume vol-a81458ad to VM i-d13534dc as vda
2014-09-26 15:44:30,479 DEBUG conpaas.core.controller attach_volume(node=conpaas.core.manager.node, volume=, device=vda)
2014-09-26 15:44:30,729 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:44:40,882 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:44:51,038 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:01,316 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:11,451 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:21,578 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:31,714 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:41,869 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:45:52,028 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:46:02,195 INFO conpaas.core.clouds.base Volume not available yet
2014-09-26 15:46:12,205 ERROR conpaas.core.clouds.base Volume NOT available after timeout
Traceback (most recent call last):
  File "/root/ConPaaS/src/conpaas/core/clouds/ec2.py", line 135, in attach_volume
    return self.driver.attach_volume(node, volume, device)
  File "/root/ConPaaS/contrib/libcloud/compute/drivers/ec2.py", line 708, in attach_volume
    self.connection.request(self.path, params=params)
  File "/root/ConPaaS/contrib/libcloud/common/base.py", line 609, in request
    connection=self)
  File "/root/ConPaaS/contrib/libcloud/common/base.py", line 93, in init
    raise Exception(self.parse_error())
Exception: InvalidParameterValue: Value (vda) for parameter device is invalid. vda is not a valid EBS device name.

noma commented 9 years ago

@tcrivat Looks like volume creation was successful, but now attaching the volume fails because "vda" is not allowed as a device name. The default for the device name was recently changed (see #31), which might be the cause for this problem (assuming it worked before). You could try setting DEV_TARGET to "sdb" in the default-manager.cfg, which was the former value.

Here are the device naming rules for EBS, so "sdb" should be fine, but "vda" is not: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-attaching-volume.html

@gtato Was there an actual reason why you wanted to change this? Maybe we can find a default value that works for all clouds. Otherwise we must use a cloud-dependent value (maybe by adding a method to the cloud class for querying the device name for attached storage).
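The cloud-dependent option mentioned above could look roughly like the following Python sketch. The class names and the `volume_device_name` method are hypothetical and do not come from the ConPaaS code base. The only facts taken from this thread are that EC2 rejected "vda" and that "sdb" was the former default.

```python
class Cloud:
    """Hypothetical base class for a cloud back-end (not the ConPaaS class)."""

    def __init__(self, configured_device="vda"):
        # Value that would normally come from DEV_TARGET in the config file.
        self.configured_device = configured_device

    def volume_device_name(self):
        """Device name to use when attaching a volume on this cloud."""
        return self.configured_device


class EC2Cloud(Cloud):
    """Hypothetical EC2 back-end that overrides the device name."""

    def volume_device_name(self):
        # Assumption based on this thread: EC2 rejected "vda" as an
        # EBS device name, while "sdb" was the former default.
        return "sdb"


generic_device = Cloud().volume_device_name()
ec2_device = EC2Cloud().volume_device_name()
```

With this design the caller never hardcodes a device name. It asks the cloud object, so a single default in the configuration file no longer has to be valid on every back-end.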

tcrivat commented 9 years ago

@noma After changing from "vda" to "sdb", the error is gone and the XtreemFS service starts successfully. Thanks a lot!

tcrivat commented 9 years ago

The same problem with "vda" not allowed on Amazon EC2 as a device name also happens to the Galera service. However, in the case of the Galera service, the device name is not read from the default-manager.cfg file, but is hardcoded as "vda". I changed this behavior (so Galera now reads it from the config file) in commit 2efe0b0b484d03991ba34bad8d6c959a5efbc617. Now the Galera service starts successfully on EC2.

@FrancoCaffarraAndEsterDiBello: Could you, please, check commit 2efe0b0b484d03991ba34bad8d6c959a5efbc617?

It still remains to decide on the default value for this device name. The current value is still "vda", which is not supported on EC2.
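As a rough illustration of reading the device name from configuration instead of hardcoding it, here is a Python 3 sketch using the standard `configparser` module. This is not the code from commit 2efe0b0b. The `[manager]` section name, the sample file contents, and the `read_device_name` helper are assumptions made for the example. Only the `DEV_TARGET` option name and the values "vda" and "sdb" come from this thread.

```python
import configparser

# Assumed layout of the config file; the section name is a guess.
SAMPLE_CFG = """
[manager]
DEV_TARGET = sdb
"""


def read_device_name(cfg_text, section="manager", option="DEV_TARGET",
                     fallback="sdb"):
    """Return the configured device name, or the fallback if it is missing."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # get() returns the fallback when the section or option is absent.
    return parser.get(section, option, fallback=fallback)


device = read_device_name(SAMPLE_CFG)
```

Choosing the fallback is the same open question as choosing the default: whatever value is used there must be accepted by every supported cloud, or the fallback must itself depend on the cloud.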

tcrivat commented 9 years ago

Another problem was that with newer Java Virtual Machine versions, if an IPv6 stack is available, the XtreemFS service will bind to the IPv6 address instead of the IPv4 one. I fixed this in commit 64ee60e370eb0d4113933e34f19c5c7a29fd98e7. @noma: please confirm you are ok with this.
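The thread does not show what the commit changes. One common way to make a JVM prefer IPv4 is the standard `java.net.preferIPv4Stack` system property. The Python sketch below only shows how a launcher might add it to a command line. The `build_java_command` helper and the `org.example.Service` class name are hypothetical, and this may differ from the actual fix.

```python
def build_java_command(main_class, prefer_ipv4=True):
    """Build a java command line, optionally forcing the IPv4 stack."""
    cmd = ["java"]
    if prefer_ipv4:
        # Standard JVM networking property; makes the JVM use IPv4 sockets.
        cmd.append("-Djava.net.preferIPv4Stack=true")
    cmd.append(main_class)
    return cmd


# "org.example.Service" is a placeholder, not an XtreemFS class name.
command = build_java_command("org.example.Service")
```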

noma commented 9 years ago

If it works, I'm ok with it. ;-) I currently cannot test it myself, since our OpenNebula system is not operational.

tcrivat commented 9 years ago

Hi @noma,

Unfortunately, I'm still out of luck. For some unknown reason, I am not able to mount the xtreemfs volume, not even on the xtreemfs agent node.

The volume can be created successfully (using the web-based frontend):

Listing all volumes of the MRC: 54.190.62.203:32636
Volumes on 54.190.62.203:32636 (Format: volume name -> volume UUID):
data -> 93618828-d4c3-42b3-9b23-c93ae9c3f5dc
End of List.

However, trying to mount the volume fails with:

root@ip-10-248-34-175:~# mount.xtreemfs localhost/data /var/tmp/data
[ E | 9/29 18:23:50.573 | 0x269c7b0 ] Got no response from server localhost:32638, retrying (infinite attempts left)
^C
root@ip-10-248-34-175:~# mount.xtreemfs localhost:32636/data /var/tmp/data
[ E | 9/29 18:24:57.753 | 0x1489840 ] Got no response from server localhost:32636, retrying (infinite attempts left)
^C

The services seem to be listening:

root@ip-10-248-34-175:~# netstat -tlnp | grep java
tcp 0 0 0.0.0.0:32636 0.0.0.0:* LISTEN 740/java
tcp 0 0 0.0.0.0:32638 0.0.0.0:* LISTEN 721/java
tcp 0 0 0.0.0.0:32640 0.0.0.0:* LISTEN 798/java
tcp 0 0 0.0.0.0:30636 0.0.0.0:* LISTEN 740/java
tcp 0 0 0.0.0.0:30638 0.0.0.0:* LISTEN 721/java
tcp 0 0 0.0.0.0:30640 0.0.0.0:* LISTEN 798/java

Any idea on how to troubleshoot this? Thanks!

noma commented 9 years ago

The mount command expects a DIR, not an MRC, so try one of these:

mount.xtreemfs localhost/data /var/tmp/data (without the explicit port)
mount.xtreemfs localhost:32638/data /var/tmp/data (with the DIR port)

tcrivat commented 9 years ago

Hi @noma,

Same thing happens (I actually also tried these before):

root@ip-10-225-165-214:~# mount.xtreemfs localhost/data /var/tmp/data
[ E | 9/30 07:30:34.973 | 0x27857b0 ] Got no response from server localhost:32638, retrying (infinite attempts left)
^C
root@ip-10-225-165-214:~# mount.xtreemfs localhost:32638/data /var/tmp/data
[ E | 9/30 07:31:26.541 | 0x2d50840 ] Got no response from server localhost:32638, retrying (infinite attempts left)

I think that the fastest way to solve this would be to connect to the VM yourself and have a look. The IP address of the VM running the XtreemFS agent is: X (removed)

You can also see it here, or start another one: X (removed)

noma commented 9 years ago

I just checked out the VMs and they look fine to me. It took me a few minutes to remember that the last big change we made during the Contrail project was using SSL for XtreemFS. I actually documented this in the user guide. Trick question: has anyone ever read this? http://conpaas-team.readthedocs.org/en/latest/userguide.html#the-xtreemfs-service

So I suspect that the error originates from not using SSL in the mount command. Please try following the instructions in the documentation from the director machine, where you can acquire the necessary certificate for mounting.

If this solves the problem, I will open an issue for XtreemFS to create more helpful output in the error messages.

However, another problem is that the command-line client has many more features than the web client, so whoever is working on the web front-end needs to add at least the essential options, such as downloading the certificates for XtreemFS.

tcrivat commented 9 years ago

Yes, using the certificates solves the problem. It would be nice to have XtreemFS warn when certificates are needed but are not used, instead of locking up. Thanks!

gpierre42 commented 9 years ago

This problem looks solved to me. I am closing the issue.