AAROC / DevOps

DevOps code to deploy eScience services
http://www.africa-grid.org/DevOps

Java Error in liferay-csgf.yml #203

Closed abd-hasan85 closed 9 years ago

abd-hasan85 commented 9 years ago

Hi all, I get the following error:

failed: [sgw.asrenorg.net] => {"ansible_job_id": "554021991969.24545", "changed": true, "cmd": "./asadmin --interactive false --passwordfile /home/liferayadmin/passwordfile --user liferayadmin start-domain liferay", "delta": "0:00:01.036717", "end": "2015-03-23 08:18:47.607684", "finished": 1, "rc": 1, "start": "2015-03-23 08:18:46.570967", "warnings": []}
stderr: JVM failed to start: com.sun.enterprise.admin.launcher.GFLauncherException: The server exited prematurely with exit code 1.
Before it died, it produced the following output:
stdout: Command start-domain failed.
<job 554021991969.24545> FAILED on sgw.asrenorg.net

FATAL: all hosts have already failed -- aborting

@brucellino @fmarco76

abd-hasan85 commented 9 years ago

After running this playbook again, I get the following error:

TASK: [liferay-csgf | Start the Liferay Domain] ***
failed: [sgw.asrenorg.net] => {"ansible_job_id": "139731600538.18775", "changed": true, "cmd": "./asadmin --interactive false --passwordfile /etc/abd/passwordfile --user liferayadmin start-domain liferay", "delta": "0:00:01.071887", "end": "2015-03-23 11:40:42.335911", "finished": 1, "rc": 1, "start": "2015-03-23 11:40:41.264024", "warnings": []}
stderr: The Master Password is required to start the domain. No console, no prompting possible. You should either create the domain with --savemasterpassword=true or provide a password file with the --passwordfile option.
stdout: Command start-domain failed.
<job 139731600538.18775> FAILED on sgw.asrenorg.net

brucellino commented 9 years ago

This looks like a problem with GlassFish itself. @fmarco76 can you remind us where these logs are?
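GlassFish normally writes the domain log under the domain directory, so (assuming the standard GlassFish 3 layout, and the domain path that appears later in this thread) the startup output should be readable with:

tail -n 100 /opt/glassfish/glassfish3/glassfish/domains/liferay/logs/server.log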

abd-hasan85 commented 9 years ago

failed: [sgw.asrenorg.net] => {"ansible_job_id": "796803850166.22786", "changed": true, "cmd": "./asadmin --interactive false --passwordfile /home/liferayadmin/passwordfile --user liferayadmin start-domain liferay", "delta": "0:00:01.040719", "end": "2015-03-23 12:39:26.758936", "finished": 1, "rc": 1, "start": "2015-03-23 12:39:25.718217", "warnings": []}
stderr: Error starting domain liferay. The server exited prematurely with exit code 1. Before it died, it produced the following output:

Invalid initial heap size: -Xms
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
stdout: Waiting for liferay to start Command start-domain failed.
<job 796803850166.22786> FAILED on sgw.asrenorg.net

FATAL: all hosts have already failed -- aborting

fmarco76 commented 9 years ago

It seems to be a problem with the password file. The command executed by Ansible refers to /etc/abd/passwordfile, but the task we developed was using a different file. Have you modified the task or made other changes that would explain the different file?
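For reference, an asadmin password file is a plain key=value file; to get past the master-password error above it needs at least the following entries (the values here are placeholders, not the real ones):

AS_ADMIN_PASSWORD=changeit
AS_ADMIN_MASTERPASSWORD=changeit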

brucellino commented 9 years ago

ah ok. You're probably missing the host_vars that define the heap size. In our dev instance, we have:

min_jvm_size=1024m max_jvm_size=1024m db-server=vm03.ct.infn.it needs_certificate="false"

Those variables need to be defined (the values shown are the bare minimum; you should choose higher ones). Of course, `db-server` should be _your_ db-server hostname.
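Presumably the role feeds these variables into the domain's JVM options with something like the following (a hypothetical fragment, not this repo's exact task; create-jvm-options is a real asadmin subcommand, but the task name and layout are assumptions):

# hypothetical task: set the heap from the inventory variables
- name: Set the Liferay domain heap size
  shell: ./asadmin create-jvm-options "-Xms{{ min_jvm_size }}:-Xmx{{ max_jvm_size }}"

With min_jvm_size empty or undefined, the rendered option is a bare -Xms, which is exactly the "Invalid initial heap size: -Xms" error above.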
abd-hasan85 commented 9 years ago

Also, should my inventory file in /etc/ansible/hosts/inventory.dev now be as follows?

[ldap-servers]
ldap.asrenorg.net needs_certificate=true

[shibboleth-idps]
idp.asrenorg.net needs_certificate=true

[science-gateways]
sgw.asrenorg.net needs_certificate=true

[db-servers]
min_jvm_size=1024m max_jvm_size=2048m db-server=sgw.asrenorg.net needs_certificate="true"

[identity-all:children]
ldap-servers
shibboleth-idps
science-gateways
db-servers

[CentOS-servers:children]
identity-all

brucellino commented 9 years ago

Yes, this looks fine. However, note that the group you've assigned your hosts to (identity-all) has the following variables (in group_vars/identity-all):

# institute metadata
site_name: Catch-All
host_institute:
  name: Africa-Arabia Regional Operations Centre
  url: http://aaroc.github.io

# LDAP variables
server_country: ZA
server_state:
server_location: Bloemfontein
server_organization: CSIR
server_organization_unit: SAGrid
organisation:
  name: SAGrid
  logo: csir_meraka.jpg
ldap_server: ldap.sagrid.ac.za
sgw_admin: brucellino@gmail.com
mail_contact: bbecker@csir.co.za
# IDP variables. these are specifically related to the IdPOpen Web front end.
idp:
  name: Catch-All Identity Provider
  admin_user: Bruce Becker
  admin_email: bbecker@csir.co.za
  metadata_url: https://{{ hostvars[groups['shibboleth-idps'][0]]['ansible_fqdn']}}/idp/shibboleth
  mail_server: smtp.google.com
  header_logo: Logo.jpg

You should copy group_vars/identity-all to something like group_vars/identity-asren to add your institute-specific information, and change your inventory to be:

[ldap-servers]
ldap.asrenorg.net needs_certificate=true

[shibboleth-idps]
idp.asrenorg.net needs_certificate=true

[science-gateways]
sgw.asrenorg.net needs_certificate=true

[db-servers]
min_jvm_size=1024m max_jvm_size=2048m db-server=sgw.asrenorg.net needs_certificate="true"

[identity-asren:children]
ldap-servers
shibboleth-idps
science-gateways
db-servers

[CentOS-servers:children]
identity-asren
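In practice that copy is just the following (run from the repository root; editing the values is up to you):

cp group_vars/identity-all group_vars/identity-asren
# then replace the SAGrid/Catch-All values in group_vars/identity-asren with ASREN's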
abd-hasan85 commented 9 years ago

After editing the inventory, this error appeared:

TASK: [liferay-csgf | Start the Liferay Domain] ***
ok: [sgw.asrenorg.net]
<job 700298964332.27893> polling, 40s remaining
failed: [sgw.asrenorg.net] => {"ansible_job_id": "700298964332.27893", "changed": true, "cmd": "./asadmin --interactive false --passwordfile /home/liferayadmin/passwordfile --user liferayadmin start-domain liferay", "delta": "0:00:11.105828", "end": "2015-03-23 13:02:09.355345", "finished": 1, "rc": 1, "start": "2015-03-23 13:01:58.249517", "warnings": []}
stderr: Error starting domain liferay. The server exited prematurely with exit code 0. Before it died, it produced the following output:

Launching GlassFish on Felix platform [#|2015-03-23T11:02:05.651+0000|INFO|glassfish3.1.2|com.sun.enterprise.server.logging.GFFileHandler|_ThreadID=1;_ThreadName=main;|Running GlassFish Version: GlassFish Server Open Source Edition 3.1.2.2 (build 5)|#]

[#|2015-03-23T11:02:05.934+0000|INFO|glassfish3.1.2|org.glassfish.ha.store.spi.BackingStoreFactoryRegistry|_ThreadID=1;_ThreadName=main;|Registered org.glassfish.ha.store.adapter.cache.ShoalBackingStoreProxy for persistence-type = replicated in BackingStoreFactoryRegistry|#]

[#|2015-03-23T11:02:06.761+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=23;_ThreadName=Grizzly-kernel-thread(1);|Grizzly Framework 1.9.50 started in: 103ms - bound to [0.0.0.0:8181]|#]

[#|2015-03-23T11:02:06.763+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=26;_ThreadName=Grizzly-kernel-thread(1);|Grizzly Framework 1.9.50 started in: 39ms - bound to [0.0.0.0:4848]|#]

[#|2015-03-23T11:02:06.830+0000|WARNING|glassfish3.1.2|javax.enterprise.system.core.org.glassfish.javaee.services|_ThreadID=1;_ThreadName=main;|JK configuration file /opt/glassfish/glassfish3/glassfish/domains/liferay/config/glassfish-jk.properties is not found.|#]

[#|2015-03-23T11:02:06.916+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=33;_ThreadName=Grizzly-kernel-thread(1);|Grizzly Framework 1.9.50 started in: 35ms - bound to [0.0.0.0:3700]|#]

[#|2015-03-23T11:02:06.942+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.services.impl|_ThreadID=36;_ThreadName=Grizzly-kernel-thread(1);|Grizzly Framework 1.9.50 started in: 29ms - bound to [0.0.0.0:7676]|#]

[#|2015-03-23T11:02:07.294+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=1;_ThreadName=main;|GlassFish Server Open Source Edition 3.1.2.2 (5) startup time : Felix (5,393ms), startup services(2,560ms), total(7,953ms)|#]

[#|2015-03-23T11:02:07.295+0000|SEVERE|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=1;_ThreadName=main;|Shutting down v3 due to startup exception : No free port within range: 8080=com.sun.enterprise.v3.services.impl.monitor.MonitorableSelectorHandler@3da23132|#]

[#|2015-03-23T11:02:07.355+0000|INFO|glassfish3.1.2|javax.enterprise.system.tools.admin.com.sun.enterprise.v3.admin|_ThreadID=43;_ThreadName=Thread-25;|Server shutdown initiated|#]

[#|2015-03-23T11:02:07.362+0000|INFO|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=43;_ThreadName=Thread-25;|Already stopped, so just returning|#]
stdout: Waiting for liferay to start ...........Command start-domain failed.
<job 700298964332.27893> FAILED on sgw.asrenorg.net

FATAL: all hosts have already failed -- aborting

abd-hasan85 commented 9 years ago

@fmarco76 the password file is now the default one

fmarco76 commented 9 years ago

The error is different now. It seems the ports used by GlassFish are already in use. Is there a different instance running? Are they used by other services?
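You can check what is listening on the ports GlassFish tries to bind with something like this (netstat flags assume a CentOS box, as used here):

netstat -tlnp | grep -E ':(8080|8181|4848|3700|7676)'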

abd-hasan85 commented 9 years ago

no

brucellino commented 9 years ago

probably an already running instance. This is one of the weak points of the automation - we haven't worked around this yet. Check that there is no instance of java running - if so, kill it.
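For example (the pkill pattern is an assumption about what the Java command line contains):

ps aux | grep [j]ava   # the [j] excludes the grep process itself
pkill -f glassfish     # or kill <PID> from the ps output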

abd-hasan85 commented 9 years ago

On the SGW server:

[root@sgw ~]# ps aux | grep java
root 26961 0.0 0.0 103244 876 pts/0 S+ 13:08 0:00 grep java

On the IDP server:

[root@idp ~]# ps aux | grep java
tomcat 19443 0.8 9.7 1471960 186704 ? Sl 12:25 0:23 /usr/lib/jvm/java/bin/java -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory -Xmx512m -XX:MaxPermSize=128m -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory -Xmx512m -XX:MaxPermSize=128m -classpath :/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat6 -Dcatalina.home=/usr/share/tomcat6 -Djava.endorsed.dirs=/usr/share/tomcat6/lib/endorsed -Djava.io.tmpdir=/var/cache/tomcat6/temp -Djava.util.logging.config.file=/usr/share/tomcat6/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start
root 28013 0.0 0.0 103244 872 pts/0 S+ 13:10 0:00 grep java

fmarco76 commented 9 years ago

From your log:

[#|2015-03-23T11:02:07.295+0000|SEVERE|glassfish3.1.2|javax.enterprise.system.core.com.sun.enterprise.v3.server|_ThreadID=1;_ThreadName=main;|Shutting down v3 due to startup exception : No free port within range: 8080=com.sun.enterprise.v3.services.impl.monitor.MonitorableSelectorHandler@3da23132|#]

If the port is not used, then try again!

abd-hasan85 commented 9 years ago

Same error. All the servers are new and have never been used, only set up with the playbook.

abd-hasan85 commented 9 years ago

Tomcat listens on 8080 (on the IDP server!), and 8080 is not used on the SGW server. We changed the Tomcat port from 8080 to 80 in server.xml (on the IDP server), ran the command again, and the error is now:

failed: [sgw.asrenorg.net] => {"ansible_job_id": "639936070212.4856", "changed": true, "cmd": ["/sbin/reboot"], "delta": "0:00:00.032566", "end": "2015-03-23 14:04:32.226041", "finished": 1, "rc": 1, "start": "2015-03-23 14:04:32.193475", "warnings": []}
stderr: reboot: Need to be root
<job 639936070212.4856> FAILED on sgw.asrenorg.net

FATAL: all hosts have already failed -- aborting

TASK: [Inform the user to complete liferay setup] *****
FATAL: no hosts matched or all hosts have already failed -- aborting

Note: I'm logged in as root.

abd-hasan85 commented 9 years ago

@brucellino @fmarco76 please send me the solution for this error

brucellino commented 9 years ago

Hi @Abdelrahman-Hasan

Sorry to take so long. There was a missing sudo in the handler.
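A minimal sketch of the fix, assuming the handler looked something like this (Ansible 1.x syntax of the era; the handler name and file are guesses):

# handlers/main.yml (hypothetical)
- name: restart server
  command: /sbin/reboot
  sudo: yes   # this was the missing line; without it reboot ran as the unprivileged SSH user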