ManageIQ / manageiq-pods

ManageIQ on Kubernetes and OpenShift

manageiq deployment using external DB not working #168

Closed - hahnn closed this issue 7 years ago

hahnn commented 7 years ago

Hi,

Continuing my tests after commit #164, the manageiq pod does not start correctly:

== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections
== Writing encryption key ==
== Restoring PV data symlinks ==
/var/www/miq/vmdb/GUID does not exist on PV, skipping
/var/www/miq/vmdb/log does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer.key does not exist on PV, skipping

The PV exists, because the pod writes things on it, so the problem is not a missing PV. Below is part of the content of /persistent after a second start of the pod:

sh-4.2# pwd
/persistent
sh-4.2# ls -al
total 16
drwxrwxrwx.  4 root root 4096 Jun 26 10:02 .
drwxr-xr-x. 20 root root 4096 Jun 26 10:10 ..
drwxr-xr-x.  3 root root 4096 Jun 26 10:02 server-data
drwxr-xr-x.  4 root root 4096 Jun 26 09:53 server-deploy
sh-4.2# ls -al server-data/var/www/miq/vmdb/log/
total 66
drwxrwxr-x. 2 root root  4096 Jun 26 10:09 .
drwxr-xr-x. 3 root root  4096 Jun 26 10:02 ..
-rw-r--r--. 1 root root  1098 Jun 26 10:09 api.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 audit.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 automation.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 aws.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 azure.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 datawarehouse.log
-rw-r--r--. 1 root root   677 Jun 26 10:09 evm.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 fog.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 kubernetes.log
-rw-r--r--. 1 root root 49035 Jun 26 10:09 last_settings.txt
-rw-r--r--. 1 root root    66 Jun 26 10:02 lenovo.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 middleware.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 policy.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 production.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 rhevm.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 scvmm.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 vim.log
-rw-r--r--. 1 root root    66 Jun 26 10:02 websocket.log
sh-4.2#

and the PostgreSQL database is still empty:

You are now connected to database "manageiq" as user "postgres".
manageiq=# \dt+
No relations found.

Any idea what's wrong? It seems to me that some kind of initialization phase is not being performed here...

fbladilo commented 7 years ago

@hahnn Can you post the entire output of "oc logs <miq-pod-name>"? I would like to understand which case your deployment is processing during the init script run. Is this a master build?

If yes, please make sure you are getting the latest possible master images for your deployment: run "oc edit template" and modify the MIQ pod spec imagePullPolicy:

...
image: ${APPLICATION_IMG_NAME}:${APPLICATION_IMG_TAG}
imagePullPolicy: Always
...
hahnn commented 7 years ago

Below are the results of the commands you requested:

# oc get pod
NAME                READY     STATUS    RESTARTS   AGE
manageiq-0          0/1       Running   30         4h
memcached-1-dbn19   1/1       Running   0          4h
# oc logs manageiq-0
== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections
== Writing encryption key ==
== Restoring PV data symlinks ==
/var/www/miq/vmdb/GUID does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer.key does not exist on PV, skipping
# cat miq-template-ext-db.yaml
apiVersion: v1
kind: Template
labels:
  template: manageiq-ext-db
metadata:
  name: manageiq-ext-db
  annotations:
    description: "ManageIQ appliance with persistent storage using a external DB host"
    tags: "instant-app,manageiq,miq"
    iconClass: "icon-rails"
objects:
- apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: miq-anyuid
- apiVersion: v1
  kind: Secret
  metadata:
    name: "${NAME}-secrets"
  stringData:
    pg-password: "${DATABASE_PASSWORD}"
    database-url: "postgresql://${DATABASE_USER}:${DATABASE_PASSWORD}@${DATABASE_SERVICE_NAME}:${DATABASE_PORT}/${DATABASE_NAME}?encoding=utf8&pool=5&wait_timeout=5"
    v2-key: "${V2_KEY}"
- apiVersion: v1
  kind: Service
  metadata:
    annotations:
      description: "Exposes and load balances ManageIQ pods"
      service.alpha.openshift.io/dependencies: '[{"name":"${DATABASE_SERVICE_NAME}","namespace":"","kind":"Service"},{"name":"${MEMCACHED_SERVICE_NAME}","namespace":"","kind":"Service"}]'
    name: ${NAME}
  spec:
    clusterIP: None
    ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 80
    - name: https
      port: 443
      protocol: TCP
      targetPort: 443
    selector:
      name: ${NAME}
- apiVersion: v1
  kind: Route
  metadata:
    name: ${NAME}
  spec:
    host: ${APPLICATION_DOMAIN}
    port:
      targetPort: https
    tls:
      termination: passthrough
    to:
      kind: Service
      name: ${NAME}
- apiVersion: v1
  kind: ImageStream
  metadata:
    name: miq-app
    annotations:
      description: "Keeps track of the ManageIQ image changes"
  spec:
    dockerImageRepository: "${APPLICATION_IMG_NAME}"
- apiVersion: v1
  kind: ImageStream
  metadata:
    name: memcached
    annotations:
      description: "Keeps track of the Memcached image changes"
  spec:
    dockerImageRepository: "${MEMCACHED_IMG_NAME}"
- apiVersion: apps/v1beta1
  kind: "StatefulSet"
  metadata:
    name: ${NAME}
    annotations:
      description: "Defines how to deploy the ManageIQ appliance"
  spec:
    serviceName: "${NAME}"
    replicas: "${APPLICATION_REPLICA_COUNT}"
    template:
      metadata:
        labels:
          name: ${NAME}
        name: ${NAME}
      spec:
        containers:
        - name: manageiq
          image: "${APPLICATION_IMG_NAME}:${APPLICATION_IMG_TAG}"
          livenessProbe:
            tcpSocket:
              port: 443
            initialDelaySeconds: 480
            timeoutSeconds: 3
          readinessProbe:
            httpGet:
              path: /
              port: 443
              scheme: HTTPS
            initialDelaySeconds: 200
            timeoutSeconds: 3
          ports:
          - containerPort: 80
            protocol: TCP
          - containerPort: 443
            protocol: TCP
          volumeMounts:
              -
                name: "${NAME}-server"
                mountPath: "/persistent"
                subPath: "manageiq"
          env:
            -
              name: "APPLICATION_INIT_DELAY"
              value: "${APPLICATION_INIT_DELAY}"
            -
              name: "DATABASE_SERVICE_NAME"
              value: "${DATABASE_SERVICE_NAME}"
            -
              name: "DATABASE_REGION"
              value: "${DATABASE_REGION}"
            -
              name: "DATABASE_URL"
              valueFrom:
                secretKeyRef:
                  name: "${NAME}-secrets"
                  key: "database-url"
            -
              name: "MEMCACHED_SERVER"
              value: "${MEMCACHED_SERVICE_NAME}:11211"
            -
              name: "MEMCACHED_SERVICE_NAME"
              value: "${MEMCACHED_SERVICE_NAME}"
            -
              name: "V2_KEY"
              valueFrom:
                secretKeyRef:
                  name: "${NAME}-secrets"
                  key: "v2-key"
          resources:
            requests:
              memory: "${APPLICATION_MEM_REQ}"
              cpu: "${APPLICATION_CPU_REQ}"
            limits:
              memory: "${APPLICATION_MEM_LIMIT}"
          lifecycle:
            preStop:
              exec:
                command:
                  - /opt/manageiq/container-scripts/sync-pv-data
        serviceAccount: miq-anyuid
        serviceAccountName: miq-anyuid
        terminationGracePeriodSeconds: 90
    volumeClaimTemplates:
      - metadata:
          name: "${NAME}-server"
          annotations:
            # Uncomment this if using dynamic volume provisioning.
            # https://docs.openshift.org/latest/install_config/persistent_storage/dynamically_provisioning_pvs.html
            # volume.alpha.kubernetes.io/storage-class: anything
        spec:
          accessModes: [ ReadWriteMany ]
          resources:
            requests:
              storage: "${APPLICATION_VOLUME_CAPACITY}"
          volumeName: "gluster-default-volume3"
- apiVersion: v1
  kind: "Service"
  metadata:
    name: "${MEMCACHED_SERVICE_NAME}"
    annotations:
      description: "Exposes the memcached server"
  spec:
    ports:
      -
        name: "memcached"
        port: 11211
        targetPort: 11211
    selector:
      name: "${MEMCACHED_SERVICE_NAME}"
- apiVersion: v1
  kind: "DeploymentConfig"
  metadata:
    name: "${MEMCACHED_SERVICE_NAME}"
    annotations:
      description: "Defines how to deploy memcached"
  spec:
    strategy:
      type: "Recreate"
    triggers:
      -
        type: "ImageChange"
        imageChangeParams:
          automatic: true
          containerNames:
            - "memcached"
          from:
            kind: "ImageStreamTag"
            name: "memcached:${MEMCACHED_IMG_TAG}"
      -
        type: "ConfigChange"
    replicas: 1
    selector:
      name: "${MEMCACHED_SERVICE_NAME}"
    template:
      metadata:
        name: "${MEMCACHED_SERVICE_NAME}"
        labels:
          name: "${MEMCACHED_SERVICE_NAME}"
      spec:
        volumes: []
        containers:
          -
            name: "memcached"
            image: "${MEMCACHED_IMG_NAME}:${MEMCACHED_IMG_TAG}"
            ports:
              -
                containerPort: 11211
            readinessProbe:
              timeoutSeconds: 1
              initialDelaySeconds: 5
              tcpSocket:
                port: 11211
            livenessProbe:
              timeoutSeconds: 1
              initialDelaySeconds: 30
              tcpSocket:
                port: 11211
            volumeMounts: []
            env:
              -
                name: "MEMCACHED_MAX_MEMORY"
                value: "${MEMCACHED_MAX_MEMORY}"
              -
                name: "MEMCACHED_MAX_CONNECTIONS"
                value: "${MEMCACHED_MAX_CONNECTIONS}"
              -
                name: "MEMCACHED_SLAB_PAGE_SIZE"
                value: "${MEMCACHED_SLAB_PAGE_SIZE}"
            resources:
              requests:
                memory: "${MEMCACHED_MEM_REQ}"
                cpu: "${MEMCACHED_CPU_REQ}"
              limits:
                memory: "${MEMCACHED_MEM_LIMIT}"
- apiVersion: v1
  kind: "Service"
  metadata:
    name: "${DATABASE_SERVICE_NAME}"
    annotations:
      description: "Remote database service"
  spec:
    ports:
      -
        name: "postgresql"
        port: 5432
        targetPort: ${{DATABASE_PORT}}
    selector: {}
- apiVersion: v1
  kind: "Endpoints"
  metadata:
    name: "${DATABASE_SERVICE_NAME}"
  subsets:
    -
      addresses:
        -
          ip: "${DATABASE_IP}"
      ports:
        -
          port: ${{DATABASE_PORT}}
          name: "postgresql"
parameters:
  -
    name: "NAME"
    displayName: Name
    required: true
    description: "The name assigned to all of the frontend objects defined in this template."
    value: manageiq
  -
    name: "V2_KEY"
    displayName: "ManageIQ Encryption Key"
    required: true
    description: "Encryption Key for ManageIQ Passwords"
    from: "[a-zA-Z0-9]{43}"
    generate: expression
  -
    name: "DATABASE_SERVICE_NAME"
    displayName: "PostgreSQL Service Name"
    required: true
    description: "The name of the OpenShift Service exposed for the PostgreSQL container."
    value: "postgresql"
  -
    name: "DATABASE_USER"
    displayName: "PostgreSQL User"
    required: true
    description: "PostgreSQL user that will access the database."
    value: "root"
  -
    name: "DATABASE_PASSWORD"
    displayName: "PostgreSQL Password"
    required: true
    description: "Password for the PostgreSQL user."
    from: "[a-zA-Z0-9]{8}"
    generate: expression
  -
    name: "DATABASE_IP"
    displayName: "PostgreSQL Server IP"
    required: true
    description: "PostgreSQL external server IP used to configure service."
    value: ""
  -
    name: "DATABASE_PORT"
    displayName: "PostgreSQL Server Port"
    required: true
    description: "PostgreSQL external server port used to configure service."
    value: "5432"
  -
    name: "DATABASE_NAME"
    required: true
    displayName: "PostgreSQL Database Name"
    description: "Name of the PostgreSQL database accessed."
    value: "vmdb_production"
  -
    name: "DATABASE_REGION"
    required: true
    displayName: "Application Database Region"
    description: "Database region that will be used for application."
    value: "0"
  -
    name: "MEMCACHED_SERVICE_NAME"
    required: true
    displayName: "Memcached Service Name"
    description: "The name of the OpenShift Service exposed for the Memcached container."
    value: "memcached"
  -
    name: "MEMCACHED_MAX_MEMORY"
    displayName: "Memcached Max Memory"
    description: "Memcached maximum memory for memcached object storage in MB."
    value: "64"
  -
    name: "MEMCACHED_MAX_CONNECTIONS"
    displayName: "Memcached Max Connections"
    description: "Memcached maximum number of connections allowed."
    value: "1024"
  -
    name: "MEMCACHED_SLAB_PAGE_SIZE"
    displayName: "Memcached Slab Page Size"
    description: "Memcached size of each slab page."
    value: "1m"
  -
    name: "APPLICATION_CPU_REQ"
    displayName: "Application Min CPU Requested"
    required: true
    description: "Minimum amount of CPU time the Application container will need (expressed in millicores)."
    value: "1000m"
  -
    name: "MEMCACHED_CPU_REQ"
    displayName: "Memcached Min CPU Requested"
    required: true
    description: "Minimum amount of CPU time the Memcached container will need (expressed in millicores)."
    value: "200m"
  -
    name: "APPLICATION_MEM_REQ"
    displayName: "Application Min RAM Requested"
    required: true
    description: "Minimum amount of memory the Application container will need."
    value: "6144Mi"
  -
    name: "MEMCACHED_MEM_REQ"
    displayName: "Memcached Min RAM Requested"
    required: true
    description: "Minimum amount of memory the Memcached container will need."
    value: "64Mi"
  -
    name: "APPLICATION_MEM_LIMIT"
    displayName: "Application Max RAM Limit"
    required: true
    description: "Maximum amount of memory the Application container can consume."
    value: "16384Mi"
  -
    name: "MEMCACHED_MEM_LIMIT"
    displayName: "Memcached Max RAM Limit"
    required: true
    description: "Maximum amount of memory the Memcached container can consume."
    value: "256Mi"
  -
    name: "MEMCACHED_IMG_NAME"
    displayName: "Memcached Image Name"
    description: "This is the Memcached image name requested to deploy."
    value: "docker.io/manageiq/memcached"
  -
    name: "MEMCACHED_IMG_TAG"
    displayName: "Memcached Image Tag"
    description: "This is the Memcached image tag/version requested to deploy."
    value: "latest"
  -
    name: "APPLICATION_IMG_NAME"
    displayName: "Application Image Name"
    description: "This is the Application image name requested to deploy."
    value: "docker.io/manageiq/manageiq-pods"
  -
    name: "APPLICATION_IMG_TAG"
    displayName: "Application Image Tag"
    description: "This is the Application image tag/version requested to deploy."
    value: "app-latest"
  -
    name: "APPLICATION_DOMAIN"
    displayName: "Application Hostname"
    description: "The exposed hostname that will route to the application service, if left blank a value will be defaulted."
    value: ""
  -
    name: "APPLICATION_REPLICA_COUNT"
    displayName: "Application Replica Count"
    description: "This is the number of Application replicas requested to deploy."
    value: "1"
  -
    name: "APPLICATION_INIT_DELAY"
    displayName: "Application Init Delay"
    required: true
    description: "Delay in seconds before we attempt to initialize the application."
    value: "15"
  -
    name: "APPLICATION_VOLUME_CAPACITY"
    displayName: "Application Volume Capacity"
    required: true
    description: "Volume space available for application data."
    value: "5Gi"
fbladilo commented 7 years ago

@hahnn The output seems incomplete. Do me a favor: go ahead and change the imagePullPolicy in the MIQ pod spec as I noted in my previous comment; I want to ensure you are deploying the latest possible master images for the miq pod.

I would actually update the external template with the updated imagePullPolicy, delete the existing deployment and start a new one for freshness.
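
One possible sequence for that (a sketch; the object and file names are the ones used earlier in this thread):

oc delete statefulset manageiq           # remove the current MIQ deployment
oc delete template manageiq-ext-db       # remove the old template definition
oc create -f miq-template-ext-db.yaml    # re-create it with imagePullPolicy: Always added
oc new-app --template=manageiq-ext-db    # instantiate a fresh deployment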

For example, your entire oc logs output should look similar to the following for a fresh new deployment:

oc logs manageiq-0
== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections
== Writing encryption key ==
== Restoring PV data symlinks ==
/var/www/miq/vmdb/GUID does not exist on PV, skipping
/var/www/miq/vmdb/log does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer.key does not exist on PV, skipping
Deployment status is new_deployment
== Starting New Deployment ==
Generating a 2048 bit RSA private key
.....+++
..............................................................+++
writing new private key to '/var/www/miq/vmdb/certs/server.cer.key'
-----
== Initializing Appliance ==
/var/www/miq/vmdb /var/www/miq/vmdb
Writing region: 0 in /var/www/miq/vmdb/REGION...
Resetting production database...
Dropped database 'vmdb_production'
Created database 'vmdb_production'
.Initializing region and database...
/var/www/miq/vmdb
.....

So something is not taking place; I'm trying to figure out what is missing.

hahnn commented 7 years ago

Please note that in the template reproduced above, I've updated the persistent volume name to suit my environment.

fbladilo commented 7 years ago

@hahnn Seems OK at a glance. I'm a bit confused that I don't see the call to check_deployment_status in your oc logs output:

https://github.com/ManageIQ/manageiq-pods/blob/master/images/miq-app/docker-assets/appliance-initialize.sh#L25

This function determines the path to follow for your deployment.

hahnn commented 7 years ago

I have just deleted my full manageiq project (and also ran some 'docker rmi' commands to delete the images) and recreated it from scratch.

Here are all the steps after deletion:

# oc version
oc v3.6.0-alpha.2+3c221d5
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://127.0.0.1:8443
openshift v3.6.0-alpha.2+3c221d5
kubernetes v1.6.1+5115d708d7
# oc whoami
system:admin
# oc new-project manageiq --display-name="ManageIQ"
Already on project "manageiq" on server "https://127.0.0.1:8443".

You can add applications to this project with the 'new-app' command. For example, try:

    oc new-app centos/ruby-22-centos7~https://github.com/openshift/ruby-ex.git

to build a new example application in Ruby.
# oc describe scc anyuid | grep Users
  Users:                    system:serviceaccount:gitlab:gitlab-ce-user,system:serviceaccount:manageiq:miq-anyuid
# oc create -f /tmp/gluster-service.yaml
service "glusterfs-cluster" created
# oc create -f /tmp/gluster-endpoints.yaml 
endpoints "glusterfs-cluster" created
# oc create -f /tmp/gluster-pv3.yaml       
persistentvolume "gluster-default-volume3" created
# oc get pv | grep volume3
gluster-default-volume3   15Gi       RWX           Retain          Available                                                                  59s
# oc create -f /tmp/miq-template-ext-db.yaml 
template "manageiq-ext-db" created

Then, using the OpenShift Web Console, I clicked the 'Add to project' button to create the app. In the web console, I changed some parameters, mostly related to the DB, as below:

All other parameters were left at their default values. That means:

Right after the pod reached Running status, I opened a terminal on it to cat some files, in case it helps you:

# cat /etc/default/evm
#!/bin/bash
# Description: Sets the environment for scripts and console users
#

export RAILS_ENV=production
export APPLIANCE=true
export HOME=${HOME:-/root}
# Force ExecJS to use node
export EXECJS_RUNTIME='Node'
# workaround for virtual memory spike observed with RHEL6
export MALLOC_ARENA_MAX=1
# Location of certificates and provider keys
export KEY_ROOT=/var/www/miq/vmdb/certs        

export APPLIANCE_SOURCE_DIRECTORY=/opt/manageiq/manageiq-appliance
export APPLIANCE_TEMPLATE_DIRECTORY=${APPLIANCE_SOURCE_DIRECTORY}/TEMPLATE

[[ -s /etc/default/evm_ruby ]] && source /etc/default/evm_ruby
[[ -s /etc/default/evm_bundler ]] && source /etc/default/evm_bundler
[[ -s /etc/default/evm_postgres ]] && source /etc/default/evm_postgres
[[ -s /etc/default/evm_productization ]] && source /etc/default/evm_productization

# Force locale
export LANGUAGE=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8

After this new deployment, the problem is still the same.

fbladilo commented 7 years ago

@hahnn Looks good. The external DB deployment expects your remote PG server to already have the appropriate user, role and password created, as requested in the template. We have no control over that since it is a remote PG server; it needs to be done prior to deployment.

Can you connect to the remote PG server with the manageiq user/password from the application pod?

Please keep in mind that the SUPERUSER role is also required for this manageiq user on the PG server.

See below for how we do it for the internal PG database when deployed on OpenShift:

https://github.com/ManageIQ/container-postgresql/blob/master/docker-assets/run-postgresql#L12

In your case you will have to substitute your manageiq user and password on the remote PostgreSQL server. The init scripts will actually create the requested database if the credentials are properly set up; in this case you changed the default db name from "vmdb_production" to "manageiq".
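
If the psql client is available in the application pod image, one quick way to verify the credentials and database access from the pod (host, password and database name below are placeholders for your values):

oc rsh manageiq-0
psql "postgresql://manageiq:<password>@<external-db-host>:5432/manageiq" -c "SELECT current_user;"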

hahnn commented 7 years ago

I can confirm the connection to the remote PostgreSQL server is OK, because the PostgreSQL check run when the pod starts reports success, and I can see the connection from the pod in the logs of my PostgreSQL server.

But what do you mean about the superuser role on the PG server? The manageiq database account needs PostgreSQL superadmin rights??? If that's the case, it's really something we are reluctant to implement!

Right now, the database is created (but empty) and is owned by the manageiq DB user (but this user is an unprivileged PostgreSQL account).

I'll continue testing tomorrow as I'm at home now :)

fbladilo commented 7 years ago

@hahnn No problem 👍 The PG check only does a generic TCP check to see whether the database is available on the port; it does not check credentials.

https://github.com/ManageIQ/manageiq-pods/blob/master/images/miq-app/docker-assets/container-scripts/container-deploy-common.sh#L33

The superuser rights are a requirement of ManageIQ; perhaps @carbonin can help explain why 😃

hahnn commented 7 years ago

Hmmm... are those superadmin rights needed only the first time the pod starts, in order to configure the DB, or are they required permanently? I would really suggest avoiding a DB account with superadmin rights for an application, even if that application is ManageIQ... it will not pass our security requirements.

carbonin commented 7 years ago

I'm thinking the main reason for this is that Rails apps typically drop and recreate the database (which is what we do during initialization). This means that when we drop the database, we need to connect to another one to recreate it, and this is hardcoded (in Rails, not in our app) to the postgres database.

I know we have users that don't use SUPERUSER, so it can be done, but I'm not sure I can come up with all the use cases and permissions we may need off the top of my head in this comment.

As a start, to try this out I think you should be okay with CREATEDB and LOGIN, but you would also need to configure the role to have access to the postgres database. Can you give that a try? I'll write here if I hear of other use cases for SUPERUSER.
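
For example, a minimal sketch of such a role setup, run as a PostgreSQL superuser on the external server (role name, password and host are placeholders):

psql -h <external-db-host> -U postgres -c "CREATE ROLE manageiq WITH LOGIN CREATEDB PASSWORD 'change_me';"
psql -h <external-db-host> -U postgres -c "GRANT CONNECT ON DATABASE postgres TO manageiq;"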

hahnn commented 7 years ago

I'll try first thing tomorrow morning :-) and will write here whether that is enough to unblock things.

Anyway, I want to thank both of you for your pro-active answers and support ;-)

hahnn commented 7 years ago

OK. I altered the attributes of the manageiq database user to have superuser rights.

But the issue is still there, except there are more logs from the pod:

# oc logs manageiq-0                               
== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections
== Writing encryption key ==
== Restoring PV data symlinks ==
/var/www/miq/vmdb/GUID does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer does not exist on PV, skipping
/var/www/miq/vmdb/certs/server.cer.key does not exist on PV, skipping
{"@timestamp":"2017-06-27T07:12:35.176392 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for evm.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.178891 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for vim.log has been changed to [WARN]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.180955 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for rhevm.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.182124 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for aws.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.183132 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for kubernetes.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.185190 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for middleware.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.186396 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for datawarehouse.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.187834 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for scvmm.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.188836 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for api.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.190011 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for fog.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.191002 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for azure.log has been changed to [WARN]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.193686 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for lenovo.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:35.195592 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for websocket.log has been changed to [INFO]","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:37.289947 ","hostname":"manageiq-0","level":"info","message":"MIQ(SessionStore) Using session_store: ActionDispatch::Session::MemCacheStore","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.181788 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Initializer.init) - Program Name: bin/rake, PID: 24, ENV['MIQ_GUID']: , ENV['EVMSERVER']: ","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.792905 ","hostname":"manageiq-0","level":"info","message":"Initializing Environment for API","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.793556 ","hostname":"manageiq-0","level":"info","message":"","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.794113 ","hostname":"manageiq-0","level":"info","message":"Static Configuration","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.794592 ","hostname":"manageiq-0","level":"info","message":"  module                  : api","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.795040 ","hostname":"manageiq-0","level":"info","message":"  name                    : API","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.795531 ","hostname":"manageiq-0","level":"info","message":"  description             : REST API","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.796995 ","hostname":"manageiq-0","level":"info","message":"  version                 : 3.0.0-pre","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.797721 ","hostname":"manageiq-0","level":"info","message":"","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.798435 ","hostname":"manageiq-0","level":"info","message":"Dynamic Configuration","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.829901 ","hostname":"manageiq-0","level":"info","message":"  token_ttl               : 10.minutes","pid":24,"tid":"3ffab5741130","service":null}
{"@timestamp":"2017-06-27T07:12:38.830399 ","hostname":"manageiq-0","level":"info","message":" authentication_timeout : 30.seconds","pid":24,"tid":"3ffab5741130","service":null} 

Is something wrong in bin/rake evm:deployment_status?
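
One way I can check that directly is to run the task by hand from inside the pod (a sketch; it assumes the usual vmdb path):

oc rsh manageiq-0
cd /var/www/miq/vmdb
bin/rake evm:deployment_status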

I'm also wondering whether the manageiq pod needs us to define environment variables for our proxy (because all our OpenShift infra must go through a proxy for HTTP/HTTPS). Does this pod need outbound access?

hahnn commented 7 years ago

This morning I saw a difference in the log output of the miq pod, certainly because of commit #162. Certificates are now generated, but after that the issue is still there:

== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections
== Writing encryption key ==
== Restoring PV data symlinks ==
/var/www/miq/vmdb/GUID does not exist on PV, skipping
Generating a 2048 bit RSA private key
...................................................................................................................................................+++
..+++
writing new private key to '/var/www/miq/vmdb/certs/server.cer.key'
-----
{"@timestamp":"2017-06-28T07:43:38.726573 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for evm.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.728349 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for vim.log has been changed to [WARN]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.729234 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for rhevm.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.729903 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for aws.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.730495 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for kubernetes.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.731094 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for middleware.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.731711 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for datawarehouse.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.732395 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for scvmm.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.733444 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for api.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.734157 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for fog.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.734672 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for azure.log has been changed to [WARN]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.735330 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for lenovo.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:38.735894 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Loggers.apply_config) Log level for websocket.log has been changed to [INFO]","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:39.399320 ","hostname":"manageiq-0","level":"info","message":"MIQ(SessionStore) Using session_store: ActionDispatch::Session::MemCacheStore","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.221575 ","hostname":"manageiq-0","level":"info","message":"MIQ(Vmdb::Initializer.init) - Program Name: bin/rake, PID: 26, ENV['MIQ_GUID']: , ENV['EVMSERVER']: ","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.614201 ","hostname":"manageiq-0","level":"info","message":"Initializing Environment for API","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.615633 ","hostname":"manageiq-0","level":"info","message":"","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.617231 ","hostname":"manageiq-0","level":"info","message":"Static Configuration","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.618084 ","hostname":"manageiq-0","level":"info","message":"  module                  : api","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.618849 ","hostname":"manageiq-0","level":"info","message":"  name                    : API","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.619608 ","hostname":"manageiq-0","level":"info","message":"  description             : REST API","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.620410 ","hostname":"manageiq-0","level":"info","message":"  version                 : 3.0.0-pre","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.623096 ","hostname":"manageiq-0","level":"info","message":"","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.625372 ","hostname":"manageiq-0","level":"info","message":"Dynamic Configuration","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.667004 ","hostname":"manageiq-0","level":"info","message":"  token_ttl               : 10.minutes","pid":26,"tid":"3f92a7d9712c","service":null}
{"@timestamp":"2017-06-28T07:43:41.667647 ","hostname":"manageiq-0","level":"info","message":"  authentication_timeout  : 30.seconds","pid":26,"tid":"3f92a7d9712c","service":null}
carbonin commented 7 years ago

@hahnn I'm working through a few issues on the master branch. We're in the process of implementing quite a few new features. If you're just interested in getting things up and running, I would suggest using a tagged release rather than master.

hahnn commented 7 years ago

Yeah... I see httpd and ansible pods now. It seems you are moving more and more from monolithic to microservices, which is good :)

carbonin commented 7 years ago

After https://github.com/ManageIQ/manageiq-pods/pull/172 was merged I was able to get master working with an external database. I didn't try anything with more restrictive database role permissions though.

It may be worth noting here that on first setup the manageiq pod takes a few minutes to initialize the database. If there is significant latency between the pod and wherever the database is running, it might take even longer than that (but hopefully not the full 10 minutes).
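
A simple way to follow that first-boot initialization while it runs (pod name as used elsewhere in this thread):

oc logs -f manageiq-0    # stream the init script and appliance output
oc get pods -w           # watch the pod become Ready once initialization finishes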

hahnn commented 7 years ago

OK, good to know. I'll give some tries tomorrow and let you know.

hahnn commented 7 years ago

OK. I tried miq-template.yaml, because I had no results with the external DB, but I get really bad results: basically, the only pod that starts is memcached. The httpd, manageiq-ansible and postgresql pods in particular stay light blue for a while and end with a red 'deployment failed' error after 10 minutes. The manageiq pod stays light blue and keeps trying to restart every 10 minutes.

Below are the database logs:

# tail -f postgresql-Thu.log 
FATAL:  the database system is starting up
LOG:  database system was shut down at 2017-06-29 08:35:40 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  database system was interrupted; last known up at 2017-06-29 08:36:01 UTC
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  invalid record length at 0/1714D88
LOG:  redo is not required
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2017-06-29 08:36:41 UTC
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started
FATAL:  database "vmdb_production" does not exist
FATAL:  database "vmdb_production" does not exist
FATAL:  database "vmdb_production" does not exist
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  incomplete startup packet
FATAL:  database "vmdb_production" does not exist
LOG:  received smart shutdown request
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down

Below are the manageiq logs:

== Checking memcached:11211 status ==
memcached:11211 - accepting connections
== Checking postgresql:5432 status ==
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.
Ncat: Connection timed out.

It's strange that manageiq cannot connect to postgresql, because I was able to connect to postgres inside its pod.
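
A couple of quick checks from inside the manageiq pod should narrow this down (service name as used by the template; ncat is what the init script itself uses):

oc rsh manageiq-0
getent hosts postgresql                                  # does the service name resolve?
nc -z -w 3 postgresql 5432 && echo open || echo closed   # TCP reachability check, similar to what the init script does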

httpd pod logs:

Generating a 2048 bit RSA private key
.....................................................................................................................................................+++
...............................................................+++
writing new private key to '/etc/httpd/certs/server.cer.key'
-----
[Thu Jun 29 09:22:55.104476 2017] [so:warn] [pid 11] AH01574: module ssl_module is already loaded, skipping
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 172.17.0.3. Set the 'ServerName' directive globally to suppress this message

During the deployment attempt, once all pods were light blue and before they all failed, I listed the services and endpoints:

# oc get service
NAME                CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
ansible             172.30.118.68    <none>        80/TCP,443/TCP   22m
glusterfs-cluster   172.30.153.79    <none>        1/TCP            24m
httpd               172.30.131.95    <none>        80/TCP,443/TCP   22m
manageiq            None             <none>        80/TCP,443/TCP   22m
memcached           172.30.202.243   <none>        11211/TCP        22m
postgresql          172.30.162.109   <none>        5432/TCP         22m
# oc get endpoints
NAME                ENDPOINTS                       AGE
ansible             <none>                          22m
glusterfs-cluster   10.150.7.245:1,10.150.7.246:1   25m
httpd               <none>                          22m
manageiq                                            22m
memcached           172.17.0.17:11211               22m
postgresql                                          22m

There should be an endpoint for postgresql, right?
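
Since an endpoint only appears when a ready pod matches the service selector, I suppose checks like these would narrow it down (a sketch):

oc describe svc postgresql    # shows the selector the service uses
oc get pods --show-labels     # is there a running, ready pod carrying that label?
oc get endpoints postgresql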

Beyond that, I don't really know where to look to help you (and myself).

Anyway, I'm close to thinking that my OpenShift test VM does not have enough resources, or maybe it fails because the VM runs OpenShift Origin 3.6.alpha.2. On the other hand, I don't have problems with any of my other projects and pods.

hahnn commented 7 years ago

Ah... launching the deployment of the postgresql pod one more time, it finally started and I can see the vmdb_production database. If I understand correctly, there should also be an ansible DB, which is not there at this time. Continuing my effort on the ansible pod.

hahnn commented 7 years ago

I don't see any ansible database created in the postgresql pod. The manageiq-ansible logs are below:

# oc logs manageiq-ansible-2-3f38j
== Checking postgresql:5432 status ==
postgresql:5432 - accepting connections

this pod seems to get stuck

carbonin commented 7 years ago

this pod seems to get stuck

No, that looks fine. That pod is not sending its output to STDOUT so that's all you would see.

hahnn commented 7 years ago

OK. So it keeps its light blue color for 5 minutes, then orange for 5 minutes, and then restarts. Same for the httpd pod, but that one systematically waits for a redeploy. There are a lot more logs constantly appearing in the manageiq pod, clearly showing it does a lot more than before, but this one also keeps restarting after the light blue/orange cycle.

There is no DB other than vmdb_production in the postgresql pod (except the template DBs and the postgres DB).

hahnn commented 7 years ago

I'll investigate things from the point of view of our proxied context. Maybe something on that side is interfering.

The httpd, ansible and manageiq pods in particular have failing readiness and liveness probes; that's why they are constantly restarted...

For example, the httpd pod has a route in my lab which is manageiq.apps.caas.world. If I curl that, the answer I get is as below:

# curl --noproxy '*' --insecure https://manageiq.apps.caas.world:443/
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/">GET&nbsp;/</a></em>.<p>
Reason: <strong>DNS lookup failure for: manageiq</strong></p></p>
</body></html>

I've checked the /etc/httpd/conf.d/manageiq.conf file in the pod; proxy and reverse proxy rules are defined there. I don't know whether this is where my issue lies in my context.
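
One thing I notice: the earlier oc get endpoints output showed nothing behind the headless manageiq service, and a headless service with no ready pods resolves to nothing, which could explain the "DNS lookup failure for: manageiq". A quick check from inside the httpd pod (the pod name below is hypothetical):

oc rsh httpd-1-abcde      # hypothetical httpd pod name
getent hosts manageiq     # no output would mean the service name resolves to no pod IP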

hahnn commented 7 years ago

I finally put my finger on the problem. Let me explain what I found.

[Fri Jun 30 10:51:35.043956 2017] [so:warn] [pid 135] AH01574: module ssl_module is already loaded, skipping
(2)No such file or directory: AH02291: Cannot access directory '/var/www/miq/vmdb/log/apache/' for main error log
(2)No such file or directory: AH02291: Cannot access directory '/var/www/miq/vmdb/log/apache/' for error log of vhost defined at /etc/httpd/conf.d/manageiq-https-application.conf:3
AH00014: Configuration check failed

What I don't know at this time is whether the fact that the apache directory doesn't exist in /var/www/miq/vmdb/log/ is a bug in your current implementation of the manageiq pod, or whether it was due to the numerous problems I encountered while testing ManageIQ.

Whatever it is, I would strongly suggest testing for the existence of this apache directory, and creating it if needed, before starting the httpd processes.
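
For example, a minimal guard of that kind (a sketch, not the project's actual init script):

mkdir -p /var/www/miq/vmdb/log/apache    # make sure the directory httpd expects exists before it starts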

To debug all of this, I had to disable all the readiness/liveness probes defined for the httpd, embedded-ansible and manageiq pods in order to keep them running.
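
For the manageiq StatefulSet, one way to do that temporarily (the paths assume the probes sit on the first container, as in the template shown earlier in this thread; the pod has to be recreated for the change to take effect):

oc patch statefulset manageiq --type=json -p '[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
oc delete pod manageiq-0    # recreate the pod without the probes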

Now I have the ManageIQ web interface displayed.

carbonin commented 7 years ago

On first boot of a new manageiq pod from master I see that directory in the /persistent volume:

sh-4.2# ls -l /persistent/server-data/var/www/miq/vmdb/log/
total 468
drwxr-xr-x 2 root root   4096 Jun 30 15:13 apache
-rw-r--r-- 1 root root   4183 Jun 30 16:02 api.log
-rw-r--r-- 1 root root     66 Jun 30 16:01 audit.log
-rw-r--r-- 1 root root     66 Jun 30 16:01 automation.log
...

Additionally, after waiting a few minutes, I see everything come up successfully on the manageiq pod:

[root@manageiq-0 vmdb]# ps -leaf --forest
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
4 S root       401     0  0  80   0 -  3801 -      16:07 ?        00:00:00 /bin/sh
0 S root       408   401  0  80   0 -  3802 -      16:07 ?        00:00:00  \_ bash
0 R root       417   408  0  80   0 - 12721 -      16:07 ?        00:00:00      \_ ps -leaf --forest
4 S root         1     0  0  80   0 -    50 -      16:01 ?        00:00:00 /usr/local/bin/dumb-init --single-child -- entrypoint
4 S root         7     1 20  80   0 - 137737 -     16:01 ?        00:01:22 MIQ Server
1 S root       193     7  0  90  10 - 132340 -     16:03 ?        00:00:01  \_ MIQ: MiqGenericWorker id: 1, queue: generic
1 S root       201     7  0  90  10 - 132083 -     16:03 ?        00:00:01  \_ MIQ: MiqGenericWorker id: 2, queue: generic
1 S root       209     7  0  81   1 - 132083 -     16:03 ?        00:00:02  \_ MIQ: MiqPriorityWorker id: 3, queue: generic
1 S root       217     7  0  81   1 - 132340 -     16:03 ?        00:00:02  \_ MIQ: MiqPriorityWorker id: 4, queue: generic
1 S root       225     7  1  83   3 - 133368 -     16:03 ?        00:00:02  \_ MIQ: MiqScheduleWorker id: 5
1 S root       241     7  0  87   7 - 134396 -     16:04 ?        00:00:01  \_ MIQ: MiqEventHandler id: 6, queue: ems
1 S root       249     7  0  87   7 - 134396 -     16:04 ?        00:00:01  \_ MIQ: MiqReportingWorker id: 7, queue: reporting
1 S root       257     7  0  87   7 - 134653 -     16:04 ?        00:00:00  \_ MIQ: MiqReportingWorker id: 8, queue: reporting
1 S root       267     7  0  81   1 - 135424 -     16:04 ?        00:00:01  \_ puma 3.3.0 (tcp://127.0.0.1:5000) [MIQ: Web Server Worker]
1 S root       276     7  1  81   1 - 135938 -     16:04 ?        00:00:02  \_ puma 3.3.0 (tcp://127.0.0.1:3000) [MIQ: Web Server Worker]
1 S root       284     7  0  81   1 - 136195 -     16:04 ?        00:00:00  \_ puma 3.3.0 (tcp://127.0.0.1:4000) [MIQ: Web Server Worker]
1 S root        10     1  0  80   0 -  6512 -      16:01 ?        00:00:00 /usr/sbin/crond
4 S root       183     1  0  80   0 - 62464 -      16:03 ?        00:00:00 /usr/sbin/httpd -DFOREGROUND
5 S apache     184   183  0  80   0 - 62699 -      16:03 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND
5 S apache     185   183  0  80   0 - 62530 -      16:03 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND
5 S apache     186   183  0  80   0 - 62594 -      16:03 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND
5 S apache     187   183  0  80   0 - 62563 -      16:03 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND
5 S apache     188   183  0  80   0 - 62558 -      16:03 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND
5 S apache     354   183  0  80   0 - 62596 -      16:04 ?        00:00:00  \_ /usr/sbin/httpd -DFOREGROUND

It seems like this was a combination of an issue with master that has since been resolved and some environmental issue. I'm going to go ahead and close this.