ToroNZ closed this issue 5 years ago.
I've started it all up again using "GODEBUG=netdns=cgo", as I thought the problem could be DNS-related, but still had no luck. The thing just fell over within a couple of hours for no apparent reason.
harbor-ui logs (full of these errors):
2018-07-16T18:40:38Z [ERROR] [config.go:467]: Failed to get configuration, will return empty string as admiral's endpoint, error: http error: code 500, message Internal Server Error
2018-07-16T18:40:38Z [ERROR] [config.go:525]: Failed to get configuration, will return false as read only, error: http error: code 500, message Internal Server Error
Current docker-compose.yml looks like:
version: '2'
services:
  registry:
    image: vmware/registry-photon:v2.6.2-v1.5.1
    container_name: registry
    restart: always
    volumes:
      - /harbor:/storage:z
      - ./common/config/registry/:/etc/registry/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    command: ["serve", "/etc/registry/config.yml"]
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  mysql:
    image: vmware/harbor-db:v1.5.1
    container_name: harbor-db
    restart: always
    volumes:
      - /data/database:/var/lib/mysql:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    env_file:
      - ./common/config/db/env
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  adminserver:
    image: vmware/harbor-adminserver:v1.5.1
    container_name: harbor-adminserver
    env_file:
      - ./common/config/adminserver/env
    restart: always
    volumes:
      - /data/config/:/etc/adminserver/config/:z
      - /data/secretkey:/etc/adminserver/key:z
      - /data/:/data/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  ui:
    image: vmware/harbor-ui:v1.5.1
    container_name: harbor-ui
    env_file:
      - ./common/config/ui/env
    restart: always
    volumes:
      - ./common/config/ui/app.conf:/etc/ui/app.conf:z
      - ./common/config/ui/private_key.pem:/etc/ui/private_key.pem:z
      - ./common/config/ui/certificates/:/etc/ui/certificates/:z
      - /data/secretkey:/etc/ui/key:z
      - /data/ca_download/:/etc/ui/ca/:z
      - /data/psc/:/etc/ui/token/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    depends_on:
      - adminserver
      - registry
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  jobservice:
    image: vmware/harbor-jobservice:v1.5.1
    container_name: harbor-jobservice
    env_file:
      - ./common/config/jobservice/env
    restart: always
    volumes:
      - /data/job_logs:/var/log/jobs:z
      - ./common/config/jobservice/config.yml:/etc/jobservice/config.yml:z
    networks:
      - harbor
    depends_on:
      - redis
      - ui
      - adminserver
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  redis:
    image: vmware/redis-photon:v1.5.1
    container_name: redis
    restart: always
    volumes:
      - /data/redis:/data
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  proxy:
    image: vmware/nginx-photon:v1.5.1
    container_name: nginx
    restart: always
    volumes:
      - ./common/config/nginx:/etc/nginx:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    ports:
      - 8080:80
      - 5140:443
      - 4443:4443
    depends_on:
      - mysql
      - registry
      - ui
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
networks:
  harbor:
    external: false
I might give it another look to see if I can find the root cause; I really need the registry to survive more than 16 hours. It's not fun having to rebuild it every morning.
clair-photon is throwing permission errors:
[dumb-init] /clair2.0.1/clair: Permission denied
But the directory is set up correctly:
# ls -laths /root/harbor/common/config/clair
total 12K
0 drwxr-xr-x. 3 10000 10000 87 Jul 17 07:05 .
4.0K -rw-r--r--. 1 10000 10000 133 Jul 17 07:05 clair_env
4.0K -rw-r--r--. 1 10000 10000 636 Jul 17 07:05 config.yaml
4.0K -rw-r--r--. 1 10000 10000 39 Jul 17 07:05 postgres_env
0 drwxr-xr-x. 11 root root 130 Jul 10 18:42 ..
0 drwxr-xr-x. 2 10000 10000 23 May 31 18:45 postgresql-init.d
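Since the listing above shows the config files owned by UID 10000 and world-readable, plain file modes don't obviously explain the denial. One possibility on a CentOS host (the reporter later mentions disabling SELinux) is that the denial comes from SELinux rather than classic permissions. A guarded sketch for checking that; the commands only exist on SELinux/auditd hosts, hence the guards:

```shell
# Guarded check for SELinux denials; a no-op on hosts without SELinux
# tooling. "Permission denied" on a binary whose mode looks correct is
# a classic AVC-denial symptom.
if command -v getenforce >/dev/null 2>&1; then
    getenforce                                    # Enforcing / Permissive / Disabled
fi
if command -v ausearch >/dev/null 2>&1; then
    ausearch -m avc -ts recent 2>/dev/null | grep -i clair || true
fi
```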
One of the crashed Harbor instances has this in the vmware/harbor-adminserver log:
2018-07-16T19:33:48Z [ERROR] [cfg.go:30]: failed to get system configurations: open /etc/adminserver/key: permission denied
172.18.0.1 - - [16/Jul/2018:19:33:48 +0000] "GET /api/configurations HTTP/1.1" 500 22
But if you jump into the container, you can read the key just fine:
root [ /harbor ]# cat /etc/adminserver/key
KALIKn13M9pcuqczF
root [ /harbor ]#
@ToroNZ For security reasons, the adminserver process is not started as root: https://github.com/vmware/harbor/blob/master/make/photon/adminserver/start.sh#L7 Are you using a released version?
@reasonerjt I'm using the online installer (https://storage.googleapis.com/harbor-releases/harbor-online-installer-v1.5.1.tgz)
So, after allowing Harbor to read the key (maybe this prep is done by prepare.sh and I missed it):
chmod 754 /etc/adminserver/key
I'm now able to log in... but I keep getting an error banner on every single click:
harbor-ui logs:
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 17.790695ms| match| POST /login r:/login
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 4.882537ms| match| GET /api/users/current r:/api/users/:id
2018-07-17T09:15:28Z [ERROR] [base.go:99]: failed to get public projects: Error 1146: Table 'registry.project_metadata' doesn't exist
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 500 | 2.136692ms| match| GET /api/statistics r:/api/statistics
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 3.855474ms| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2018-07-17T09:15:28Z [ERROR] [base.go:99]: failed to list projects: Error 1146: Table 'registry.project' doesn't exist
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 500 | 2.029504ms| match| GET /api/projects r:/api/projects/
2018-07-17T09:15:34Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 200 | 2.473769ms| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2018-07-17T09:15:39Z [ERROR] [base.go:99]: failed to get public projects: Error 1146: Table 'registry.project_metadata' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 500 | 1.120981ms| match| GET /api/statistics r:/api/statistics
2018-07-17T09:15:39Z [ERROR] [base.go:99]: failed to list projects: Error 1146: Table 'registry.project' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 500 | 2.013168ms| match| GET /api/projects r:/api/projects/
2018/07/17 09:15:52 [D] [server.go:2619] | 127.0.0.1| 200 | 2.825819ms| match| GET /api/ping r:/api/ping
2018/07/17 09:16:22 [D] [server.go:2619] | 127.0.0.1| 200 | 3.207084ms| match| GET /api/ping r:/api/ping
2018-07-17T09:16:46Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
Looks like the DB is buggered...
To add to the issues, the Clair log is full of these entries and the container enters a restart loop:
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
So, I deployed EVERYTHING again for the fifth time, making sure I didn't miss anything.
When I lose the ability to log in, I see this in the adminserver log:
2018-07-17T18:49:49Z [ERROR] [cfg.go:30]: failed to get system configurations: open /etc/adminserver/key: permission denied
172.18.0.6 - - [17/Jul/2018:18:49:49 +0000] "GET /api/configurations HTTP/1.1" 500 22
If I give it 754 permissions as mentioned before, it lets me log in. BUT all projects are empty.
The log from harbor-ui shows:
2018-07-17T18:58:08Z [ERROR] [base.go:88]: failed to get repository: Get http://registry:5000/v2/openshift3/container-engine/tags/list: unable to read key file /etc/ui/private_key.pem: open /etc/ui/private_key.pem: permission denied
2018/07/17 18:58:08 [D] [server.go:2619] | 10.66.16.26| 500 | 11.719372ms| match| GET /api/repositories r:/api/repositories
Permissions and Harbor again... why do things lose permissions halfway through?
root [ /harbor ]# ls -laths /etc/ui/private_key.pem
4.0K -rw------- 1 root root 3.2K Jul 16 19:05 /etc/ui/private_key.pem
Running chmod 754 on that key fixes the problem.
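For what it's worth, both chmod 754 workarounds above grant more access than the process needs (754 leaves the keys group- and world-readable). A narrower sketch, assuming the in-container Harbor processes run as UID 10000 (matching the ownership on the config directory shown earlier); demonstrated on a scratch file standing in for the real key:

```shell
# Narrower alternative to chmod 754: hand the key to the non-root UID
# the containers run as, and keep it owner-only. Shown on a scratch
# file; on the real host the targets would be /data/secretkey and
# ./common/config/ui/private_key.pem, and chown needs root.
KEY=$(mktemp)                  # stand-in for the real key file
chmod 600 "$KEY"               # owner read/write only
# chown 10000:10000 "$KEY"     # uncomment on the real host (as root)
stat -c '%a' "$KEY"            # prints 600
rm -f "$KEY"
```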
Now I can access the main Harbor instance, the one that replicates to the others, and browse its projects. On the other two Harbor instances I cannot browse the projects, and I also get a red banner, as before, when clicking on 'Registries' or 'Replications'.
harbor-ui logs shows:
2018-07-17T19:14:02Z [WARNING] Failed to get pmid from path, error strconv.ParseInt: parsing "": invalid syntax
2018-07-17T19:14:02Z [ERROR] [base.go:88]: Failed to query database for member list, error: Error 1146: Table 'registry.user_group' doesn't exist
2018/07/17 19:14:02 [D] [server.go:2619] | 10.66.16.26| 500 | 6.666175ms| match| GET /api/projects/2/members r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2018/07/17 19:14:02 [D] [server.go:2619] | 10.66.16.26| 200 | 5.54643ms| match| GET /api/projects/2 r:/api/projects/:id([0-9]+)
2018-07-17T19:14:03Z [WARNING] Failed to get pmid from path, error strconv.ParseInt: parsing "": invalid syntax
2018-07-17T19:14:03Z [ERROR] [base.go:88]: Failed to query database for member list, error: Error 1146: Table 'registry.user_group' doesn't exist
2018/07/17 19:14:03 [D] [server.go:2619] | 10.66.16.26| 500 | 8.295969ms| match| GET /api/projects/2/members r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2018/07/17 19:14:03 [D] [server.go:2619] | 10.66.16.26| 200 | 7.602419ms| match| GET /api/projects/2 r:/api/projects/:id([0-9]+)
2018-07-17T19:14:04Z [ERROR] [replication_policy.go:98]: failed to get policies: Error 1146: Table 'registry.replication_policy' doesn't exist, query parameters: {1 500 0 }
2018-07-17T19:14:04Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
^^ This is happening on both of those instances.
@ToroNZ Did you try the offline installer? I had a chance to deploy the offline installer on an Ubuntu box yesterday and everything worked.
And I'll verify the online installer in a couple of days.
Hang on, it was the offline installer:
864933610 Jul 11 01:28 harbor-offline-installer-v1.5.1.tgz
Let me confirm: did you provision a clean Docker host, download the offline installer, edit harbor.cfg, run install.sh, see the success message, and then find yourself unable to use it?
Based on your previous comments, it seems you are installing Harbor repeatedly on one host, and misconfiguration or old data is causing the various failures.
Well, I did exactly what the instructions say:
After that, things worked for a couple of hours, then I was unable to log in. I tried restarting, but hit logging-driver errors, then the rest as described above...
Every time I had to re-provision everything, I wiped the registry mount point and everything under /data. Three different deployments on three CentOS servers, same behaviour, same errors.
So the single instance that was still running didn't like the Docker daemon upgrade from 18.03 to 18.06.
Now the DB won't come up:
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-updatedb.d/upgrade.sh
DB was created in Maria DB, skip upgrade.
2018-08-04 7:47:15 140363830495168 [Note] mysqld (mysqld 10.2.14-MariaDB) starting as process 21 ...
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Uses event mutexes
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Number of pools: 1
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Using SSE2 crc32 instructions
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Completed initialization of buffer pool
2018-08-04 7:47:15 140362909071104 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Highest supported file format is Barracuda.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: 5.7.21 started; log sequence number 6249933
2018-08-04 7:47:16 140361997473536 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-08-04 7:47:16 140363830495168 [Note] Plugin 'FEEDBACK' is disabled.
2018-08-04 7:47:16 140363830495168 [Note] Recovering after a crash using tc.log
2018-08-04 7:47:16 140363830495168 [ERROR] Bad magic header in tc log
2018-08-04 7:47:16 140363830495168 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete tc log and start mysqld with --tc-heuristic-recover={commit|rollback}
2018-08-04 7:47:16 140363830495168 [ERROR] Can't init tc log
2018-08-04 7:47:16 140363830495168 [ERROR] Aborting
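The MariaDB error itself suggests the way out: for a single-node database like Harbor's, the transaction-coordinator log can usually be deleted and mysqld will recreate it (or mysqld can be started once with --tc-heuristic-recover, as the log says). A guarded sketch, assuming the /data/database host path from the compose file above; back up the datadir first:

```shell
# Recovery path suggested by the "Bad magic header in tc log" error.
# DATADIR is the host directory mapped to /var/lib/mysql in the compose
# file; the docker-compose steps are commented out because they only
# make sense on the affected host.
DATADIR=${DATADIR:-/data/database}
# docker-compose stop mysql
if [ -f "$DATADIR/tc.log" ]; then
    cp -a "$DATADIR" "${DATADIR}.bak"   # back up before touching anything
    rm "$DATADIR/tc.log"                # mysqld recreates it on startup
fi
# docker-compose start mysql
# If startup still fails, the log message offers:
#   mysqld --tc-heuristic-recover=rollback
```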
I've disabled SELinux for now, and I have a feeling this could also be related to umask settings. This environment defaults to umask 022 for privileged (root) users and 027 for non-privileged (harbor) users.
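The umask suspicion is easy to check: files written by the install's prepare step inherit the umask of whoever ran it, and under umask 027 a root-created file comes out 0640 root:root, which a non-root UID inside the containers (not in the root group) cannot read. A quick self-contained demonstration:

```shell
# How the host umask shapes files written at install time. Under
# umask 027 a newly created file is 0640; if it stays root:root, the
# non-root UID the Harbor containers run as cannot read it.
tmp=$(mktemp -d)
(
    umask 027
    touch "$tmp/demo_key"   # stand-in for a prepare-step output file
)
stat -c '%a' "$tmp/demo_key"   # prints 640
rm -rf "$tmp"
```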
Could I suggest changing the default location for "/data"? Perhaps something like /var/lib/harbor?
I'll try to put some more time into this later this week; I really want this to work properly without having to modify security settings.
Hi @ToroNZ – whew! Thanks for sticking with this. 🙈 Did disabling SELinux have any effect?
I actually like the idea of putting things in /var/lib/harbor. I'll keep this open to track that request (though PRs are welcome!).
Please let us know how things are going with SELinux disabled and umask changes. If you're still stuck then maybe we can chat real-time on Slack to figure it out together.
Hi @clouderati, our Harbor instances (3) have been stable for the last month, using a combination of:
I'm not proud of any of this, lol, but it all happened during a very busy period when I just didn't have much time to troubleshoot it properly.
Things I could have done (and plan to do soon) are:
I really like what Harbor does in Enterprise environments... RBAC, public/private repos, LDAP, Replication, etc.... It fits requirements really well.
Let me do some forking; I'll fiddle with it a bit more on a boring hardened RHEL7 box with FIPS 140-2 enabled, and hopefully I can have a PR soon.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If you are reporting a problem, please make sure the following information is provided:
1) Version of docker engine and docker-compose.
2) Config files of Harbor; you can get them by packaging harbor.cfg and the files in the same directory, including subdirectories.
3) Log files; you can get them by packaging /var/log/harbor/.
harbor-ui:
harbor-db:
maria-db:
CentOS Linux release 7.5.1804 (Core), kernel 3.10.0-862.6.3.el7.x86_64, FIPS enabled
After 15-20 hours the UI stops accepting logins (button greyed out) and the registry services stop working (push/pull).
Docker-compose reports all 'OK':
Stopping and starting doesn't fix it; only a complete wipe and rebuild from scratch works. I had to remove the logging driver because, after the initial deployment, it wouldn't bind to the port anymore.
Any pointers?