ToroNZ closed this issue 5 years ago.
I've started it all up again using "GODEBUG=netdns=cgo", as I thought the problem could be DNS-related, but still had no luck. The thing just fell over within a couple of hours for no apparent reason.
harbor-ui logs (full of these errors):
2018-07-16T18:40:38Z [ERROR] [config.go:467]: Failed to get configuration, will return empty string as admiral's endpoint, error: http error: code 500, message Internal Server Error
2018-07-16T18:40:38Z [ERROR] [config.go:525]: Failed to get configuration, will return false as read only, error: http error: code 500, message Internal Server Error
Current docker-compose.yml looks like:
version: '2'
services:
  registry:
    image: vmware/registry-photon:v2.6.2-v1.5.1
    container_name: registry
    restart: always
    volumes:
      - /harbor:/storage:z
      - ./common/config/registry/:/etc/registry/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    command: ["serve", "/etc/registry/config.yml"]
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  mysql:
    image: vmware/harbor-db:v1.5.1
    container_name: harbor-db
    restart: always
    volumes:
      - /data/database:/var/lib/mysql:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    env_file:
      - ./common/config/db/env
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  adminserver:
    image: vmware/harbor-adminserver:v1.5.1
    container_name: harbor-adminserver
    env_file:
      - ./common/config/adminserver/env
    restart: always
    volumes:
      - /data/config/:/etc/adminserver/config/:z
      - /data/secretkey:/etc/adminserver/key:z
      - /data/:/data/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  ui:
    image: vmware/harbor-ui:v1.5.1
    container_name: harbor-ui
    env_file:
      - ./common/config/ui/env
    restart: always
    volumes:
      - ./common/config/ui/app.conf:/etc/ui/app.conf:z
      - ./common/config/ui/private_key.pem:/etc/ui/private_key.pem:z
      - ./common/config/ui/certificates/:/etc/ui/certificates/:z
      - /data/secretkey:/etc/ui/key:z
      - /data/ca_download/:/etc/ui/ca/:z
      - /data/psc/:/etc/ui/token/:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    depends_on:
      - adminserver
      - registry
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  jobservice:
    image: vmware/harbor-jobservice:v1.5.1
    container_name: harbor-jobservice
    env_file:
      - ./common/config/jobservice/env
    restart: always
    volumes:
      - /data/job_logs:/var/log/jobs:z
      - ./common/config/jobservice/config.yml:/etc/jobservice/config.yml:z
    networks:
      - harbor
    depends_on:
      - redis
      - ui
      - adminserver
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  redis:
    image: vmware/redis-photon:v1.5.1
    container_name: redis
    restart: always
    volumes:
      - /data/redis:/data
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
  proxy:
    image: vmware/nginx-photon:v1.5.1
    container_name: nginx
    restart: always
    volumes:
      - ./common/config/nginx:/etc/nginx:z
    networks:
      - harbor
    environment:
      - GODEBUG=netdns=cgo
    ports:
      - 8080:80
      - 5140:443
      - 4443:4443
    depends_on:
      - mysql
      - registry
      - ui
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
networks:
  harbor:
    external: false
I might give it another look to see if I can find the root cause; I really need the registry to survive more than 16 hours. It's not fun having to rebuild it every morning.
clair-photon is throwing permission errors:
[dumb-init] /clair2.0.1/clair: Permission denied
But the directory is set up correctly:
# ls -laths /root/harbor/common/config/clair
total 12K
0 drwxr-xr-x. 3 10000 10000 87 Jul 17 07:05 .
4.0K -rw-r--r--. 1 10000 10000 133 Jul 17 07:05 clair_env
4.0K -rw-r--r--. 1 10000 10000 636 Jul 17 07:05 config.yaml
4.0K -rw-r--r--. 1 10000 10000 39 Jul 17 07:05 postgres_env
0 drwxr-xr-x. 11 root root 130 Jul 10 18:42 ..
0 drwxr-xr-x. 2 10000 10000 23 May 31 18:45 postgresql-init.d
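Since the listing above shows the config files owned by UID 10000 and world-readable, plain file modes don't obviously explain the denial. One possibility on a CentOS host (the reporter later mentions disabling SELinux) is that the denial comes from SELinux rather than classic permissions. A guarded sketch for checking that; the commands only exist on SELinux/auditd hosts, hence the guards:

```shell
# Guarded check for SELinux denials; a no-op on hosts without SELinux
# tooling. "Permission denied" on a binary whose mode looks correct is
# a classic AVC-denial symptom.
if command -v getenforce >/dev/null 2>&1; then
    getenforce                                    # Enforcing / Permissive / Disabled
fi
if command -v ausearch >/dev/null 2>&1; then
    ausearch -m avc -ts recent 2>/dev/null | grep -i clair || true
fi
```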
One of the crashed Harbor instances has this in the vmware/harbor-adminserver log:
2018-07-16T19:33:48Z [ERROR] [cfg.go:30]: failed to get system configurations: open /etc/adminserver/key: permission denied
172.18.0.1 - - [16/Jul/2018:19:33:48 +0000] "GET /api/configurations HTTP/1.1" 500 22
But if you jump into the container, you can read the key just fine:
root [ /harbor ]# cat /etc/adminserver/key
KALIKn13M9pcuqczF
root [ /harbor ]#
@ToroNZ For security reasons, the adminserver process is not started as root: https://github.com/vmware/harbor/blob/master/make/photon/adminserver/start.sh#L7 Are you using a released version?
@reasonerjt I'm using the online installer (https://storage.googleapis.com/harbor-releases/harbor-online-installer-v1.5.1.tgz)
So, after allowing Harbor to read the key (maybe this prep is done by prepare.sh and I missed it):
chmod 754 /etc/adminserver/key
I'm now able to log in... but I keep getting an error banner on every single click:
harbor-ui logs:
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 17.790695ms| match| POST /login r:/login
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 4.882537ms| match| GET /api/users/current r:/api/users/:id
2018-07-17T09:15:28Z [ERROR] [base.go:99]: failed to get public projects: Error 1146: Table 'registry.project_metadata' doesn't exist
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 500 | 2.136692ms| match| GET /api/statistics r:/api/statistics
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 200 | 3.855474ms| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2018-07-17T09:15:28Z [ERROR] [base.go:99]: failed to list projects: Error 1146: Table 'registry.project' doesn't exist
2018/07/17 09:15:28 [D] [server.go:2619] | 10.66.16.26| 500 | 2.029504ms| match| GET /api/projects r:/api/projects/
2018-07-17T09:15:34Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 200 | 2.473769ms| match| GET /api/systeminfo/volumes r:/api/systeminfo/volumes
2018-07-17T09:15:39Z [ERROR] [base.go:99]: failed to get public projects: Error 1146: Table 'registry.project_metadata' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 500 | 1.120981ms| match| GET /api/statistics r:/api/statistics
2018-07-17T09:15:39Z [ERROR] [base.go:99]: failed to list projects: Error 1146: Table 'registry.project' doesn't exist
2018/07/17 09:15:39 [D] [server.go:2619] | 10.66.16.26| 500 | 2.013168ms| match| GET /api/projects r:/api/projects/
2018/07/17 09:15:52 [D] [server.go:2619] | 127.0.0.1| 200 | 2.825819ms| match| GET /api/ping r:/api/ping
2018/07/17 09:16:22 [D] [server.go:2619] | 127.0.0.1| 200 | 3.207084ms| match| GET /api/ping r:/api/ping
2018-07-17T09:16:46Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
Looks like the DB is buggered...
To add to the issues, the Clair log is full of these entries and the container enters a restart loop:
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
[dumb-init] /clair2.0.1/clair: Permission denied
So, I deployed EVERYTHING again for the fifth time, making sure I didn't miss anything.
When I lose the ability to log in, I see this in the adminserver log:
2018-07-17T18:49:49Z [ERROR] [cfg.go:30]: failed to get system configurations: open /etc/adminserver/key: permission denied
172.18.0.6 - - [17/Jul/2018:18:49:49 +0000] "GET /api/configurations HTTP/1.1" 500 22
If I give it 754 permissions as mentioned before, it lets me log in. BUT all projects are empty.
The log from harbor-ui shows:
2018-07-17T18:58:08Z [ERROR] [base.go:88]: failed to get repository: Get http://registry:5000/v2/openshift3/container-engine/tags/list: unable to read key file /etc/ui/private_key.pem: open /etc/ui/private_key.pem: permission denied
2018/07/17 18:58:08 [D] [server.go:2619] | 10.66.16.26| 500 | 11.719372ms| match| GET /api/repositories r:/api/repositories
Permissions and Harbor again... why do things lose permissions halfway through?
root [ /harbor ]# ls -laths /etc/ui/private_key.pem
4.0K -rw------- 1 root root 3.2K Jul 16 19:05 /etc/ui/private_key.pem
Running chmod 754 on that key fixes the problem.
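For what it's worth, both chmod 754 workarounds above grant more access than the process needs (754 leaves the keys group- and world-readable). A narrower sketch, assuming the in-container Harbor processes run as UID 10000 (matching the ownership on the config directory shown earlier); demonstrated on a scratch file standing in for the real key:

```shell
# Narrower alternative to chmod 754: hand the key to the non-root UID
# the containers run as, and keep it owner-only. Shown on a scratch
# file; on the real host the targets would be /data/secretkey and
# ./common/config/ui/private_key.pem, and chown needs root.
KEY=$(mktemp)                  # stand-in for the real key file
chmod 600 "$KEY"               # owner read/write only
# chown 10000:10000 "$KEY"     # uncomment on the real host (as root)
stat -c '%a' "$KEY"            # prints 600
rm -f "$KEY"
```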
Now I can access the main Harbor instance, the one that replicates to the others, and browse its projects. On the other two Harbor instances I cannot browse the projects, and I also get a red banner, as before, when clicking on 'Registries' or 'Replications'.
harbor-ui logs shows:
2018-07-17T19:14:02Z [WARNING] Failed to get pmid from path, error strconv.ParseInt: parsing "": invalid syntax
2018-07-17T19:14:02Z [ERROR] [base.go:88]: Failed to query database for member list, error: Error 1146: Table 'registry.user_group' doesn't exist
2018/07/17 19:14:02 [D] [server.go:2619] | 10.66.16.26| 500 | 6.666175ms| match| GET /api/projects/2/members r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2018/07/17 19:14:02 [D] [server.go:2619] | 10.66.16.26| 200 | 5.54643ms| match| GET /api/projects/2 r:/api/projects/:id([0-9]+)
2018-07-17T19:14:03Z [WARNING] Failed to get pmid from path, error strconv.ParseInt: parsing "": invalid syntax
2018-07-17T19:14:03Z [ERROR] [base.go:88]: Failed to query database for member list, error: Error 1146: Table 'registry.user_group' doesn't exist
2018/07/17 19:14:03 [D] [server.go:2619] | 10.66.16.26| 500 | 8.295969ms| match| GET /api/projects/2/members r:/api/projects/:pid([0-9]+)/members/?:pmid([0-9]+)
2018/07/17 19:14:03 [D] [server.go:2619] | 10.66.16.26| 200 | 7.602419ms| match| GET /api/projects/2 r:/api/projects/:id([0-9]+)
2018-07-17T19:14:04Z [ERROR] [replication_policy.go:98]: failed to get policies: Error 1146: Table 'registry.replication_policy' doesn't exist, query parameters: {1 500 0 }
2018-07-17T19:14:04Z [ERROR] [target.go:153]: failed to filter targets : Error 1146: Table 'registry.replication_target' doesn't exist
^^ This is happening on both of those instances.
@ToroNZ Did you try the offline installer? I had a chance to deploy the offline installer on an Ubuntu box yesterday and everything worked.
And I'll verify the online installer in a couple of days.
Hang on, it was the offline installer:
864933610 Jul 11 01:28 harbor-offline-installer-v1.5.1.tgz
Let me confirm: did you provision a clean Docker host, download the offline installer, edit harbor.cfg, run install.sh, see the success message, and then find yourself unable to use it?
Based on your previous comments, it seems you are installing Harbor repeatedly on one host, and misconfiguration or old data is causing the various failures.
Well, I did exactly what the instructions say:
After that, things worked for a couple of hours, then I was unable to log in. I tried restarting, but hit logging-driver errors, then the rest as described above...
Every time I had to re-provision everything, I wiped the registry mount point and everything under /data. Three different deployments on three CentOS servers, same behaviour, same errors.
So the single instance that was still running didn't like the Docker daemon upgrade from 18.03 to 18.06.
Now the DB won't come up:
/usr/local/bin/docker-entrypoint.sh: running /docker-entrypoint-updatedb.d/upgrade.sh
DB was created in Maria DB, skip upgrade.
2018-08-04 7:47:15 140363830495168 [Note] mysqld (mysqld 10.2.14-MariaDB) starting as process 21 ...
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Uses event mutexes
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Compressed tables use zlib 1.2.8
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Number of pools: 1
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Using SSE2 crc32 instructions
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Completed initialization of buffer pool
2018-08-04 7:47:15 140362909071104 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-08-04 7:47:15 140363830495168 [Note] InnoDB: Highest supported file format is Barracuda.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-08-04 7:47:16 140363830495168 [Note] InnoDB: 5.7.21 started; log sequence number 6249933
2018-08-04 7:47:16 140361997473536 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2018-08-04 7:47:16 140363830495168 [Note] Plugin 'FEEDBACK' is disabled.
2018-08-04 7:47:16 140363830495168 [Note] Recovering after a crash using tc.log
2018-08-04 7:47:16 140363830495168 [ERROR] Bad magic header in tc log
2018-08-04 7:47:16 140363830495168 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete tc log and start mysqld with --tc-heuristic-recover={commit|rollback}
2018-08-04 7:47:16 140363830495168 [ERROR] Can't init tc log
2018-08-04 7:47:16 140363830495168 [ERROR] Aborting
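The MariaDB error itself suggests the way out: for a single-node database like Harbor's, the transaction-coordinator log can usually be deleted and mysqld will recreate it (or mysqld can be started once with --tc-heuristic-recover, as the log says). A guarded sketch, assuming the /data/database host path from the compose file above; back up the datadir first:

```shell
# Recovery path suggested by the "Bad magic header in tc log" error.
# DATADIR is the host directory mapped to /var/lib/mysql in the compose
# file; the docker-compose steps are commented out because they only
# make sense on the affected host.
DATADIR=${DATADIR:-/data/database}
# docker-compose stop mysql
if [ -f "$DATADIR/tc.log" ]; then
    cp -a "$DATADIR" "${DATADIR}.bak"   # back up before touching anything
    rm "$DATADIR/tc.log"                # mysqld recreates it on startup
fi
# docker-compose start mysql
# If startup still fails, the log message offers:
#   mysqld --tc-heuristic-recover=rollback
```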
I've disabled SELinux for now, and I have a feeling this could also be related to umask settings. This environment defaults to umask 022 for privileged (root) users and 027 for non-privileged (harbor) users.
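The umask suspicion is easy to check: files written by the install's prepare step inherit the umask of whoever ran it, and under umask 027 a root-created file comes out 0640 root:root, which a non-root UID inside the containers (not in the root group) cannot read. A quick self-contained demonstration:

```shell
# How the host umask shapes files written at install time. Under
# umask 027 a newly created file is 0640; if it stays root:root, the
# non-root UID the Harbor containers run as cannot read it.
tmp=$(mktemp -d)
(
    umask 027
    touch "$tmp/demo_key"   # stand-in for a prepare-step output file
)
stat -c '%a' "$tmp/demo_key"   # prints 640
rm -rf "$tmp"
```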
Could I suggest changing the default location for "/data"? Perhaps something like /var/lib/harbor?
I'll try to put some more time into this later this week; I really want this to work properly without having to modify security settings.
Hi @ToroNZ – whew! Thanks for sticking with this. 🙈 Did disabling SELinux have any effect?
I actually like the idea of putting things in /var/lib/harbor. I'll keep this open to track that request (though PRs are welcome!).
Please let us know how things are going with SELinux disabled and umask changes. If you're still stuck then maybe we can chat real-time on Slack to figure it out together.
Hi @clouderati, our Harbor instances (3) have been stable for the last month, using a combination of:
I'm not proud of any of this, lol, but it all happened during a very busy period when I just didn't have much time to troubleshoot it properly.
Things I could have done (and plan to do soon) are:
I really like what Harbor does in Enterprise environments... RBAC, public/private repos, LDAP, Replication, etc.... It fits requirements really well.
Let me do some forking; I'll fiddle with it a bit more on a boring hardened RHEL7 box with FIPS 140-2 enabled, and hopefully I can have a PR soon.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If you are reporting a problem, please make sure the following information is provided:
1) Version of docker engine and docker-compose.
2) Config files of Harbor; you can get them by packaging harbor.cfg and the files in the same directory, including subdirectories.
3) Log files; you can get them by packaging /var/log/harbor/.
harbor-ui:
harbor-db:
maria-db:
CentOS Linux release 7.5.1804 (Core), kernel 3.10.0-862.6.3.el7.x86_64, FIPS enabled
After 15-20 hours the UI stops accepting logins (button greyed out) and the registry services stop working (push/pull).
Docker-compose reports all 'OK':
Stopping and starting doesn't fix it; only a complete wipe and rebuild from scratch works. I had to remove the logging driver because, after the initial deployment, it wouldn't bind to the port anymore.
Any pointers?