It does happen occasionally; you only need to re-execute the command to resolve it.
/schedule
@csweichel: Issue scheduled in the workspace team (WIP: 0)
I looked around to see if I could find something relevant and stumbled upon this issue. I found this comment relatable. That issue is closed, and a related issue exists here.
Can you share the Docker Compose file and the Dockerfiles you use in this project? If your issue looks similar to the one mentioned in the previous paragraph, let us know.
@princerachit Thanks for linking the issue. I had already read them before filing my issue:
We just run it with the following command and the two files added below. Note that they are anonymized, so they cannot be tested 1:1 on your side, but you can look through them.
DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1 docker-compose -f docker-compose.yml -f docker-compose.build.yml up
compose.yml
version: '3.7'

x-core-external-services: &core-external-services
  postgres:
    container_name: postgres
    image: postgres:13.4-alpine
    ports:
      - 5432:5432
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U postgres']
      interval: 2s
      timeout: 2s
      retries: 5
    volumes:
      - product-postgres-data:/var/lib/postgresql/data
  redis:
    container_name: redis
    image: redis
    command: redis-server --save ''
    ports:
      - 6379:6379
    tmpfs:
      - /data
  nats:
    container_name: nats
    image: nats:2.4.0
    ports:
      - 4222:4222
      - 8222:8222
  kes:
    container_name: kes
    image: minio/kes
    healthcheck:
      interval: 2s
      timeout: 2s
      retries: 5
    volumes:
      - ... KES MAPPING
    ports:
      - 7373:7373

x-service-depends-on-external-names: &default-service-depends-on-external-names
  postgres:
    condition: service_healthy
  nats:
    condition: service_started
  redis:
    condition: service_started

x-service-depends-on-external: &default-service-depends-on-external
  depends_on:
    <<: *default-service-depends-on-external-names

x-service: &default-service
  <<: *default-service-depends-on-external
  mem_limit: 1024m
  mem_reservation: 128M
  pull_policy: always
  env_file:
    - ./.product.local.env
    - ./.env

x-service-api-name: &default-service-api-name 'api'
x-service-api: &default-service-api
  <<: *default-service
  container_name: *default-service-api-name
  image: company/product-api
  expose:
    - 80
  ports:
    - 3333:80
    - 30227:30227

x-service-auth-name: &default-service-auth-name 'auth'
x-service-auth: &default-service-auth
  <<: *default-service
  container_name: *default-service-auth-name
  image: company/product-auth
  ports:
    - 30233:30233

x-service-object-name: &default-service-object-name 'object'
x-service-object: &default-service-object
  <<: *default-service
  container_name: *default-service-object-name
  image: company/product-object
  ports:
    - 30234:30234

x-service-web-name: &default-service-web-name 'web'
x-service-web: &default-service-web
  <<: *default-service
  container_name: *default-service-web-name
  image: company/product-web

x-service-auth-login-name: &default-service-auth-login-name 'auth-login'
x-service-auth-login: &default-service-auth-login
  <<: *default-service
  container_name: *default-service-auth-login-name
  image: company/product-auth-login

x-service-da-engine-name: &default-service-engine-name 'engine'
x-service-da-engine: &default-service-engine
  <<: *default-service
  container_name: *default-service-da-engine-name
  image: company/product-engine
  volumes:
    - ./tools/kes/certs/client.cert:/certs/client.cert
    - ./tools/kes/certs/client.key:/certs/client.key
  ports:
    - 8333:8333
  depends_on:
    <<: *default-service-depends-on-external-names
    kes:
      condition: service_healthy
  environment:
    - ... ENV SETTINGS

services:
  <<: *core-external-services
  nginx:
    container_name: nginx
    image: nginx:latest
    volumes:
      - ./tools/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      web:
        condition: service_started
      api:
        condition: service_started
    ports:
      - ${product_NGINX_PORT:-80}:80
  web:
    <<: *default-service-web
  api:
    <<: *default-service-api
  auth:
    <<: *default-service-auth
  object:
    <<: *default-service-object
  engine:
    <<: *default-service-engine

networks:
  default:
    name: product-network

volumes:
  product-volume:
    name: product-volume
    driver: local
    driver_opts:
      type: none
      o: bind
      device: '${PWD}'
  product-postgres-data:
    name: product-postgres-data
compose.build.yml
version: '3.7'

x-service-node-build: &default-service-node-build
  image: 'node:14-alpine'
  working_dir: /usr/src/app
  volumes:
    - product-volume:/usr/src/app
  env_file:
    - ./.product.local.env
    - ./.env

services:
  api:
    <<: *default-service-node-build
    command: npm run start api
  auth:
    <<: *default-service-node-build
    command: npm run start auth
  object:
    <<: *default-service-node-build
    command: npm run start object
  # .... just more in the same way
  engine:
    build:
      context: .
      dockerfile: ./apps/engine/Dockerfile
  web:
    <<: *default-service-node-build
    command: npm start -- web --port 80 --host 0.0.0.0 --disableHostCheck
    mem_limit: 4096m
    environment:
      - product_API_HOST=api
    ports:
      - 3380:80
    expose:
      - 80
@konne Thanks for sharing the files. Do you have the logs of the failure stored somewhere? Could you redact the sensitive info and share the rest of the log with us? If you don't have the logs and you see this issue again, please take a dump of the log and share it with us.
@princerachit I don't have the log, and unfortunately I have already deleted the workspace. What logs do you need, and where can I find them?
It happens at least once a week, so once I have this info I will share the logs.
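In case it helps the next time this happens, here is a minimal sketch of how the output could be captured before the workspace is deleted (only standard docker/docker-compose commands; the output file names are illustrative):

# capture the full compose output (stdout + stderr) before the workspace goes away
docker-compose logs --no-color --timestamps > compose.log 2>&1
# list all containers, including crashed ones, and grab the log of a single container
docker ps -a
docker logs <container-name> > container.log 2>&1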
@konne I have added more logs to debug this issue further from our side. Once this PR is merged and deployed, we will have more visibility into what is happening.
I am closing this issue. Let me know if you see this again; we now have the appropriate logs to investigate.
@princerachit we are seeing this issue nearly every day now that we are expanding the user base step by step. Please leave it open and keep working on this topic. If needed, I can also ask the team to post here every time it occurs.
Thanks @konne. Whenever you see the issue, we need the workspace name, the time (with timezone), and the command you ran.
Time: 26.10.2021 9:30 (CET)
Workspace: azure-alligator-ftal0uzb.ws-eu17
Command: docker-compose up
Happened again:
crimson-silverfish-87dr8961
Happened again:
blush-vole-efcfmm4f.ws-eu17
We're seeing this happen increasingly often. Running docker-compose down followed by docker-compose up usually resolves it, so it can't be related to the containers or the configuration.
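For reference, the workaround is just a restart of the stack; roughly the commands as we run them (same files and flags as the command shown earlier in this issue):

# tear the stack down and bring it back up; this usually clears the failure
docker-compose down
DOCKER_BUILDKIT=1 COMPOSE_DOCKER_CLI_BUILD=1 docker-compose -f docker-compose.yml -f docker-compose.build.yml up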
Hi @konne, I hope you're well! If you're able to share the name of a workspace and the time when you experienced this issue, that would be great. Also, if you're able to share a public repo where I can reliably recreate the problem, that would be super. I made a brief attempt here to recreate the problem, but was not successful.
@kylos101 I always added the workspace IDs from my side, and each comment was posted about two minutes after it happened. No, we have no public repo, but I can set up a TeamViewer, Zoom, or whatever meeting with you and show it to you. We cannot reliably reproduce it; during normal usage it happens around 2-3 times a week per developer.
Hey @konne :wave:, I'm sorry, I should have explained why I asked for the data again.
We set up a new tracing system in mid-December 2021 :bulb: . Old data was not migrated to the new tracing system, therefore I cannot search for older workspaces. :disappointed:
If the issue happens again and you get a moment, could you please share the related workspace? :pray: I apologize; I know it is frustrating to share the same thing repeatedly and not get a desirable outcome. However, I am certain the new data will be helpful. Let us know?
Hi @konne, I just sent you an email asking for more information. This is intentional: I do not want you to share a workspace snapshot URL in this issue. Let us know if that's possible?
@kylos101 sorry, I don't work with Gitpod day to day, so I rely a bit on developer feedback. Here is the first entry:
purple-dodo-gqnfpk03; time: 11:15 CET, 17.01.2022
We just rolled out what we hope is a fix for this issue (#7657). I'll close this issue for now. Please re-open/file a new one if this problem persists.
The major contributing factor was a timing/race issue in the libseccomp golang bindings we're using. This caused the mount syscall interception via seccomp-notify to fail in highly concurrent/heavy-load scenarios.
Bug description
We sometimes have the issue that when starting multiple containers with docker-compose up, some of them crash with the following error, even in a freshly started workspace. In most cases, though, it works perfectly without any change.
If you have the chance to find more in your logs: it happened on 30 Sept at 8:17 CET in workspace salmon-planarian-k3xf4ba4.ws-eu18.
Steps to reproduce
unclear
Expected behavior
No response
Example repository
No response
Anything else?
No response