eclipse-researchlabs / scava-deployment

This repository contains the Docker configuration files for the deployment of the Scava platform
Eclipse Public License 2.0

oss-db and elasticsearch containers don't stop automatically #7

Open davidediruscio opened 4 years ago

davidediruscio commented 4 years ago

When stopping the docker-compose stack, the containers scava-deployment_oss-db_1 and scava-deployment_elasticsearch_1 don't stop automatically and an error is thrown. These containers must be stopped manually:

Stopping scava-deployment_admin-webapp_1   ... done
Stopping scava-deployment_api-server_1     ... done
Stopping scava-deployment_dashb-importer_1 ... done
Stopping scava-deployment_kb-service_1     ... done
Stopping scava-deployment_prosoul_1        ... done
Stopping scava-deployment_auth-server_1    ... done
Stopping scava-deployment_oss-app_1        ... 
Stopping scava-deployment_kibiter_1        ... done
Stopping scava-deployment_oss-db_1         ... error
Stopping scava-deployment_elasticsearch_1  ... error
Stopping scava-deployment_kb-db_1          ... done

ERROR: for scava-deployment_oss-app_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=70)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
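
A possible workaround, following the hint in the message (the timeout value and the explicit docker stop below are assumptions, not something verified on this deployment):

$ export COMPOSE_HTTP_TIMEOUT=300
$ docker-compose down
$ docker stop scava-deployment_oss-db_1 scava-deployment_elasticsearch_1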

Any idea @MarcioMateus @md2manoppello @creat89 ?

EDITED: I'm using https://github.com/crossminer/scava-deployment/commit/a678687ce4b0562bf8851e8da1c68595bde7eb4e to run docker-compose

davidediruscio commented 4 years ago

I don't think I have seen that error before, but I'll try to check later on our server. @Danny2097, have you seen this error before?

davidediruscio commented 4 years ago

@creat89 / @valeriocos no I have not seen this issue before. The only issue I saw was with the initial start-up of the ES container.

@valeriocos what analysis tasks did you have running prior to this (if any)? Perhaps there is a rogue process preventing it from closing?

davidediruscio commented 4 years ago

None. I did a docker system prune -a and checked that no containers were running afterwards.
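
Roughly the commands in question (the docker ps check is just the usual way to confirm nothing is left running):

$ docker system prune -a
$ docker ps    # should list no running containers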

This issue may be related to another one I'm going to open soon. When doing docker-compose up, the platform gets blocked on some kb-db actions (after the prosoul actions). Nothing (workers, projects, metrics, etc.) is visible on the cockpit UI except for http://localhost:5601/#/home (please see https://github.com/crossminer/scava-deployment/issues/85#issuecomment-520717806). Have you seen something similar?
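
One way to watch what kb-db is doing while the platform hangs (a rough sketch; the service names are assumed to match those in docker-compose.yml):

$ docker-compose logs --tail=100 -f kb-db prosoul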

davidediruscio commented 4 years ago

Any news on this issue, @creat89? Did you have time to check it on your server?

davidediruscio commented 4 years ago

On my side, running dev commit 35e7d18ac15755752cd3b2414e97b5767523e501 which pulls the image for https://github.com/crossminer/scava/commit/fd85adcb776102d37492c04d0ee55c92b884c54a from Scava:

$ docker system prune -a --volumes
$ docker-compose -f docker-compose-build.yml build --no-cache --parallel
$ docker-compose -f docker-compose-build.yml up

Containers stop just fine:

Stopping scava-deployment_admin-webapp_1   ... done
Stopping scava-deployment_dashb-importer_1 ... done
Stopping scava-deployment_api-server_1     ... done
Stopping scava-deployment_kb-service_1     ... done
Stopping scava-deployment_prosoul_1        ... done
Stopping scava-deployment_auth-server_1    ... done
Stopping scava-deployment_kibiter_1        ... done
Stopping scava-deployment_oss-db_1         ... done
Stopping scava-deployment_elasticsearch_1  ... done
Stopping scava-deployment_kb-db_1          ... done

davidediruscio commented 4 years ago

@tdegueul, I'm replicating your steps (the only differences are the --volumes and --parallel options)

davidediruscio commented 4 years ago

Closing the issue; after following these steps I didn't get any error. Thanks @tdegueul

davidediruscio commented 4 years ago

The issue happened again after following the steps at: https://github.com/crossminer/scava-deployment/issues/89#issuecomment-522159463

^CGracefully stopping... (press Ctrl+C again to force)
Stopping scava-deployment_admin-webapp_1  ... done
Stopping scava-deployment_kb-service_1    ... done
Stopping scava-deployment_api-server_1    ... done
Stopping scava-deployment_oss-app_1       ... 
Stopping scava-deployment_kibiter_1       ... done
Stopping scava-deployment_auth-server_1   ... done
Stopping scava-deployment_oss-db_1        ... error
Stopping scava-deployment_elasticsearch_1 ... error
Stopping scava-deployment_kb-db_1         ... done

ERROR: for scava-deployment_oss-app_1  UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=70)
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

davidediruscio commented 4 years ago

I saw similar behaviour when using Docker engine for Mac.

I usually "solve" the problem with a restart of the docker engine. When it doesn't solve the problem, then it is time to run docker system prune -a --volumes.

After a quick search, it seems to be related to the "stress" that the containers put on the Docker daemon. The proposed solutions are: