Closed: tuxdna closed this issue 7 years ago.
As we don't want to maintain dependency ordering in docker-compose or hard-coded delays, let's bring the local setup closer to the deployment, where OpenShift transparently restarts services when they fail to come up.
Just add a restart policy to the docker-compose file:
restart: always
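For context, the change might look like the sketch below. This assumes a `docker-compose.yml` with a `data-model-importer` service; the service name here is illustrative, not copied from the actual file:

```yaml
services:
  data-model-importer:
    # Restart the container whenever it exits, so a crash while waiting
    # for gremlin-http simply triggers another attempt.
    restart: always
```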
Please open a PR for this.
Thanks!
@fridex It seems to me that the container isn't dying, but the API inside the running container is retrying a few times:
```
$ ./docker-compose.sh ps
+ docker-compose -f docker-compose.yml -f docker-compose.devel.yml ps
            Name                          Command                State             Ports
----------------------------------------------------------------------------------------------------
anitya-server                  /bin/sh -c /src/runinconta ...    Up      0.0.0.0:31005->5000/tcp
bayesian-gremlin-http          /bin/entrypoint-local.sh          Up      0.0.0.0:8181->8182/tcp
common_cvedb-s3-dump_1         /usr/local/bin/cvedb-s3-du ...    Exit 0
common_data-model-importer_1   /bin/entrypoint.sh                Up      0.0.0.0:9192->9192/tcp
common_worker-api_1            /usr/bin/workers.sh               Up
common_worker-ingestion_1      /usr/bin/workers.sh               Up
coreapi-broker                 /tmp/rabbitmq/run-rabbitmq ...    Up      0.0.0.0:15672->15672/tcp, 25672/tcp, 4369/tcp, 0.0.0.0:5672->5672/tcp
coreapi-jobs                   /usr/bin/run_jobs.sh              Up      0.0.0.0:34000->34000/tcp
coreapi-pgbouncer              /bin/sh -c /usr/bin/run-pg ...    Up      0.0.0.0:5432->5432/tcp
coreapi-postgres               container-entrypoint run-p ...    Up      0.0.0.0:6432->5432/tcp
coreapi-s3                     /usr/bin/docker-entrypoint ...    Up      0.0.0.0:33000->33000/tcp, 9000/tcp
coreapi-server                 /usr/bin/coreapi-server.sh        Up      0.0.0.0:32000->5000/tcp
coreapi-worker-db-migrations   /alembic/run-db-migrations.sh     Exit 0
dynamodb                       entrypoint -sharedDb              Up      0.0.0.0:4567->4567/tcp, 0.0.0.0:8000->8000/tcp
```
Do you think `restart: always` will work for this?
> @fridex It seems to me that the container isn't dying, but the API inside the running container is retrying a few times:
OK, so what is the issue here? If it successfully retries after some time, things should work as expected, right?
> Do you think `restart: always` will work for this?
The restart policy is at the Docker layer. If the container does not exit, it will not affect the current behaviour at all.
@fridex It is the gunicorn [1] process that is restarting the data-model-importer API inside the container.
From the logs [2] I can observe that data-model-importer attempts to connect to gremlin-http a few times, after which there are no more errors.
[1] https://github.com/fabric8-analytics/fabric8-analytics-data-model/blob/master/scripts/entrypoint.sh#L4
[2] https://paste.fedoraproject.org/paste/2KmUJBY1CG4~7Qc0O~g8gF5M1UNdIGYhyRLivL9gydE=
And yes, the readiness probe also succeeds after waiting for a while.
> From the logs [2] I can observe that data-model-importer attempts to connect to gremlin-http a few times, after which there are no more errors.
OK, so I would say we can close this as not-a-bug.
Yes, functionally this works. However, it would be good to avoid the errors generated during the first few attempts.
From the data integrity perspective, what happens if an analysis is invoked and the data-model-importer API is not yet up (due to the failure discussed in this issue)? I understand that all the workers will be run and data will be put into S3 (minio), after which the data is put into the graph. But this last step will fail because the data-model-importer API is not up yet. Will the ingestion into data-model-importer be attempted again from the same analysis after a while?
> Yes, functionally this works. However, it would be good to avoid the errors generated during the first few attempts.
You can see similar errors for other services as well.
> From the data integrity perspective, what happens if an analysis is invoked and the data-model-importer API is not yet up (due to the failure discussed in this issue)? I understand that all the workers will be run and data will be put into S3 (minio), after which the data is put into the graph. But this last step will fail because the data-model-importer API is not up yet. Will the ingestion into data-model-importer be attempted again from the same analysis after a while?
No, the ingestion will not be re-scheduled.
Note that this is a setup for local development. It is your responsibility to have the system up when you want to test something. Adding delays to the data-importer service will not solve your issue - analyses can still be run and the results won't be synced via data-importer anyway. Moreover, having the error messages there helps you know when the whole system is up.
Anyway, there is a plan to remove the data-importer service and instead do syncs using a Selinon task after each analysis. There is no point in spending time on this.
In that case we can close this issue.
Start the services using Docker Compose as below:
Failure output:
The issue is that data-model-importer depends on gremlin-http to be up. By the time gremlin-http starts, the importer already had failed trying to connect.
We could use some sort of delay mechanism to postpone starting data-model-importer until gremlin-http is already up.
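As an illustration of that kind of delay mechanism, here is a minimal sketch, not the project's actual entrypoint code; the function name, host, and port are assumptions. The idea is a wait loop the importer could run before starting its API:

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Retry a TCP connection to host:port, returning True once it succeeds.

    This mirrors the behaviour described above: a few failed attempts while
    gremlin-http boots, then success and no further errors.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            print(f"attempt {attempt}: {host}:{port} not reachable yet, retrying")
            time.sleep(delay)
    return False


# Hypothetical usage before launching the importer's gunicorn process:
# if not wait_for_port("bayesian-gremlin-http", 8182):
#     raise SystemExit("gremlin-http never came up")
```

This keeps the delay bounded (attempts × delay) rather than hard-coded, so the importer starts as soon as gremlin-http accepts connections.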
Related thread and SO post: