Open ozett opened 3 years ago
Hi, to be honest, I have no idea :( I hate CUDA for this: it may just not work and you don't know why. This is what I know: we use CUDA 10.0, probably because this was the version that worked. As you see, sometimes it just does not work, so we took the version that works. This version is installed inside the container, so it doesn't matter what CUDA version is on your machine. What matters is the Nvidia driver. My current driver is 470.74, but it worked on 460 as well. My only guess is that you have another application that takes the GPU, so CompreFace doesn't have access to it. As far as I can see, your screenshots with nvidia-smi were taken inside the container. What about the host machine?
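For anyone following along, here is a rough sketch of checking the host driver version mentioned above. The helper names `host_driver_version` and `driver_at_least` are made up for illustration, and the threshold of 460 is only the lowest version reported to work in this thread, not an official minimum.

```shell
#!/bin/sh
# Print the host's NVIDIA driver version (run on the host, not in a container).
host_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# Succeed if a driver version string (e.g. "470.74") has a major version
# greater than or equal to the given minimum (e.g. 460).
# Hypothetical helper; 460 is taken from this thread, not from any spec.
driver_at_least() {
    major=${1%%.*}
    [ "$major" -ge "$2" ]
}

# Usage (on the host):
#   v=$(host_driver_version)
#   if driver_at_least "$v" 460; then echo "driver $v should be new enough"; fi
```

Only the major version is compared, which seems sufficient for a quick sanity check.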
hi, thanks for looking into this..
nvidia-smi is running on the host. nvidia-smi inside a container does not give information about gpu tasks.
the whole nvidia-container setup is running fine; i use it with the frigate-nvidia container on the same machine.
it must be something inside the compreface GPU container:
CORE_VERSION=0.6.1-mobilenet-gpu or CORE_VERSION=0.6.1-arcface-r100-gpu
any hint how to track this down inside your container?
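One low-effort check, sketched here under assumptions: ask the running container whether it sees a GPU at all. The container name `compreface-core` comes from the compose file in this issue; `count_gpus` is a hypothetical helper that just counts the `GPU n:` lines printed by `nvidia-smi -L`.

```shell
#!/bin/sh
# Count the "GPU <n>: ..." lines that `nvidia-smi -L` prints, read from stdin.
# Hypothetical helper, not part of CompreFace or the NVIDIA tooling.
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Usage (assumes the container is running and has nvidia-smi available):
#   docker exec compreface-core nvidia-smi -L | count_gpus
# A result of 0 would suggest the nvidia runtime did not expose the GPU.
```

This only shows whether the device is visible inside the container; it says nothing about why the compiled code fails.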
I mean, here are the results of nvidia-smi when I run it on the host (not inside the container):
As you can see, even without CompreFace there are several applications that use the GPU. I wanted to see which applications use the GPU on your host.
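A minimal sketch of listing what is using the GPU on the host, in a form that is easy to paste into an issue. `gpu_compute_apps` and `count_gpu_apps` are hypothetical helper names. Note that `--query-compute-apps` lists compute processes only; graphics clients such as Xorg show up in the plain `nvidia-smi` table instead.

```shell
#!/bin/sh
# List compute processes on the GPU as CSV lines: pid, process name, memory.
# Run this on the host; inside a container the process list is usually empty.
gpu_compute_apps() {
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
}

# Count non-empty lines on stdin (one line per GPU process).
count_gpu_apps() {
    grep -c .
}

# Usage (on the host):
#   gpu_compute_apps
#   gpu_compute_apps | count_gpu_apps
```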
"any hint how to track this down inside your container?"
I don't see how that is possible. I mean, the problem is not that it's in the container; the problem is that the error is in compiled .so code. If it were Python, you could debug it. But you can't debug compiled code.
"I wanted to see which applications use your GPU in your host"
there is nothing else on this server than the compreface containers, so nothing is running on the GPU other than what is configured inside the compreface-core container for the nvidia runtime.
i will try to catch this and post it here in a follow-up.
success with gpu: i started from the beginning:
1) re-installed the nvidia driver on the ubuntu host with nvidia.run
2) removed all left-over containers: docker system prune -a
https://docs.docker.com/config/pruning/#prune-everything
3) changed the way i install compreface:
docker-compose -f docker-compose.yml up -d && docker-compose -f docker-compose.yml logs -f
SUCCESS. it's up and running with the NVIDIA gpu in the container.
what i changed:
before, i only changed the .env file, always in the same source dir.
that seems to lead to errors.
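The clean-up and redeploy steps above could be scripted roughly like this. It is a sketch under assumptions, not the project's documented procedure: the `run` and `clean_redeploy` helpers and the `DRY_RUN` switch are invented here, and the initial `docker-compose down` is an added step so that the prune can actually remove the old containers.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

# Stop the stack, prune unused Docker data, start again, then follow the logs.
# WARNING: `docker system prune -a` removes ALL unused images and stopped
# containers on the host, not only the CompreFace ones. It asks for
# confirmation because -f is deliberately not passed.
clean_redeploy() {
    compose_file=${1:-docker-compose.yml}
    run docker-compose -f "$compose_file" down     # assumption: added step
    run docker system prune -a
    run docker-compose -f "$compose_file" up -d
    run docker-compose -f "$compose_file" logs -f
}

# Usage: preview first, then run for real.
#   DRY_RUN=1; clean_redeploy docker-compose.yml
#   DRY_RUN=0; clean_redeploy docker-compose.yml
```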
Describe the bug
detection hangs because processes get killed. the log scrolls by with some backtraces i cannot read.
To Reproduce
Steps to reproduce the behavior: start docker-compose, go to the web GUI and test face recognition; it doesn't run.
Expected behavior
running the compreface gpu models with compreface-core in docker without errors.
Screenshots
Desktop (please complete the following information):
root@ub20-frigate4:/usr/src# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 1660 (UUID: GPU-65d82c7a-fb69-3e25-a081-2baef57fba23)
root@ub20-frigate4:/usr/src# cat .env
registry=exadel/
postgres_username=postgres
postgres_password=postgres
postgres_db=frs
postgres_domain=compreface-postgres-db
postgres_port=5432
email_host=smtp.gmail.com
email_username=
email_from=
email_password=
enable_email_server=false
save_images_to_db=true
compreface_api_java_options=-Xmx8g
compreface_admin_java_options=-Xmx8g
ADMIN_VERSION=0.6.1
API_VERSION=0.6.1
FE_VERSION=0.6.1
CORE_VERSION=0.6.1-mobilenet-gpu
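Since the thread ends up pointing at the `.env` file, a quick sanity check of which core image it selects may help. This is only a sketch: `core_version` and `is_gpu_build` are hypothetical helpers, and the "GPU builds end in `-gpu`" rule is inferred from the two tags quoted in this issue.

```shell
#!/bin/sh
# Print the value of CORE_VERSION from an env file (last assignment wins).
core_version() {
    sed -n 's/^CORE_VERSION=//p' "$1" | tail -n 1
}

# Succeed if the tag looks like a GPU build.
# Assumption: GPU tags end in "-gpu", as in 0.6.1-mobilenet-gpu.
is_gpu_build() {
    case "$1" in
        *-gpu) return 0 ;;
        *)     return 1 ;;
    esac
}

# Usage:
#   v=$(core_version .env)
#   if is_gpu_build "$v"; then echo "GPU build: $v"; else echo "CPU build: $v"; fi
```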
root@ub20-frigate4:/usr/src# cat dc-cface.yml
version: '3.4'
volumes:
  postgres-data:
services:
  compreface-postgres-db:
    image: postgres:11.5
    container_name: "compreface-postgres-db"
    environment:
postgres-data:/var/lib/postgresql/data
  compreface-admin:
    image: ${registry}compreface-admin:${ADMIN_VERSION}
    container_name: "compreface-admin"
    environment:
compreface-api
  compreface-api:
    image: ${registry}compreface-api:${API_VERSION}
    container_name: "compreface-api"
    depends_on:
SAVE_IMAGES_TO_DB=${save_images_to_db}
  compreface-fe:
    image: ${registry}compreface-fe:${FE_VERSION}
    container_name: "compreface-ui"
    ports:
compreface-admin
  compreface-core:
    image: ${registry}compreface-core:${CORE_VERSION}
    container_name: "compreface-core"
    runtime: nvidia
    environment:
i don't know what's needed, and i don't know how to get more debug output, but i will do what i can to help..
testing NVIDIA-Docker was successful
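On getting more debug output: one option is to pull the core container's log and keep only the lines that look GPU-related. This is a rough sketch; `filter_gpu_errors` is a hypothetical helper and the keyword list is a guess, not something CompreFace defines.

```shell
#!/bin/sh
# Keep only log lines that mention likely GPU or crash keywords (read from stdin).
# The pattern is an assumption; adjust it to whatever the backtraces contain.
filter_gpu_errors() {
    grep -iE 'cuda|cudnn|killed|traceback|error'
}

# Usage (container name taken from the compose file in this issue):
#   docker logs --tail 500 compreface-core 2>&1 | filter_gpu_errors > core-errors.txt
```

Attaching the resulting file to the issue is usually easier to read than a screenshot of a scrolling log.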