digitalmethodsinitiative / 4cat

The 4CAT Capture and Analysis Toolkit provides modular data capture & analysis for a variety of social media platforms.

out of memory #436

Open rvanvliet opened 2 weeks ago

rvanvliet commented 2 weeks ago

The servers get 'out of memory' after deploying the application. Can't use application.

Jul 5 13:25:52 xxxca01 kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=docker-23adcba4fea1f347db0b8f7d2d25b6f59cc19799efd66f9eb1db0dfa30837b11.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-23adcba4fea1f347db0b8f7d2d25b6f59cc19799efd66f9eb1db0dfa30837b11.scope,task=python3,pid=53261,uid=0
Jul 5 13:25:52 xxxca01 kernel: Out of memory: Killed process 53261 (python3) total-vm:101261724kB, anon-rss:99978192kB, file-rss:128kB, shmem-rss:0kB, UID:0 pgtables:198196kB oom_score_adj:0
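To put the kernel counters above into more readable units, here is a minimal sketch that converts the kB values from the "Out of memory" line into GiB. The function name parse_oom_line is made up for illustration; only the log line itself comes from the report.

```python
import re

# Kernel "Out of memory: Killed process" line copied from the report above.
OOM_LINE = (
    "Out of memory: Killed process 53261 (python3) "
    "total-vm:101261724kB, anon-rss:99978192kB, file-rss:128kB, "
    "shmem-rss:0kB, UID:0 pgtables:198196kB oom_score_adj:0"
)


def parse_oom_line(line):
    """Return every '<name>:<number>kB' counter in the line, converted to GiB.

    parse_oom_line is a hypothetical helper, not part of 4CAT or Docker.
    """
    counters = re.findall(r"([a-z-]+):(\d+)kB", line)
    return {name: int(kb) / 1024**2 for name, kb in counters}


if __name__ == "__main__":
    for name, gib in parse_oom_line(OOM_LINE).items():
        print(f"{name}: {gib:.2f} GiB")
```

By this reading, the killed python3 process held roughly 95 GiB of anonymous memory (anon-rss), which is consistent with a single process exhausting the 128 GB VM rather than many small ones adding up.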

What I did:

  1. Install "Docker Engine - Community" version 27.0.3
  2. Download the docker-compose.yml file and .env from this repo.
  3. Run: docker compose up --detach
  4. Check running containers with docker ps -a (all 3 running)

Running the stable 1.45 version of 4cat.

The VM runs AlmaLinux 9 with a 4-core CPU and 128 GB of memory. I started with 4 GB and doubled it a couple of times, up to 128 GB.

Has anyone encountered the same issue?

dale-wahl commented 2 weeks ago

That seems extreme (I have not seen that issue and have been running with 8 GB of RAM). You say all the containers are running; what do their logs say (docker container logs 4cat_backend, and likewise for 4cat_frontend and 4cat_db)? What have you been running on 4CAT?

rvanvliet commented 2 weeks ago

Seems like there is no pidfile:

4cat_backend | ...error while starting 4CAT Backend Daemon (pidfile not found).

docker container logs 4cat_db (looks OK to me):

PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-07-05 11:53:22.792 UTC [1] LOG: starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-07-05 11:53:22.793 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-07-05 11:53:22.793 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-07-05 11:53:22.800 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-07-05 11:53:22.810 UTC [29] LOG: database system was shut down at 2024-07-05 11:43:48 UTC
2024-07-05 11:53:22.830 UTC [1] LOG: database system is ready to accept connections
2024-07-05 11:58:23.514 UTC [27] LOG: checkpoint starting: time
2024-07-05 11:58:24.385 UTC [27] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.579 s, sync=0.010 s, total=0.949 s; sync files=2, longest=0.006 s, average=0.005 s; distance=0 kB, estimate=0 kB; lsn=0/19B5610, redo lsn=0/19B55D8
2024-07-05 12:14:16.219 UTC [1] LOG: received fast shutdown request
2024-07-05 12:14:16.227 UTC [1] LOG: aborting any active transactions
2024-07-05 12:14:16.386 UTC [1] LOG: background worker "logical replication launcher" (PID 32) exited with exit code 1
2024-07-05 12:14:16.407 UTC [27] LOG: shutting down
2024-07-05 12:14:16.417 UTC [27] LOG: checkpoint starting: shutdown immediate
2024-07-05 12:14:16.660 UTC [27] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.216 s, sync=0.001 s, total=0.254 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=0 kB; lsn=0/19B56C0, redo lsn=0/19B56C0
2024-07-05 12:14:16.679 UTC [1] LOG: database system is shut down

PostgreSQL Database directory appears to contain a database; Skipping initialization
2024-07-05 12:53:21.726 UTC [1] LOG: starting PostgreSQL 16.3 (Debian 16.3-1.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-07-05 12:53:21.727 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-07-05 12:53:21.727 UTC [1] LOG: listening on IPv6 address "::", port 5432
2024-07-05 12:53:21.734 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-07-05 12:53:21.745 UTC [29] LOG: database system was shut down at 2024-07-05 12:14:16 UTC
2024-07-05 12:53:21.765 UTC [1] LOG: database system is ready to accept connections

4cat_frontend | Backend has not started - sleeping
4cat_frontend | Backend has not started - sleeping
4cat_frontend | Backend has not started - sleeping

dale-wahl commented 2 weeks ago

Interesting. Something must be failing in the backend and 4CAT cannot start. There may be a log here: docker exec 4cat_backend cat /usr/src/app/logs/4cat.stderr

The /usr/src/app/logs/ folder is a volume shared between the frontend and backend and should contain all the related logs. You can explore them interactively with docker exec -it 4cat_backend bash.

rvanvliet commented 2 weeks ago

Thank you for the suggestions! The 4cat.stderr log file is empty.

dale-wahl commented 2 weeks ago

That sounds like 4CAT never even attempts to start. But it must have, if you saw the "4cat_backend | ...error while starting 4CAT Backend Daemon (pidfile not found)." log. Are you developing anything, or have you updated the code in any way?

You could try this:

docker exec -it 4cat_backend python -c "import backend.bootstrap as bootstrap; bootstrap.run(as_daemon=False, log_level='DEBUG');"

That would try to run 4CAT's backend directly and you might see why it cannot start.

rvanvliet commented 1 week ago

I've run the command from above but I don't see a difference. There is still no log at the location /usr/src/app/logs/4cat.stderr. The server is clean and there is nothing on it (only Docker); it is dedicated to this 4cat deployment.

dale-wahl commented 1 week ago

Do you mean to say that running the docker exec command above returned 4cat_backend | ...error while starting 4CAT Backend Daemon (pidfile not found).? That should not be possible. You would need to run the command while the Docker container is running and after you see the no pidfile error (i.e., after docker compose up -d).

rvanvliet commented 1 week ago

All containers are up after running docker compose up -d. After that I ran the command:

docker exec -it 4cat_backend python -c "import backend.bootstrap as bootstrap; bootstrap.run(as_daemon=False, log_level='DEBUG');"

Output of docker container logs 4cat_backend:

4CAT migration agent

Interactive: no
Pull latest release: no
Pull branch: no
Restart after migration: no
Repository URL: https://github.com/digitalmethodsinitiative/4cat.git
.current-version path: config/.current-version

WARNING: Migration can take quite a while. 4CAT will not be available during migration. If 4CAT is still running, it will be shut down now (forcibly if necessary).

Migration finished. You can now safely restart 4CAT.

Configuration file config/config.ini already exists
Checking Docker .env variables and updating if necessary

Starting app
4CAT is accessible at: http://localhost

Killed
Waiting for postgres...
PostgreSQL started
Database already created

4CAT migration agent

.... (loop here)

Output of docker container logs 4cat_backend:

Backend has not started - sleeping
Backend has not started - sleeping
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/src/app/docker/wait-for-backend.py", line 22, in <module>
    main()
  File "/usr/src/app/docker/wait-for-backend.py", line 16, in main
    call_api("worker-status")["response"]["running"]
  File "/usr/src/app/common/lib/helpers.py", line 427, in call_api
    connection.connect((config.get('API_HOST'), config.get('API_PORT')))
socket.gaierror: [Errno -2] Name or service not known
Backend has not started - sleeping
Backend has not started - sleeping
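The socket.gaierror in the traceback above means the host name passed to connect() could not be resolved. A minimal sketch for checking name resolution on its own, separate from 4CAT, might look like this. The function can_resolve and the host and port in the usage section are placeholders; the real values are whatever config.get('API_HOST') and config.get('API_PORT') return in the deployment, which this thread does not show.

```python
import socket


def can_resolve(host, port):
    """Return True if host resolves to at least one address, else False.

    can_resolve is a hypothetical helper for illustration only.
    """
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # Same exception class as in the traceback above.
        return False
    return True


if __name__ == "__main__":
    # Placeholder values (assumptions): substitute the configured API host and port.
    host, port = "backend", 4444
    print(f"{host}:{port} resolvable: {can_resolve(host, port)}")
```

If the backend container is being killed and restarted in a loop, its name may be unresolvable from the frontend container at the moment of the check, which would fit the error shown here.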

And the output of docker exec 4cat_backend cat /usr/src/app/logs/4cat.stderr:

cat: /usr/src/app/logs/4cat.stderr: No such file or directory

dale-wahl commented 1 week ago

OK, that is helpful. The Killed is a message from Docker. It looks like Docker kills the 4cat_backend container before 4CAT can ever start and then attempts to restart it (hence the loop and likely how you ended up with an OOM error). You ought to have some output from docker exec -it 4cat_backend python -c "import backend.bootstrap as bootstrap; bootstrap.run(as_daemon=False, log_level='DEBUG');", but my guess is that the container dies before you get any response.

I would check your Docker setup. Perhaps you have limited the memory in some way, or it is a bad Docker installation. I am not at all familiar with that OS or how to set up Docker on it. You can look into what might cause a Killed message from Docker. Generally, you should be able to run 4CAT with 4 GB (the frontend uses about 1 GB, the backend much less until you start running some analyses). 8 GB is better for text analyses. I understand you have more, but my guess is that Docker is not able to use the memory. You can limit memory in Docker per container in addition to overall; maybe that is the issue.
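One way to check for a per-container memory limit is to read the JSON printed by docker inspect 4cat_backend. The sketch below parses that JSON and reports the limit, whether Docker recorded an OOM kill, and the restart count. The helper summarize_container and the SAMPLE data are made up for illustration; the field names (HostConfig.Memory, State.OOMKilled, RestartCount) are the ones docker inspect reports for a container.

```python
import json


def summarize_container(inspect_json):
    """Summarise memory limit and kill state from `docker inspect` JSON text.

    summarize_container is a hypothetical helper, not part of Docker or 4CAT.
    """
    info = json.loads(inspect_json)[0]    # docker inspect prints a JSON array
    limit = info["HostConfig"]["Memory"]  # in bytes; 0 means no per-container limit
    return {
        "memory_limit_gib": None if limit == 0 else limit / 1024**3,
        "oom_killed": info["State"]["OOMKilled"],
        "restart_count": info["RestartCount"],
    }


# Invented sample data (an assumption), trimmed to the fields used above.
SAMPLE = json.dumps([{
    "RestartCount": 12,
    "State": {"OOMKilled": True},
    "HostConfig": {"Memory": 4294967296},
}])

if __name__ == "__main__":
    print(summarize_container(SAMPLE))
```

In practice the input would come from saving the real docker inspect output to a file or piping it in, rather than from SAMPLE. A memory_limit_gib of None would suggest no per-container limit is set, pointing the search elsewhere.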

dale-wahl commented 1 week ago

You could also install 4CAT directly. You do not need Docker; it is usually more convenient, but 4CAT also runs directly on Linux. https://github.com/digitalmethodsinitiative/4cat/wiki/Installing-4CAT#install-4cat-manually