aptly-dev / aptly

aptly - Debian repository management tool
https://www.aptly.info/
MIT License

too many open file handles #351

Closed: avongluck-r1soft closed this 2 weeks ago

avongluck-r1soft commented 8 years ago

When adding a large number of debs in a row, aptly will begin throwing "too many open files" errors while API mode is running.

(aptly running in API mode with no-lock)

find /tmp/iso/pool -name "*.deb" -exec aptly repo add $2 {} \;

[+] php5-cli_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-common_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-curl_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-gd_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-gmp_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[+] php5-ldap_5.5.9+dfsg-1ubuntu4.3_amd64 added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/23/57/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/50/0c/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/cb/8e/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/5b/4d/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ba/ec/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/76/eb/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/19/05/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/8c/cc/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/f8/55/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ec/c7/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/c5/74/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to import file /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb into pool: open /repo/.aptly/pool/eb/2a/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package php5-json_1.3.2-2build1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php-json/php5-json_1.3.2-2build1_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package python-pil_2.3.0-1ubuntu3_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pillow/python-pil_2.3.0-1ubuntu3_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package libpixman-1-0_0.30.2-2ubuntu1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pixman/libpixman-1-0_0.30.2-2ubuntu1_amd64.deb
ERROR: some files failed to be added
Loading packages...
[!] Unable to save package libpkcs11-helper1_1.11-1_amd64: open /repo/.aptly/db/018015.ldb: too many open files
[!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pkcs11-helper/libpkcs11-helper1_1.11-1_amd64.deb
ERROR: some files failed to be added
Loading packages...

avongluck-r1soft commented 8 years ago

This can be worked around via "ulimit -n 64000"; however, it seems like aptly is keeping a lot of file handles open...
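One way to check whether aptly itself is holding the handles is to count the descriptors of the running process and compare against the limit (the process name `aptly` is assumed; adjust for your setup):

```shell
# Count open file descriptors held by the running aptly process:
ls /proc/$(pidof aptly)/fd | wc -l

# Compare against the per-process soft limit:
ulimit -n
```

If the count keeps climbing toward the limit while the process is idle, that points to a descriptor leak rather than legitimate concurrent use.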

smira commented 8 years ago

@avongluck-r1soft I was trying to figure out the problem here... so you are adding packages via CLI aptly while API aptly is running? Is it related to API aptly?

smira commented 8 years ago

Also, this is an inefficient (very slow) way to add packages; aptly can do a much better job importing more packages at once. It can iterate on its own, like aptly repo add xyz /tmp/iso/debs, or you can use xargs to import more packages in each call.
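For illustration, here are both batched approaches (the repo name `myrepo` is a placeholder):

```shell
# Let aptly walk the directory tree itself, importing everything in one run:
aptly repo add myrepo /tmp/iso/pool

# Or batch many .deb files into each aptly invocation with xargs
# (-print0/-0 keeps filenames with spaces intact):
find /tmp/iso/pool -name '*.deb' -print0 | xargs -0 aptly repo add myrepo
```

Either way, aptly opens its database once per invocation instead of once per package, which is both faster and far easier on the open-file limit.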

vacri commented 8 years ago

I had a similar issue yesterday - I had been uploading debs via REST. The service had been up for a long time (weeks if not months) and there are usually about half a dozen uploads per day from our build system. The REST API was accepting the debs (reporting that it uploaded them), but after reporting that it published the resulting snapshot, the debs weren't actually available to the clients. In the log, I was getting GPG errors caused by too many open files.

Mar 23 08:58:12 vimes bash[334]: Error #01: unable to initialize GPG signer: unable to execute gpg: fork/exec /usr/bin/gpg: too many open files (is gpg installed?):
Mar 23 08:58:12 vimes bash[334]: Meta: Operation aborted

Our setup only ever uploads one package at a time, and a restart fixed the issue. My guess is that the problem is not the bulk upload itself, but the number of open file handles - and in our case, it had just grown over time.

This was all seen with default setting for nofiles (1024 soft 4096 hard) and aptly 0.9.5

smira commented 8 years ago

So aptly should have fd leak somewhere. Thanks I'll keep looking.

emopinata commented 5 years ago

@smira, I was actually just using xargs with aptly snapshot list -raw to clean up a bunch of snapshots that were no longer in use, and hit an issue between this and #818: after doing the mass drop, aptly would report that no mirrors, snapshots, or publishes existed. Using aptly db recover didn't resolve the issue, and running aptly db cleanup -dry-run -verbose would give me the too many open files error.

This is aptly 1.3.0 running on 16.04.4. Thankfully my repos are published to S3 and I had a snapshot of the instance volume from this morning to recover the instance itself. But it would be great to find out why this is happening before I attempt it again.

smira commented 4 years ago

@jalmansor just trying to understand what is going on there. As aptly run via xargs is a fresh process each time, it should be something using too many fds within a single invocation, and that something is causing goleveldb corruption.

biggreenogre commented 4 years ago

We are having a similar issue. Aptly has been in use for 2 years. We recently started testing using the API (with -no-lock) and are encountering this issue doing snapshot/db cleanups.

aptly 1.3.0 (from 'http://repo.aptly.info squeeze main')
Ubuntu 14.04 Trusty
File system is a GlusterFS mount (gluster v3.10.12).

$ ulimit -n
1024
$ ls -l repo/db/*.ldb | wc -l
1695

errors include "too many open file handles" and "ERROR: open /srv/shared/aptly/repo/db/1076864.ldb: stale NFS file handle"

Aptly repos are managed by bash scripts run by Jenkins.

At the command line (after stopping the aptly server):

for S in $(aptly snapshot list -raw); do aptly snapshot drop ${S}; done

"too many open files" was fixed with ulimit -n 2048. Still seeing stale NFS messages, but they do not seem to be fatal.

Our API version is just the same commands with the API equivalent of CLI (curl ${aptly_host}/api/snapshots) run repeatedly in a loop until the number of snapshots is the same, before and after.

I have added ulimit -S -n 2048 to our /etc/init.d/aptly-server script and initial test is positive, will be increasing to 4096 for production.

NOTE: Running aptly db recover while still in the "too many files" state appeared to trash the db. It was restored from a copy made immediately prior.

smira commented 4 years ago

On a side note, I don't think having LevelDB on NFS is really a good scenario (not sure about locking and consistency).

I'm thinking about what is triggering this open-file exhaustion, though. It looks to be related to the number of .ldb files, but it's not clear what's up, as this should be handled correctly by the LevelDB library.

smira commented 4 years ago

I tried to reproduce this by creating and dropping 2000 snapshots, but the number of files in the db directory stays stable like it should (testing with the master version, though).

smira commented 4 years ago

Created 10GB worth of database, 3.5k files, ulimit set to 384, still no failures

smira commented 4 years ago

22:36:08.970318 version@stat F·[7 56 395 3025] S·10GiB[1KiB 98MiB 999MiB 9GiB] Sc·[1.75 0.98 1.00 0.95]
smira commented 4 years ago

Looks like I was able to hit that, but with forced compaction and aptly db cleanup

smira commented 4 years ago

But it seems to be aptly db recover that forces all the tables onto level 0 (L0), which leads to this behavior.

smira commented 4 years ago

Summary:

Plan:

biggreenogre commented 4 years ago

Looks like the db got corrupted yesterday, jobs failed overnight based on this error:

aptly@fulcrum-1:~$ aptly publish list
ERROR: unable to load list of repos: snapshot with uuid 66ed3ea5-2931-4e91-8c8d-8a0f6255de0c not found

Listing snapshots, repos, etc worked fine. Restored db from earlier backup and so far, so good. Have not seen "too many files" error since adding ulimit -S -n 2048 to api server init script and before running CLI commands.

biggreenogre commented 4 years ago

Note re: the "stale NFS file handle" messages: the db is NOT on an NFS file system but on a GlusterFS file system with NFS specifically disabled. I don't think this is related to the issue at hand; rather, it is a minor GlusterFS problem.

smira commented 4 years ago

#882 might help here as well

flotho commented 2 years ago

Hi @biggreenogre , did you retry with recent code ?

Regards

biggreenogre commented 2 years ago

I have not. I'm afraid I don't have any experience with Go or how to build it. I also need the fix for the S3/MD5SUM issue. Is there a quick "howto" on building aptly, e.g. how to install the Go build environment?
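For reference, a minimal sketch of building aptly from source, assuming a recent Go toolchain is available; the aptly README is the authoritative source for the supported Go version and exact steps:

```shell
# Install a Go toolchain (distro packages may be old; https://go.dev/dl/
# has current releases) plus git:
sudo apt-get install -y golang-go git

# Fetch the source and build the binary in place:
git clone https://github.com/aptly-dev/aptly.git
cd aptly
go build -o aptly .

# Sanity-check the freshly built binary:
./aptly version
```

This produces a single static `aptly` binary in the checkout directory that can be copied over the installed one.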

NeroBurner commented 1 month ago

I'm hit by the open file handles bug (triggers every 2 to 5 days).

And starting today I'm running aptly 1.5.0+162+g8029305d from commit 8029305d (latest nightly at the time) running on Ubuntu 24.04 noble amd64.

Will report back in about a month if it worked :crossed_fingers:

neolynx commented 1 month ago

How are you running aptly? Via commands or the API?

For commands, what is your ulimit -n? If aptly runs as a service, systemd needs LimitNOFILE= configured in the service file.
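A drop-in override is one way to set that limit without editing the unit file itself (the drop-in file name `limits.conf` here is just an example):

```ini
# /etc/systemd/system/aptly-api.service.d/limits.conf
[Service]
LimitNOFILE=256000
```

After creating it, run systemctl daemon-reload and restart the service; systemd services do not read /etc/security/limits.d, so the pam_limits settings there have no effect on them.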

NeroBurner commented 1 month ago

I created a limits file to raise the limit for the whole server

sudo tee /etc/security/limits.d/20-nprocs.conf << EOF
# raise number of opened file per process for each user to 256k (from default 1024)
*   soft   nofile   256000
root soft   nofile   256000
EOF

And I'm running the API with --no-lock as systemd unit

sudo tee /etc/systemd/system/aptly-api.service << EOF
[Unit]
Description=Aptly REST API service

[Service]
User=aptly
WorkingDirectory=/var/lib/aptly
ExecStart=/usr/bin/aptly api serve -listen "localhost:8000" -no-lock
KillSignal=SIGTERM
KillMode=process
TimeoutStopSec=15s

[Install]
WantedBy=multi-user.target
EOF

Is that enough? Or do I still need to set LimitNOFILE=256000?

and weekly I'm running aptly db cleanup as aptly user (same user that runs the API)

sudo -u aptly crontab -l
5 8 * * Sun /usr/bin/aptly db cleanup

neolynx commented 1 month ago

Yes, when aptly is running as a systemd service, LimitNOFILE=256000 needs to be in the service configuration or an overlay.

The limits for the cron environment might come from /etc/security/limits.d/.

NeroBurner commented 1 month ago

Thank you very much! This explains why the "too-many-open-files" bug was triggered in about the same time regardless of which nofile limit I set in /etc/security/limits.d. I've now added LimitNOFILE= to the systemd unit file.

For future reference, after the restart of the aptly-api systemd-service I get the following for open files

$ ls -l /var/lib/aptly/db/*.ldb | wc -l
123

neolynx commented 1 month ago

I think it is expected that the db files are open. 123 is also not a huge number...

Using lsof -p $(pidof aptly) would show the open files. If you see a growing number, then it is maybe a bug. Are you using the async API and tasks? They could also cause issues when not removed.

May I close the issue ?

NeroBurner commented 1 month ago

I think yes, you can. I'll report back in about a month (or earlier if the bug triggers again)


For future reference: after about 2 hours of runtime, two new *.ldb files had been created

$ ls -l /var/lib/aptly/db/*.ldb | wc -l
125

I don't understand lsof output, but just in case someone understands better than me, here it is ;)

$ sudo lsof -p $(pidof aptly)
[sudo] password for builder: 
COMMAND  PID  USER   FD      TYPE             DEVICE SIZE/OFF    NODE NAME
aptly   7237 aptly  cwd       DIR               0,36      208     256 /var/lib/aptly
aptly   7237 aptly  rtd       DIR              253,2     4096       2 /
aptly   7237 aptly  txt       REG              253,2 37622688 2518976 /usr/bin/aptly
aptly   7237 aptly    0r      CHR                1,3      0t0       5 /dev/null
aptly   7237 aptly    1u     unix 0xffff8d8a4a38a800      0t0   54916 type=STREAM (CONNECTED)
aptly   7237 aptly    2u     unix 0xffff8d8a4a38a800      0t0   54916 type=STREAM (CONNECTED)
aptly   7237 aptly    3u     unix 0xffff8d8aa84cec00      0t0   56498 type=DGRAM (CONNECTED)
aptly   7237 aptly    4u  a_inode               0,15        0      64 [eventpoll:3,5,7]
aptly   7237 aptly    5r     FIFO               0,14      0t0   56497 pipe
aptly   7237 aptly    6w     FIFO               0,14      0t0   56497 pipe
aptly   7237 aptly    7u     IPv4              54917      0t0     TCP localhost:8000 (LISTEN)

I found another way to list the open files of a process here: https://stackoverflow.com/questions/21752067/counting-open-files-per-process

and it's only 8? :sweat_smile:

$ sudo ls /proc/$(pidof aptly)/fd/ 2>/dev/null | wc -l
8

NeroBurner commented 3 weeks ago

The file limit now is fixed (I believe), but aptly-api 1.5.0+162+g8029305d is OOM killed before 2 days of uptime :see_no_evil:

I can't check whether the number of open files was stable or growing as before, because the process gets killed.

× aptly-api.service - Aptly REST API service
     Loaded: loaded (/etc/systemd/system/aptly-api.service; enabled; preset: enabled)
     Active: failed (Result: oom-kill) since Wed 2024-08-14 06:03:41 CEST; 2h 38min ago
   Duration: 1d 18h 31min 1.224s
    Process: 684 ExecStart=/usr/bin/aptly api serve -listen localhost:8000 -no-lock (code=killed, signal=KILL)
   Main PID: 684 (code=killed, signal=KILL)
      Tasks: 1 (limit: 4613)
     Memory: 5.5M (peak: 3.5G swap: 544.0K swap peak: 3.7G)
        CPU: 2h 32min 34.991s
     CGroup: /system.slice/aptly-api.service
             └─1453 gpg-agent --homedir /var/lib/aptly/.gnupg --use-standard-socket --daemon

Aug 14 05:24:19 repo aptly[684]: 2024-08-14T05:24:19+02:00 DBG Executing task synchronously
Aug 14 05:24:19 repo aptly[684]: Signing file 'Release' with gpg, please enter your passphrase when prompted:
Aug 14 05:24:19 repo aptly[684]: Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:
Aug 14 05:24:19 repo aptly[684]: [GIN] 2024/08/14 - 05:24:19 | 201 |  1.133699155s |   192.168.0.5 | POST     "/api/publish/:."
Aug 14 06:03:08 repo aptly[684]: [GIN] 2024/08/14 - 06:03:08 | 200 |  288.845489ms |   192.168.0.5 | GET      "/api/version"
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: A process of this unit has been killed by the OOM killer.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Main process exited, code=killed, status=9/KILL
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Failed with result 'oom-kill'.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Unit process 1453 (gpg-agent) remains running after unit stopped.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Consumed 2h 32min 34.991s CPU time, 3.5G memory peak, 3.7G memory swap peak.
NeroBurner commented 3 weeks ago

OOM issue is most likely https://github.com/aptly-dev/aptly/issues/1323

neolynx commented 3 weeks ago

yes, there was a slip which did not free memory on each API call, but this is fixed in the nightly builds.

The number of open files should be low when aptly is not doing anything. The 8 FDs (numbers 0-7 above) are stdin, stdout, stderr, three from Go (pipes and events), and the TCP socket.

do you still have issues with open files, or can we close this ?

NeroBurner commented 3 weeks ago

thanks for the explanation!

Yes I think this issue is resolved! Thanks!