Closed avongluck-r1soft closed 2 weeks ago
This can be worked around via "ulimit -n 64000" however it seems like aptly is keeping a lot of file handles open...
@avongluck-r1soft I was trying to figure out the problem here... so you are adding packages via CLI aptly while API aptly is running? Is it related to API aptly?
Also this is inefficient way to add packages (very slow), aptly can do much better job by importing more packages at once, it can do iteration on its own like aptly repo add xyz /tmp/iso/debs
or you can use xargs
to import more packages in each call.
I had a similar issue yesterday - had been uploading debs via REST. The service had been up for a long time (weeks if not months) and there's usually about half-a-dozen uploads per day from our build system. The REST API was accepting the debs (reporting that it uploaded them), but after reporting it published the resulting snapshot, the debs weren't actually available for the clients. In the log, I was getting GPG errors due to open files.
Mar 23 08:58:12 vimes bash[334]: Error #01: unable to initialize GPG signer: unable to execute gpg: fork/exec /usr/bin/gpg: too many open files (is gpg installed?):
Mar 23 08:58:12 vimes bash[334]: Meta: Operation aborted
Our setup only ever uploads one package at a time, and a restart fixed the issue. My guess is that the problem is not the bulk upload itself, but the number of open filehandles - and in our case, it had just grown over time.
This was all seen with default setting for nofiles (1024 soft 4096 hard) and aptly 0.9.5
So aptly should have fd leak somewhere. Thanks I'll keep looking.
@smira, I was actually just using xargs
with aptly snapshot list -raw
to clean up a bunch that were no longer in use and hit an issue between this and #818, where after doing the mass drop aptly
would report that no mirrors, snapshots, or publish's existed. Using aptly db recover
didn't resolve the issue, and running aptly db cleanup -dry-run -verbose
would give me the too many open files error.
This is aptly 1.3.0 running on 16.04.4. Thankfully my repos are published to S3 and I had a snapshot of the instance volume from this morning to recover the instance itself. But it would be great to find out why this is happening before I attempt it again.
@jalmansor just trying to understand what is going on there. as aptly via xargs
is a process each time it runs, it should be something using too many fds for a single invocation and that something is causing goleveldb corruption.
We are having a similar issue. Aptly has been in use for 2 years. We recently started testing using the API (with -no-lock) and are encountering this issue doing snapshot/db cleanups. aptly 1.3.0 (from 'http://repo.aptly.info squeeze main') Ubuntu 14.04 Trusty File system is a Glusterfs mount (gluster v3.10.12). $ ulimit -n 1024 $ ls -l repo/db/*.ldb | wc -l 1695
errors include "too many open file handles" and "ERROR: open /srv/shared/aptly/repo/db/1076864.ldb: stale NFS file handle"
Aptly repos are managed by bash scripts run by Jenkins.
At the command line (after stopping aptly server):
for S in $(aptly snapshot list -raw); do aptly snapshot drop ${S};done
"too many open files" fixed with ulimit -n 2048
Still see stale NFS messages but do not seem to be fatal.
Our API version is just the same commands with the API equivalent of CLI (curl ${aptly_host}/api/snapshots
) run repeatedly in a loop until the number of snapshots is the same, before and after.
I have added ulimit -S -n 2048
to our /etc/init.d/aptly-server script and initial test is positive, will be increasing to 4096 for production.
NOTE: Running aptly db recover
while still in the "too many files" state appeared to trash the db. It was restored from a copy made immediately prior.
on a side note, I don't think having LevelDB on NFS is really good scenario (not sure about locking and consistency).
I'm thinking about what is triggering this open file exhaustion though. Looks to be related to number of .ldb
files, but not clear what's up as it should be handled correctly in the LevelDB library.
I tried to reproduce this by creating and dropping 2000 snapshots, but number of files in the db directory stays stable like it should (testing with master
version though)
Created 10GB worth of database, 3.5k files, ulimit set to 384, still no failures
22:36:08.970318 version@stat F·[7 56 395 3025] S·10GiB[1KiB 98MiB 999MiB 9GiB] Sc·[1.75 0.98 1.00 0.95]
Looks like I was able to hit that, but with forced compaction and aptly db cleanup
but seems to be aptly db recover
triggering all the tables to be on L0 which leads to this behavior.
Summary:
too many open files
seem to be related to huge number of tables on L0 (level zero)aptly db recover
even on healthy DB forces all the tables to L0 which leads to the error abovePlan:
aptly db recover
, don't let users run unless they understand the consequences and have a backupaptly db cleanup
and aptly db recover
Looks like the db got corrupted yesterday, jobs failed overnight based on this error:
`aptly@fulcrum-1:~$ aptly publish list
ERROR: unable to load list of repos: snapshot with uuid 66ed3ea5-2931-4e91-8c8d-8a0f6255de0c not found`
Listing snapshots, repos, etc worked fine.
Restored db from earlier backup and so far, so good.
Have not seen "too many files" error since adding ulimit -S -n 2048
to api server init script and before running CLI commands.
Note: Re:- "stale NFS file handle" messages The db is NOT on an NFS file system but on a GLUSTERFS file system with NFS specifically disabled. I don't think this is related to the issue at hand, rather a minor glusterfs problem.
Hi @biggreenogre , did you retry with recent code ?
Regards
I have not. I'm afraid I don;t have any experience with GO or how to build it. I also need the fix for the S3/MD5SUM issue. Is there a quick "howto" on building aptly? e.g. how to install the GO build environment
I'm hit by the open file handles bug (triggers every 2 to 5 days).
And starting today I'm running aptly 1.5.0+162+g8029305d
from commit 8029305d (latest nightly at the time) running on Ubuntu 24.04 noble amd64.
Will report back in about a month if it worked :crossed_fingers:
how are you running aptly ? via command or API ?
for commands, what is your ulimit -n
?
if aptly runs as a service, systemd need to configure LimitNOFILE=
in the service file.
I created a limits file to raise the limit for the whole server
sudo tee /etc/security/limits.d/20-nprocs.conf << EOF
# raise number of opened file per process for each user to 256k (from default 1024)
* soft nofile 256000
root soft nofile 256000
EOF
And I'm running the API with --no-lock
as systemd unit
sudo tee /etc/systemd/system/aptly-api.service << EOF
[Unit]
Description=Aptly REST API service
[Service]
User=aptly
WorkingDirectory=/var/lib/aptly
ExecStart=/usr/bin/aptly api serve -listen "localhost:8000" -no-lock
KillSignal=SIGTERM
KillMode=process
TimeoutStopSec=15s
[Install]
WantedBy=multi-user.target
EOF
Is that enough? Or do I still need to set LimitNOFILE=256000
?
and weekly I'm running aptly db cleanup
as aptly
user (same user that runs the API)
sudo -u aptly crontab -l
5 8 * * Sun /usr/bin/aptly db cleanup
yes, when aptly is running as systemd service, then LimitNOFILE=256000 needs to be in the service configuration or overlay.
the cron environment variable might come from /etc/security/limits.d/
Thank you very much! This explains why the "too-many-open-files" bug was triggered in about the same time regardless of which nolimit I set in limits.conf.d
. I've now added LimitNOFILE=
to the systemd-unit file
For future reference, after the restart of the aptly-api systemd-service I get the following for open files
$ ls -l /var/lib/aptly/db/*.ldb | wc -l
123
I think it is expected that the db files are open. 123 is also not a huge number...
Using lsof -p $(pidof aptly)
would show the open files. If you see a growing number, then it is maybe a bug.
Are yu using async api and tasks ? They could also cause issues, when not remved.
May I close the issue ?
I think yes, you can. I'll report back in about a month (or earlier if the bug triggers again)
For future reference the number of open files for about 2 hours runtime two new *.ldb files were created
$ ls -l /var/lib/aptly/db/*.ldb | wc -l
125
I don't understand lsof
output, but just in case someone understands better than me, here it is ;)
$ sudo lsof -p $(pidof aptly)
[sudo] password for builder:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
aptly 7237 aptly cwd DIR 0,36 208 256 /var/lib/aptly
aptly 7237 aptly rtd DIR 253,2 4096 2 /
aptly 7237 aptly txt REG 253,2 37622688 2518976 /usr/bin/aptly
aptly 7237 aptly 0r CHR 1,3 0t0 5 /dev/null
aptly 7237 aptly 1u unix 0xffff8d8a4a38a800 0t0 54916 type=STREAM (CONNECTED)
aptly 7237 aptly 2u unix 0xffff8d8a4a38a800 0t0 54916 type=STREAM (CONNECTED)
aptly 7237 aptly 3u unix 0xffff8d8aa84cec00 0t0 56498 type=DGRAM (CONNECTED)
aptly 7237 aptly 4u a_inode 0,15 0 64 [eventpoll:3,5,7]
aptly 7237 aptly 5r FIFO 0,14 0t0 56497 pipe
aptly 7237 aptly 6w FIFO 0,14 0t0 56497 pipe
aptly 7237 aptly 7u IPv4 54917 0t0 TCP localhost:8000 (LISTEN)
I found another way to list the open files of a process here: https://stackoverflow.com/questions/21752067/counting-open-files-per-process
and its only 8? :sweat_smile:
$ sudo ls /proc/$(pidof aptly)/fd/ 2>/dev/null | wc -l
8
The file limit now is fixed (I believe), but aptly-api 1.5.0+162+g8029305d
is OOM killed before 2 days of uptime :see_no_evil:
Can't check if the number of open files was stable or growing as earlier as the process is killed.
× aptly-api.service - Aptly REST API service
Loaded: loaded (/etc/systemd/system/aptly-api.service; enabled; preset: enabled)
Active: failed (Result: oom-kill) since Wed 2024-08-14 06:03:41 CEST; 2h 38min ago
Duration: 1d 18h 31min 1.224s
Process: 684 ExecStart=/usr/bin/aptly api serve -listen localhost:8000 -no-lock (code=killed, signal=KILL)
Main PID: 684 (code=killed, signal=KILL)
Tasks: 1 (limit: 4613)
Memory: 5.5M (peak: 3.5G swap: 544.0K swap peak: 3.7G)
CPU: 2h 32min 34.991s
CGroup: /system.slice/aptly-api.service
└─1453 gpg-agent --homedir /var/lib/aptly/.gnupg --use-standard-socket --daemon
Aug 14 05:24:19 repo aptly[684]: 2024-08-14T05:24:19+02:00 DBG Executing task synchronously
Aug 14 05:24:19 repo aptly[684]: Signing file 'Release' with gpg, please enter your passphrase when prompted:
Aug 14 05:24:19 repo aptly[684]: Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:
Aug 14 05:24:19 repo aptly[684]: [GIN] 2024/08/14 - 05:24:19 | 201 | 1.133699155s | 192.168.0.5 | POST "/api/publish/:."
Aug 14 06:03:08 repo aptly[684]: [GIN] 2024/08/14 - 06:03:08 | 200 | 288.845489ms | 192.168.0.5 | GET "/api/version"
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: A process of this unit has been killed by the OOM killer.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Main process exited, code=killed, status=9/KILL
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Failed with result 'oom-kill'.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Unit process 1453 (gpg-agent) remains running after unit stopped.
Aug 14 06:03:41 repo systemd[1]: aptly-api.service: Consumed 2h 32min 34.991s CPU time, 3.5G memory peak, 3.7G memory swap peak.
OOM issue is most likely https://github.com/aptly-dev/aptly/issues/1323
yes, there was a slip which did not free memory on each API call, but this is fixed in the nightly builds.
the number of open files should be low, when aptly is not doing anything. the 8 FDs (number 0-7 above) are stdin, stdout, stderr, 3 from go (pipes and events) and the tcp socket.
do you still have issues with open files, or can we close this ?
thanks for the explanation!
Yes I think this issue is resolved! Thanks!
When adding a large number of deb's in a row, aptly will begin throwing "too many open files" while API mode is running.
(aptly running in API mode with no-lock)
find /tmp/iso/pool -name "_._deb" -exec aptly repo add $2 {} \;
[+] php5-cli_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [+] php5-common_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [+] php5-curl_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [+] php5-gd_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [+] php5-gmp_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [+] php5-ldap_5.5.9+dfsg-1ubuntu4.3_amd64 added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/23/57/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-mysql_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/50/0c/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-odbc_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/cb/8e/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pgsql_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/5b/4d/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-pspell_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ba/ec/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-readline_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/76/eb/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-recode_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/19/05/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-snmp_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/8c/cc/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-sqlite_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/f8/55/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-tidy_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/ec/c7/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xmlrpc_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb into pool: open /repo/.aptly/pool/c5/74/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php5-xsl_5.5.9+dfsg-1ubuntu4.3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to import file /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb into pool: open /repo/.aptly/pool/eb/2a/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php5/php-pear_5.5.9+dfsg-1ubuntu4.3_all.deb ERROR: some files failed to be added Loading packages... [!] Unable to save package php5-json_1.3.2-2build1_amd64: open /repo/.aptly/db/018015.ldb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/php-json/php5-json_1.3.2-2build1_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to save package python-pil_2.3.0-1ubuntu3_amd64: open /repo/.aptly/db/018015.ldb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pillow/python-pil_2.3.0-1ubuntu3_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to save package libpixman-1-0_0.30.2-2ubuntu1_amd64: open /repo/.aptly/db/018015.ldb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pixman/libpixman-1-0_0.30.2-2ubuntu1_amd64.deb ERROR: some files failed to be added Loading packages... [!] Unable to save package libpkcs11-helper1_1.11-1_amd64: open /repo/.aptly/db/018015.ldb: too many open files [!] Some files were skipped due to errors: /tmp/iso/pool/main/p/pkcs11-helper/libpkcs11-helper1_1.11-1_amd64.deb ERROR: some files failed to be added Loading packages...