berkeli opened this issue 2 years ago
To check how full each mounted filesystem (disk partition) was, I ran df -h, which gave me the following table:
[berkeli@ip-172-31-81-17 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 474M 0 474M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 464K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 8.0G 7.9M 100% /
tmpfs 97M 0 97M 0% /run/user/1000
tmpfs 97M 0 97M 0% /run/user/1002
From this table it was clear that the full filesystem is /dev/xvda1, mounted on /, so that's where I needed to free up space.
I moved to the root directory with cd / and ran a command to check the disk usage of each top-level folder:
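du -sh /*
-s
summarizes the total size of each argument instead of listing every subdirectory; for now I only want the top-level directories, to see which one might contain the largest file.
-h
displays sizes in a human-readable format (K, M, G).
/*
expands to every entry in the root directory (/).
To hide the error messages du prints for directories it can't read, I added a redirection:
du -sh /* 2>/dev/null
2>/dev/null
redirects file descriptor 2 (standard error) to /dev/null, so error messages are discarded and only the sizes are printed. It has nothing to do with exit codes.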
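A minimal sketch of what 2>/dev/null does, assuming a POSIX shell and a path that doesn't exist (the /nonexistent-<PID> name is hypothetical): the error message is discarded, but the exit status still reports the failure.

```shell
#!/bin/sh
# Hypothetical path that should not exist
missing="/nonexistent-$$"

# Without redirection the error goes to stderr (fd 2); capture it to show that
err="$(ls "$missing" 2>&1 >/dev/null)" || :
echo "stderr text: $err"

# With 2>/dev/null the error text is discarded ...
out="$(ls "$missing" 2>/dev/null)" && status=0 || status=$?
echo "captured output: '$out'"
# ... but the exit status still reports the failure
echo "exit status: $status"
```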
The result was a bit surprising, as the sizes of the listed folders didn't add up to the ~8GB of usage I saw earlier:
[berkeli@ip-172-31-81-17 /]$ du -sh /* 2>/dev/null
0 /bin
26M /boot
0 /dev
18M /etc
20K /home
0 /lib
0 /lib64
0 /local
0 /media
0 /mnt
112K /opt
0 /proc
0 /root
456K /run
0 /sbin
0 /srv
0 /sys
4.0K /tmp
1.1G /usr
503M /var
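The listing is easier to read when sorted. A minimal sketch, assuming a POSIX shell: it builds a small throwaway tree (the small and big directories are hypothetical stand-ins for /usr, /var, etc.) and ranks it with du -sk | sort -rn, using -k instead of -h so that a plain numeric sort works.

```shell
#!/bin/sh
# Sketch: rank directories by disk usage, largest first.
# A throwaway tree from mktemp -d stands in for / so this is safe to run.
set -e
root="$(mktemp -d)"
mkdir "$root/small" "$root/big"
# Random data, so a compressing filesystem cannot shrink the files
dd if=/dev/urandom of="$root/small/a" bs=1024 count=100 2>/dev/null
dd if=/dev/urandom of="$root/big/b" bs=1024 count=3000 2>/dev/null

# -s: one total per argument; -k: sizes in KiB so that sort -n can compare them
ranking="$(du -sk "$root"/* 2>/dev/null | sort -rn)"
printf '%s\n' "$ranking"

# The first line is the largest entry; strip the size column to keep the path
largest="$(printf '%s\n' "$ranking" | head -n 1 | sed 's/^[0-9]*[[:space:]]*//')"
echo "largest: $largest"
rm -rf "$root"
```

On the server, something like sudo du -xk / 2>/dev/null | sort -rn | head -n 20 would list the biggest directories anywhere on the root filesystem; -x stops du from crossing into other mounts such as /proc or /sys.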
Next I checked the total for the whole root directory with du -sh / 2>/dev/null, which showed that disk usage was only 1.6GB:
[berkeli@ip-172-31-81-17 /]$ du -sh / 2>/dev/null
1.6G /
[berkeli@ip-172-31-81-17 /]$ sudo du -sh / 2>/dev/null
[sudo] password for berkeli:
1.8G /
Running it with sudo, so that du could read every directory, showed slightly higher usage (1.8GB), but nowhere near enough to explain 100% usage. I stopped looking for a large file at this point, as the requirements suggested the space might be held by a process.
So I turned to lsof (list open files). Initially it gave me a huge list of files that didn't offer any clues.
After looking at the options for lsof and digging around the internet, I found that a process can keep taking up disk space if it opened a file that was subsequently deleted: deleting removes the file's name, but its data blocks are only freed once the last open file descriptor to it is closed. To check for this, I found the +|-L [l] option, which filters open files by their link count.
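A minimal sketch of that behaviour in a POSIX shell, using a hypothetical file named held in a temporary directory: after rm the name is gone, but the data can still be read through the descriptor that is still open.

```shell
#!/bin/sh
# Sketch: a deleted file keeps its data while a descriptor to it is open.
set -e
dir="$(mktemp -d)"
f="$dir/held"
echo "still here" > "$f"

exec 3<"$f"   # open the file for reading on fd 3 of this shell
rm -f "$f"    # remove its only directory entry (link count drops to 0)

if [ -e "$f" ]; then gone=no; else gone=yes; fi
echo "path removed: $gone"

# The data is still reachable through the open descriptor
content="$(cat <&3)"
echo "read via fd 3: $content"

exec 3<&-     # close fd 3; only now can the kernel free the blocks
rmdir "$dir"
```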
In my case:
lsof +L1
+L1
lists open files whose link count is less than 1 (i.e. zero): files that are still open but no longer linked from any directory.
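To see what the link count means, a minimal sketch with hard links (the file names a and b are hypothetical): ls -l shows the link count in its second column, the same number lsof prints under NLINK.

```shell
#!/bin/sh
# Sketch: watch a file's link count (the NLINK column in lsof) change.
set -e
dir="$(mktemp -d)"
echo data > "$dir/a"
ln "$dir/a" "$dir/b"    # second hard link (name) for the same inode

# Field 2 of ls -l is the link count
links_before="$(ls -l "$dir/a" | awk '{ print $2 }')"
rm "$dir/b"
links_after="$(ls -l "$dir/a" | awk '{ print $2 }')"
echo "links: $links_before -> $links_after"   # links: 2 -> 1
rm -rf "$dir"
```

If the last name is removed while a process still has the file open, the count drops to 0, which is the state lsof +L1 selects.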
The command didn't return much:
[berkeli@ip-172-31-81-17 /]$ lsof +L1
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
I decided to run it again with sudo access:
[berkeli@ip-172-31-81-17 /]$ sudo lsof +L1
[sudo] password for berkeli:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
systemd-j 1740 root txt REG 202,1 325536 0 199663 /usr/lib/systemd/systemd-journald (deleted)
systemd-l 2579 root txt REG 202,1 606928 0 199665 /usr/lib/systemd/systemd-logind (deleted)
sleep 3300 root 3w REG 202,1 6587219968 0 2186 /root/$TMP (deleted)
findme 3483 root 3w REG 202,1 6587219968 0 2186 /root/$TMP (deleted)
This seemed promising: the size (6587219968 bytes, held by both findme and sleep on the same inode 2186) matches the missing disk usage, and the process is even called findme :)
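As a quick sanity check on the units, a small conversion of the SIZE/OFF value, assuming a POSIX shell with 64-bit arithmetic and awk: it comes to about 6.1 GiB, roughly the gap between the 8.0G that df reported and the 1.8G that du found.

```shell
#!/bin/sh
# Convert the SIZE/OFF value reported by lsof into larger units.
bytes=6587219968
mib=$((bytes / 1024 / 1024))                                   # whole MiB
gib="$(awk -v b="$bytes" 'BEGIN { printf "%.2f", b / 1024^3 }')"
echo "$bytes bytes = $mib MiB = $gib GiB"   # 6587219968 bytes = 6282 MiB = 6.13 GiB
```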
Before killing the process, I wanted to understand what it was, so I ran ps -p 3483 to get more details:
[berkeli@ip-172-31-81-17 sbin]$ ps -p 3483
PID TTY TIME CMD
3483 ? 00:00:01 findme
Nothing unusual or interesting here, except that the command name is findme (and ? in the TTY column means the process has no controlling terminal).
In Linux, we can find where a command comes from with the which command:
[berkeli@ip-172-31-81-17 sbin]$ which findme
/usr/sbin/findme
This showed me the location of the executable.
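which is not specified by POSIX, so a portable alternative is the shell's own command -v (on the server, command -v findme). A minimal sketch using sh as the example name:

```shell
#!/bin/sh
# command -v asks the shell itself where a command name resolves to.
path="$(command -v sh)"
echo "sh resolves to: $path"
# type also reports whether a name is a builtin, alias, or file
type cd
```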
I checked the file type with the file command:
[berkeli@ip-172-31-81-17 sbin]$ file findme
findme: POSIX shell script, ASCII text executable
It's a shell script! Let's check the source code:
[berkeli@ip-172-31-81-17 sbin]$ cat findme
#!/bin/sh
set -e
TMP="$(mktemp)"
exec 3>"\$TMP"
dd bs="1M" count="9000" if="/dev/zero" of="\$TMP" || :
rm -f "\$TMP"
while true; do sleep 10; done
The script ends in an infinite loop, so it seemed safe to kill the process with sudo kill -9 3483.
-9
is the signal number, in this case SIGKILL. I then verified disk usage again:
[berkeli@ip-172-31-81-17 sbin]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 474M 0 474M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 408K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 1.9G 6.2G 24% /
tmpfs 97M 0 97M 0% /run/user/1002
We now have 6.2GB of free space, yay!
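kill -9 worked here, but SIGKILL can't be caught, so the process gets no chance to clean up. A minimal sketch of the gentler order, assuming a POSIX shell and a throwaway sleep as the hypothetical target: send SIGTERM first and only fall back to SIGKILL if the process is still alive.

```shell
#!/bin/sh
# Sketch: ask politely with SIGTERM before resorting to SIGKILL (-9).
sleep 1000 &
pid=$!

kill -TERM "$pid"            # SIGTERM (15) lets a process clean up
wait "$pid" && status=0 || status=$?
echo "exit status: $status"  # 128 + 15 = 143 for a process killed by SIGTERM

# Only if the process were still alive would SIGKILL be needed:
if kill -0 "$pid" 2>/dev/null; then
  kill -KILL "$pid"          # same as kill -9; cannot be caught or ignored
fi
```

In this case the sleep process (PID 3300 in the lsof output) also held fd 3, so the space was only freed once it exited too; with sleep 10 that happens within ten seconds.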
Additional notes:
Based on notes from Radha, I relaunched the findme task and tried to free up space without killing it. After launching the program in detached mode, the disk was full again:
[berkeli@ip-172-31-81-17 8308]$ sudo lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
systemd-j 1740 root txt REG 202,1 325536 0 199663 /usr/lib/systemd/systemd-journald (deleted)
systemd-l 2579 root txt REG 202,1 606928 0 199665 /usr/lib/systemd/systemd-logind (deleted)
sh 8308 root 3w REG 202,1 6594494464 0 8409400 /usr/bin/$TMP (deleted)
sleep 9072 root 3w REG 202,1 6594494464 0 8409400 /usr/bin/$TMP (deleted)
On Linux, each running process has a directory /proc/<PID>, and /proc/<PID>/fd contains a symbolic link for every file descriptor the process has open. Listing it with sudo ls -lh /proc/8308/fd showed the files held by this process:
[berkeli@ip-172-31-81-17 /]$ sudo ls -lh /proc/8308/fd
total 0
lrwx------ 1 root root 64 Nov 9 14:45 0 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 9 14:45 1 -> /dev/pts/0
lrwx------ 1 root root 64 Nov 9 14:45 2 -> /dev/pts/0
lr-x------ 1 root root 64 Nov 9 14:45 255 -> /usr/sbin/findme
l-wx------ 1 root root 64 Nov 9 14:45 3 -> /usr/bin/$TMP (deleted)
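Without lsof, the same clue can be found by reading these links directly. A rough sketch, assuming Linux (the fd number 4 and the temporary file are hypothetical, created only so there is something to find): it prints every descriptor whose target ends in (deleted); without root it can only see your own processes.

```shell
#!/bin/sh
# Sketch (Linux only): list descriptors whose target has been deleted,
# a rough stand-in for "lsof +L1" that reads /proc/<pid>/fd directly.
set -e
# Create one deleted-but-open file so there is something to find
tmp="$(mktemp)"
exec 4>"$tmp"
rm -f "$tmp"

found=""
for link in /proc/[0-9]*/fd/*; do
  # Unreadable or vanished entries are skipped
  target="$(readlink "$link" 2>/dev/null)" || continue
  case "$target" in
    *" (deleted)")
      echo "$link -> $target"
      found="$found $link"
      ;;
  esac
done
exec 4>&-
```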
To free up the space, we can truncate the file to zero length through its descriptor link: :>/proc/8308/fd/3 (: is a no-op command, and > opens the target for writing and truncates it).
Unfortunately that gave me "Permission denied", and prefixing it with sudo didn't help, because the redirection is performed by my own unprivileged shell before sudo even runs.
To get around this, I ran the whole command, redirection included, in a root shell via sudo sh -c ':>/proc/8308/fd/3', which resolved the issue:
[berkeli@ip-172-31-81-17 /]$ sudo sh -c ':>/proc/8308/fd/3'
[berkeli@ip-172-31-81-17 /]$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 474M 0 474M 0% /dev
tmpfs 483M 0 483M 0% /dev/shm
tmpfs 483M 408K 483M 1% /run
tmpfs 483M 0 483M 0% /sys/fs/cgroup
/dev/xvda1 8.0G 1.9G 6.2G 24% /
tmpfs 97M 0 97M 0% /run/user/1002
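The truncation can be reproduced safely on any Linux machine. A minimal sketch, assuming a hypothetical 100 KiB file held open on fd 3 of the current shell, mirroring what findme does. With GNU coreutils, sudo truncate -s 0 /proc/8308/fd/3 should also work on the server, since truncate opens the file itself instead of relying on a shell redirection.

```shell
#!/bin/sh
# Sketch (Linux only): free a deleted file's space by truncating it through
# /proc/<pid>/fd while the process keeps its descriptor open.
set -e
tmp="$(mktemp)"
exec 3>"$tmp"                                       # like findme: fd 3 open for writing
dd if=/dev/zero bs=1024 count=100 >&3 2>/dev/null   # write 100 KiB via fd 3
rm -f "$tmp"                                        # the file is now "(deleted)"

before="$(wc -c < "/proc/$$/fd/3" | tr -d ' ')"
: > "/proc/$$/fd/3"                                 # the same trick as above
after="$(wc -c < "/proc/$$/fd/3" | tr -d ' ')"
echo "size: $before -> $after bytes"                # size: 102400 -> 0 bytes

exec 3>&-
```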
Let's take a look at the findme script and what it does:
#!/bin/sh
set -e
The line above tells the shell to exit immediately if a command fails (returns a non-zero exit status).
TMP="$(mktemp)"
Here we create a variable TMP and assign it the output of the mktemp command, which creates an empty temporary file (by default under /tmp) and prints its path.
exec 3>"\$TMP"
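exec with only a redirection applies it to the current shell: it opens the file for writing on file descriptor 3 (creating or truncating it), keeps it open for the rest of the script, and every command the script starts inherits it. Note the backslash: inside double quotes \$ is a literal dollar sign, so this opens a file literally named $TMP in the current working directory, not the file mktemp created. That explains the /root/$TMP and /usr/bin/$TMP names in the lsof output.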
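A minimal sketch that reproduces this line in a scratch directory (created with mktemp -d, so nothing real is touched): the only file that appears in the working directory is one literally named $TMP.

```shell
#!/bin/sh
# Sketch: what exec 3>"\$TMP" really opens. The backslash makes $ literal,
# so the redirection targets a file named $TMP in the current directory.
set -e
work="$(mktemp -d)"
cd "$work"

TMP="$(mktemp)"        # an empty file under /tmp (or $TMPDIR), never used below
exec 3>"\$TMP"         # same line as in findme
listing="$(ls -A)"
echo "current directory contains: $listing"   # current directory contains: $TMP

exec 3>&-
rm -f "$TMP"
cd /
rm -rf "$work"
```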
dd bs="1M" count="9000" if="/dev/zero" of="\$TMP" || :
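Here we call the dd command, which copies data from an input to an output.
bs="1M"
read and write up to 1M (1048576 bytes) at a time
count="9000"
copy 9000 input blocks, i.e. up to 9000M, more than the whole disk
if="/dev/zero"
read from /dev/zero, an endless stream of zero bytes
of="\$TMP"
write to the same file literally named $TMP that fd 3 points to
|| :
if dd fails (which it does once the disk is full), run :, a command that does nothing and always succeeds, so set -e doesn't stop the script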
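A minimal sketch of the numbers and of the || : idiom: 1M times 9000 is 9437184000 bytes (about 8.8 GiB), more than the whole 8.0G volume, so on the server dd was bound to fail with "No space left on device". The scaled-down dd call (64 blocks of 1024 bytes) is a hypothetical stand-in.

```shell
#!/bin/sh
# Sketch: how much findme asks dd to write, and what "|| :" is for.
set -e
requested=$((1024 * 1024 * 9000))     # bs=1M times count=9000, in bytes
echo "requested: $requested bytes"    # requested: 9437184000 bytes

# A scaled-down copy of the dd call: 64 blocks of 1024 zero bytes
out="$(mktemp)"
dd bs=1024 count=64 if=/dev/zero of="$out" 2>/dev/null || :
written="$(wc -c < "$out" | tr -d ' ')"
echo "written: $written bytes"        # written: 65536 bytes

# Under set -e a failing command would end the script; "|| :" prevents that
false || :
echo "still running after a failed command"
rm -f "$out"
```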
rm -f "\$TMP"
while true; do sleep 10; done
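rm -f deletes the file's directory entry, but the shell still holds it open on fd 3 (and each sleep it starts inherits that descriptor), so its space is not released. The while true loop then sleeps 10 seconds at a time forever, so it is a permanent loop after all, as I first assumed, and the deleted file stays open until the process is killed or the file is truncated.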
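A minimal sketch, assuming Linux, of why lsof listed both the shell and a sleep process holding the same deleted file: a background sleep (standing in for one loop iteration) inherits the shell's fd 3.

```shell
#!/bin/sh
# Sketch (Linux only): child processes inherit open descriptors, which is why
# lsof showed both the shell and its sleep child holding the same file.
set -e
tmp="$(mktemp)"
exec 3>"$tmp"
rm -f "$tmp"

sleep 30 &                                  # stands in for one loop iteration
child=$!
inherited="$(readlink "/proc/$child/fd/3")"
echo "sleep's fd 3 -> $inherited"           # the path, marked " (deleted)"

kill "$child"
wait "$child" || :
exec 3>&-
```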
https://docs.google.com/document/d/1V6HEu_OcJ3MHH-aHzUfANf06VJa1rPcGHcpBwql7QLA/edit#heading=h.h9hu29mv2qa1