Closed (ashald closed 3 years ago)
To reproduce (at least on our VMs):
stem=test-$(hostname)
count=600
size=100m
set -o notify; for i in $(seq $count); do docker volume create -d lvm --opt size=0.1g ${stem}-${i}; done &
function finished () { echo finished at: $(date); while true; do printf "\a"; sleep 1; done; }
wait %1; finished
reboot; # wait for host to come back
set -o notify; docker volume ls & docker-volumes-prune -a "1 minute" &
function finished () { echo finished at: $(date); while true; do printf "\a"; sleep 1; done; }
wait %2; finished
lvs | grep -c -e $stem; # to see progress
We ran into a strange issue when using `docker-lvm-plugin` on a host with over a hundred `lvm` volumes. After a reboot, attempting to inspect a volume (or performing any operation that internally calls `lvdisplayGrep`) renders the plugin unusable. Looking at the process tree, we saw quite a few `[lvdisplay] <defunct>` zombies. Our theory is that if `lvdisplay` is called very "early" after the host is rebooted, it fails or does not produce the output `lvdisplayGrep` expects, and this situation is not handled properly. As a result, the plugin process becomes stuck in that function, which is called while holding a lock, so no other operation can succeed afterwards. @mortya has a script that shows how to reproduce this issue.
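To illustrate why a single stuck call takes every other operation down with it, here is a toy model in Go. It is not the plugin's code: `driver`, `withLock`, and `simulate` are hypothetical names, and the blocked channel receive merely stands in for an external command that never returns.

```go
package main

import (
	"fmt"
	"sync"
)

// driver is a hypothetical stand-in for a plugin that serializes
// all volume operations behind a single mutex.
type driver struct {
	mu sync.Mutex
}

// withLock runs op while holding the mutex, the way a locked handler would.
func (d *driver) withLock(op func()) {
	d.mu.Lock()
	defer d.mu.Unlock()
	op()
}

// simulate reports whether a second operation could take the lock while the
// first one is stuck, and again after the first one has been released.
func simulate() (duringHang, afterRelease bool) {
	d := &driver{}
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		defer close(done)
		d.withLock(func() {
			close(started)
			<-release // stands in for a command that never returns
		})
	}()

	<-started // the first operation now holds the lock
	duringHang = d.mu.TryLock()
	if duringHang {
		d.mu.Unlock()
	}

	close(release) // let the "stuck" operation finish
	<-done
	afterRelease = d.mu.TryLock()
	if afterRelease {
		d.mu.Unlock()
	}
	return duringHang, afterRelease
}

func main() {
	during, after := simulate()
	fmt.Println("lock available during hang:", during)
	fmt.Println("lock available after release:", after)
}
```

`TryLock` requires Go 1.18 or newer and is used here only to observe the lock state without blocking the demo itself.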
This could probably have been handled by revising the error handling in `lvdisplayGrep`, but I found it easier to get rid of the `grep` call along with the pipe and do the "grepping" manually with `strings.Contains`. The code is simpler, and we were not able to reproduce the issue with the given patch. Please note that I used a "dumb" `strings.Split`, which appears to be good enough for the given task, instead of a scanner from `bufio`, as it is more succinct this way. If you prefer it the other way around, please let me know.
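For readers who want a picture of the approach without opening the diff, here is a minimal sketch of in-process "grepping". It is not the patch itself: `findLine` and `lvdisplayFind` are hypothetical names, the `/dev/<vg>/<lv>` argument is an assumption about how the volume is addressed, and the sample text in `main` is illustrative rather than captured output.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// findLine returns the first line of output that contains keyword.
func findLine(output, keyword string) (string, bool) {
	for _, line := range strings.Split(output, "\n") {
		if strings.Contains(line, keyword) {
			return line, true
		}
	}
	return "", false
}

// lvdisplayFind is a hypothetical wrapper: it runs lvdisplay for one logical
// volume and searches its output in-process, with no grep and no pipe.
func lvdisplayFind(vgName, lvName, keyword string) (string, bool, error) {
	out, err := exec.Command("lvdisplay", fmt.Sprintf("/dev/%s/%s", vgName, lvName)).Output()
	if err != nil {
		// A failing lvdisplay is reported to the caller instead of being ignored.
		return "", false, fmt.Errorf("lvdisplay failed: %w", err)
	}
	line, ok := findLine(string(out), keyword)
	return line, ok, nil
}

func main() {
	// Illustrative text only; real lvdisplay output has more fields.
	sample := "  LV Name                data\n  LV Status              available\n"
	line, ok := findLine(sample, "LV Status")
	fmt.Printf("%q %v\n", line, ok)
}
```

Because `Output` waits for the single child process and returns its exit error, there is only one process to reap and one error path to handle. `strings.Split` holds the whole output in memory, which should be fine for the short text `lvdisplay` prints for one volume.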