mrjones-plip opened this issue 6 years ago
FYI @unix1
Update: i've been running this set up for 2 hours. I started with 0 files and now I have 9138 that dnsauth hasn't processed. Faux-Logs has written 203,910 files in those two hours.
This is not good. The problem is either in the cleanup or somewhere prior. If you, by any chance, still have the output and files (unlikely, given this was a few days ago), a couple of additional pieces to observe would be:
- `Processed dump ...` — if this appears for a file that was not subsequently removed, that means the problem occurred during cleanup.
- `Failed to clean up ...` — it should also have an error string possibly giving clues as to why.

If none of the above are observed, there is likely a bug in the timing/way the files are picked up. It certainly should be looked into.
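To check a captured syslog for those two markers, something like the following works. The two marker strings come from the comment above; the sample log lines themselves are fabricated here, and the exact DNSAuth message format is an assumption:

```shell
#!/bin/bash
# Count the two DNSAuth markers in a captured syslog. The sample file is
# fabricated for illustration; point the greps at syslog.issue.26 in practice.
cat > /tmp/syslog.sample <<'EOF'
Oct 17 17:07:01 dnsauth DNSAuth[123]: Processed dump 1234-SZC_mon-01.zrh.woodynet.net_2017-10-17.17-07.dmp.gz
Oct 17 17:07:02 dnsauth DNSAuth[123]: Failed to clean up 1234-SZC_mon-01.zrh.woodynet.net_2017-10-17.17-07.dmp.gz: permission denied
EOF
processed=$(grep -c 'Processed dump' /tmp/syslog.sample)
failed=$(grep -c 'Failed to clean up' /tmp/syslog.sample)
echo "processed=$((processed)) failed=$((failed))"
```

If "Processed dump" counts stay well below the number of files written, the problem is upstream of cleanup; a nonzero "Failed to clean up" count points at cleanup itself.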
Tomorrow morning I'll start clean, follow my steps exactly (above), and post the remaining un-ingested logs and the dnsauth error log in hopes that it illuminates the problem. Stay tuned!
Ack! I forgot to do this today. I'll grab it tomorrow or Wed and report back.
OK! I'm on this now. I've recorded how I set up my test, but it was basically: stop dnsauth, empty syslog, empty the dnsauth log dir, cat the config for dnsauth, cat the 2 configs for Faux-Logs, start dnsauth, start Faux-Logs. See issue.26.preflight.zip. I'll stop this after about an hour and send my zipped syslog along with the last line from Faux-Logs, which tells you approximately how many files it wrote. As well, if it's not too big, I'll send the contents of the dnsauth log dir. In the ~5 min since I started, Faux-Logs has already written about 15k files (and presumably they've been ingested and deleted by dnsauth ; )
Stay tuned!
Cool - we didn't need to wait very long for the problem to manifest. The setup above ran for about 15-20 min and Faux-Logs thinks it wrote 1M+ lines across just under 40k files:
Files Written: 39,500, Lines Written: 1,185,000
After stopping Faux-Logs, waiting 2 full minutes for dnsauth to iterate a few more times, and then stopping dnsauth, there were exactly 1,800 files that didn't get deleted:
root@dnsauth:# ls /home/dnsauth/|wc -l
1800
I then archived the remaining log files and syslog:
cp -r /home/dnsauth/ remaining.logfiles.issue.26
zip -r remaining.logfiles.issue.26.zip remaining.logfiles.issue.26/
cp /var/log/syslog syslog.issue.26
zip syslog.issue.26.zip syslog.issue.26
I've attached these files here: syslog.issue.26.zip remaining.logfiles.issue.26.zip
@unix1 - I think this might be a red herring. Per my email the other day, I created a prototypical pcap file (proto.log.gz) and then wrote a bash script to generate a valid file name and copy it into the DNSAuth directory:
#!/bin/bash
TARGET_DIR="/home/dnsauth/"
SOURCE_FILE="proto.log.gz"
echo "will put a lot of log files up in your grill 'til you stop me..."
while true; do
    sleep 0.05 &
    DAY=$(date '+%d')
    MONTH=$(date '+%m')
    YEAR=$(date '+%Y')
    HR=$(date '+%H')
    MN=$(date '+%M')
    NEW_FILE="$RANDOM$RANDOM-SZC_mon-01.zrh.woodynet.net_$YEAR-$MONTH-$DAY.$HR-$MN.dmp.gz"
    cp "${SOURCE_FILE}" "${TARGET_DIR}${NEW_FILE}"
    wait # for the background sleep
done
After running it for 20 min or so, I was unable to reproduce the problem.
If you agree, I think this is endemic to the way Faux-Logs is written and we should close this issue as "can't reproduce". Lemme know!
Update - Lemme update the bash script to have an inner loop that creates 100 files per outer loop - one per POP that Faux-Logs does. In case the number of POPs has anything to do with it. Stay tuned!
OK, it doesn't seem to reproduce with a LOT of POPs:
#!/bin/bash
TARGET_DIR="/home/dnsauth/"
SOURCE_FILE="proto.log.gz"
# mon-01.xyz.foonet.net_2017-10-17.17-07.dmp.gz
echo "will put a lot of log files up in your grill 'til you stop me..."
declare -a POPS=("acc" "akl" "ams" "ark" "atl" "ber" "bey" "bjl" "bom" "bur" "bze" "cai" "cdg" "cmb" "coo" "cor" "cpt" "dar" "dfw" "dub" "dur" "dxb" "edi" "ewr" "eze" "fih" "fra" "gbe" "gnd" "gye" "gza" "iad" "icn" "jax" "jkt" "jnb" "kbp" "kgl" "kin" "kla" "klu" "ktm" "kye" "lad" "lba" "lbv" "lfw" "lga" "los" "lpb" "lys" "man" "mba" "mex" "mgm" "mia" "mke" "mnl" "nbe" "nrt" "ord" "pao" "pap" "pcb" "pdx" "per" "phx" "pnh" "pos" "prg" "ric" "rno" "rob" "scl" "sdb" "sea" "sfo" "sgu" "sin" "sjo" "slu" "sna" "sof" "syd" "szg" "tgu" "tnr" "tpa" "tun" "vli" "wdh" "wlg" "yhz" "yow" "yul" "ywg" "yxe" "yyc" "yyz" "zrh")
while true; do
    sleep 1.0 &
    DAY=$(date '+%d')
    MONTH=$(date '+%m')
    YEAR=$(date '+%Y')
    HR=$(date '+%H')
    MN=$(date '+%M')
    for POP in "${POPS[@]}"; do
        NEW_FILE="$RANDOM$RANDOM-SZC_mon-01.$POP.woodynet.net_$YEAR-$MONTH-$DAY.$HR-$MN.dmp.gz"
        cp "${SOURCE_FILE}" "${TARGET_DIR}${NEW_FILE}"
    done
    wait # for the background sleep
done
I'll let this run for ~1 hour and report back.
@unix1 - k, I'm gonna close this. I let the above bash script run for an hour and there wasn't the overrun of files like with Faux-Logs. Sorry for the false alarm! On the off chance that this breaks in production, I'll re-open this ticket!
Oh yeah, just testing a bit more now: after changing `sleep 1.0 &` to `sleep 4.0 &`, I actually see the file count (via ls -hl /home/dnsauth/|wc -l) go to zero for a sec between the bash script's sleep and dnsauth's 30-second ingestion loop.
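That drain can be watched with a simple poll. A self-contained sketch (it uses a temp dir and simulates the delete so it runs anywhere; against the live system you'd just loop the ls|wc pipeline over /home/dnsauth/):

```shell
#!/bin/bash
# Poll a directory's file count and watch it drop to zero. The rm here
# stands in for dnsauth ingesting and deleting the files; on a real box
# you would only run the ls|wc pipeline (or use `watch`).
DIR=$(mktemp -d)
touch "$DIR/a.dmp.gz" "$DIR/b.dmp.gz"
for i in 1 2; do
    count=$(ls "$DIR" | wc -l)
    echo "files: $((count))"
    rm -f "$DIR"/*.dmp.gz    # simulate dnsauth's ingestion pass
done
```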
@unix1 - bad news! I finally got around to pushing the new version of DNSAuth to production and this problem is present. Reopening and ping me about access to prod so we can troubleshoot.
@unix1 - oh wait! I see now that all the files it's skipping are only 20 bytes compressed and 0 bytes when uncompressed. However, I feel like this is a bug! What DNSAuth should be doing is still processing the file and deleting it, possibly noting that it had no lines. Do we want to track this in this ticket or a new one?
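For reference, that 20-byte size is exactly what gzip produces for empty input from stdin (no filename stored in the header), so the skipped files can be reproduced trivially. A sketch:

```shell
#!/bin/bash
# An empty input gzipped from stdin is exactly 20 bytes: 10-byte header +
# 2-byte empty deflate block + 8-byte CRC/size trailer. Decompressing it
# yields 0 bytes, matching the files DNSAuth is skipping.
printf '' | gzip > /tmp/empty.dmp.gz
compressed=$(wc -c < /tmp/empty.dmp.gz)
uncompressed=$(gzip -dc /tmp/empty.dmp.gz | wc -c)
echo "compressed=$((compressed)) uncompressed=$((uncompressed))"
```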
@Ths2-9Y-LqJt6 I'd suggest a new ticket would be better for handling empty files. The original problem described in this issue seemed a lot more substantial.
I'd prefer DNSAuth moved files that cause an issue (unexpected data, etc.) to another directory for subsequent analysis.
This'll likely require a supervisory component to protect against disk resource exhaustion. A cheap version may be to move files only while the directory is empty or contains fewer than X files. I assume that we wouldn't encounter different errors at the same time, so one directory may be enough.
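A sketch of that cheap guard (directory names and the threshold are made up for illustration, not DNSAuth's actual config):

```shell
#!/bin/bash
# Move a problem file into a quarantine dir only while the dir holds fewer
# than MAX files; otherwise delete it, so the quarantine can never grow
# without bound and exhaust the disk. Paths here are temp dirs so the
# sketch is self-contained.
WATCH=$(mktemp -d)
QUARANTINE=$(mktemp -d)
MAX=100

quarantine() {
    local count
    count=$(ls "$QUARANTINE" | wc -l)
    if [ "$((count))" -lt "$MAX" ]; then
        mv "$1" "$QUARANTINE/"
    else
        rm -f "$1"    # over the cap: drop rather than fill the disk
    fi
}

touch "$WATCH/bad.dmp.gz"
quarantine "$WATCH/bad.dmp.gz"
```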
I'm using the just-released 1.2.0 and have set up DNSAuth to delete my files. Here's my dnsauth.toml:
I was using Faux-Logs to push a couple hundred log files into /home/dnsauth every minute. This worked great! Every time I saw DNSAuth run, I would see the file count drop via ls -ahl /home/dnsauth/|wc -l.
However, letting it run overnight, I see that there are a lot of files in the directory:
This means that DNSAuth isn't deleting all the files.
Steps to reproduce:
- In your dnsauth.toml you have:
- config.php:
- config2.php (you may have to add more entries - I have 104 entries in my config2 file):
- Run: php -f multi-file.gzip.php /home/dnsauth/ 30 610000

Expected: dnsauth ingests all the files to InfluxDB and deletes all the files; at any given point there should only be a couple hundred files in the watch-dir between dnsauth runs while the files are being created.
Actual: after several hours there are over a hundred thousand files in the watch-dir.
Update - changing the dnsauth.toml to use cleanup-action = "move", I end up with an exact duplicate of the ingested directory in my cleanup-dir. We can re-title this bug as needed.