bpaquet / node-logstash

Simple logstash implementation in nodejs: file log collection, sent with zeromq

Duplication of messages when using config_dir option #117

Closed sentinelleader closed 8 years ago

sentinelleader commented 8 years ago

Hey,

I've been using node-logstash for quite a while and it's amazing :) All this time I was using it with a single config file. Now I have to use multiple configs, so I've started using the config_dir option. But ever since I started using that option, I'm seeing duplicated logs. The number of duplicates exactly matches the number of config files present in the config directory.

Each config has separate input files to read, and they are not even in the same folder. lsof shows that each child process is reading the other input files too :(

    node      13516       root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog
    SignalSen 13516 13519 root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog
    node      13516 13520 root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog
    node      13516 13521 root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog
    node      13516 13522 root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog
    node      13516 13523 root 28r REG 202,96 8545741 5508505 /mnt/log/xxx/server-events-2015-10-29.jslog

Each child process is accessing the same files, and this repeats for all the other input files.

I now have huge amounts of data, and since it's getting multiplied about 6x, I'm filling about 500GB per day :(

bpaquet commented 8 years ago

What do you mean by child process? Node-logstash does not use multiple processes. Maybe you are seeing multiple threads instantiated by node itself, but not multiple processes.

Can you check that you have no * wildcards covering the same file? Can you provide an extract from your config: grep input * ?

Bertrand
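The wildcard check asked for above can be sketched roughly as follows. This is only an illustration: it uses Python's `glob` module, whose matching rules may differ from how node-logstash expands `*`, and the function name `find_overlapping_inputs` and the example patterns are hypothetical, not taken from the reporter's config.

```python
import glob
from collections import defaultdict


def find_overlapping_inputs(patterns):
    """Map each file matched by more than one glob pattern to those patterns."""
    matches = defaultdict(list)
    for pattern in patterns:
        # set() so a single pattern is never counted twice for the same file
        for path in sorted(set(glob.glob(pattern))):
            matches[path].append(pattern)
    return {path: pats for path, pats in matches.items() if len(pats) > 1}


if __name__ == "__main__":
    # Hypothetical patterns; substitute the paths from your own input:// lines.
    patterns = [
        "/mnt/log/logger/backend/exceptions/*.jslog",
        "/mnt/log/logger/backend/*/*.jslog",
    ]
    for path, pats in find_overlapping_inputs(patterns).items():
        print(f"{path} is matched by: {', '.join(pats)}")
```

An empty result means that no file is picked up by two inputs, so the duplication would have to come from somewhere other than overlapping wildcards.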

sentinelleader commented 8 years ago

Sorry, I misread the lsof output. They were indeed thread IDs. Below are the input source details. Though I use wildcards for the files, their parent folders are different in each input source.

    app-backend-exceptions.json:input://file:///mnt/log/logger/backend/exceptions/exceptions.jslog?type=logger
    app-backend-latency.json:input://file:///mnt/log/logger/backend/latency/latency.jslog?type=logger
    app-backend-server-events.json:input://file:///mnt/log/logger/backend/server-events/server-events.jslog?type=logger
    app-backend-streaming-events.json:input://file:///mnt/log/logger/backend/streaming-events/streaming-events.jslog?type=logger
    app-backend-task-runner.json:input://file:///mnt/log/logger/backend/task-runner/task-runner.jslog?type=logger
    app-frontend-combined.json:input://file:///mnt/log/logger/frontend/combined/combined.jslog?type=logger

bpaquet commented 8 years ago

I do not see any wildcards in this config :(

sentinelleader commented 8 years ago

Yikes, looks like markdown removed it :(

Basically it's *.jslog?type=logger for each input.

bpaquet commented 8 years ago

Hi,

When I edit your post, I see this:

   app-frontend-combined.json:**input://file:///mnt/log/logger/frontend/combined/**combined*.jslog?type=logger

What is your exact config? Do you use a double *? Can you post your config in a gist or pastebin?

sentinelleader commented 8 years ago

Yes, a gist would be perfect: https://gist.github.com/sentinelleader/3384049825a5095164d2

bpaquet commented 8 years ago

Hi,

It seems your filter and output lines are duplicated. You have two solutions

Regards,

Bertrand
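To see which filter and output lines are repeated across the files in a config directory, a small script along these lines may help. It is a sketch under assumptions: it supposes each config file holds one URL-style directive per line, and that filter and output lines start with `filter://` and `output://` (only `input://` lines actually appear in this thread). The function name `find_duplicated_lines` is hypothetical.

```python
import os
from collections import defaultdict


def find_duplicated_lines(config_dir, prefixes=("filter://", "output://")):
    """Return {config line: [files containing it]} for lines found in 2+ files."""
    seen = defaultdict(list)
    for name in sorted(os.listdir(config_dir)):
        path = os.path.join(config_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            # A set, so a line repeated inside one file is listed once per file.
            lines = {line.strip() for line in handle}
        for line in sorted(lines):
            # Assumed prefixes; adjust them to match your own config syntax.
            if line.startswith(prefixes):
                seen[line].append(name)
    return {line: files for line, files in seen.items() if len(files) > 1}
```

If all files in the directory end up in one shared pipeline (an assumption here, though it would match the report that the number of duplicates equals the number of config files), then an output line present in N files would emit each message N times.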
