CAIDA / bgpstream

BGP measurement analysis for the masses
GNU General Public License v2.0

How to use real-time mode with a local file in bgpreader or bgpcorsaro #77

Closed: andy19910403 closed this issue 5 years ago

andy19910403 commented 5 years ago

Hi, I am a beginner. First of all, thanks for this excellent project! I am trying to use real-time mode with a local file, but it doesn't work. My command line is:

bgpcorsaro -d singlefile -o upd-file,rib.20181126.0000 -x"pfxmonitor -L ./test.txt -n 5 -M" -w 1543161600, -l -O ./%X.txt

"rib.20181126.0000" is a local RIB dump file, and "test.txt" contains just the prefix "0.0.0.0/8".

Is there an option I should be adding?

Thanks in advance for your help.

alistairking commented 5 years ago

Hi there,

Thanks for using BGPStream.

Just to make sure we're on the same page: I understand that you saved 0.0.0.0/8 into test.txt before running BGPCorsaro, is that correct? If so, the pfxmonitor plugin should output information about prefixes in 0.0.0.0/8 that are contained in your local RIB dump. Now, you said that you are trying "real-time for local file", but since this is a single local file, it doesn't make sense to do real-time monitoring. BGPCorsaro will process the file once and then (I believe) periodically check the file to see if it has been updated. Also, are you sure there are prefixes from 0.0.0.0/8 in the file? I'm not sure that 0/8 is actually announced on the Internet.
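The point about 0.0.0.0/8 can be made concrete: a prefix is "in" a monitored range when it is equal to it or more specific. The Python below is a minimal sketch of that containment test using the standard ipaddress module. prefixes_within is a hypothetical helper, not pfxmonitor's actual code, and pfxmonitor's own options may change exactly which prefixes it reports. If nothing in the RIB dump falls inside 0.0.0.0/8, a filter like this has nothing to return.

```python
import ipaddress


def prefixes_within(monitored, announced):
    """Return the announced prefixes that are equal to or more specific than a monitored one.

    Hypothetical helper for illustration; not part of BGPStream/BGPCorsaro.
    """
    watch = [ipaddress.ip_network(p) for p in monitored]
    hits = []
    for p in announced:
        net = ipaddress.ip_network(p)
        # subnet_of() raises TypeError for mixed IPv4/IPv6, so compare versions first
        if any(net.version == w.version and net.subnet_of(w) for w in watch):
            hits.append(p)
    return hits
```

For example, prefixes_within(["0.0.0.0/8"], ["8.8.8.0/24", "0.1.0.0/16"]) returns ["0.1.0.0/16"].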

alistairking commented 5 years ago

Are you still having trouble with this? I'm going to close the issue for now, but please feel free to re-open it if you want to follow up.

kurtrwall commented 5 years ago

I'd actually like to do real-time monitoring of a single file as well. My situation is that I have BIRD constantly dumping to an MRT file, and I'd like to use this library to pull the new entries, parse them, and output them to a message bus for consumption. Is it possible to do live monitoring of that file?
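Tailing a dump that BIRD is still writing means a reader can reach the end of the file in the middle of a record. A minimal sketch, assuming the file contains only MRT records with the RFC 6396 common header (4-byte timestamp, 2-byte type, 2-byte subtype, 4-byte length of everything after that 12-byte header): split_mrt_records is a hypothetical helper, not part of BGPStream, that returns only complete records and hands back the partial tail so it can be retried once more data arrives. Decoding the BGP messages and publishing to a message bus are left out.

```python
import struct

# RFC 6396 MRT common header: timestamp, type, subtype, length (network byte order)
MRT_HEADER = struct.Struct("!IHHI")


def split_mrt_records(buf):
    """Split buf into complete MRT records.

    Returns (records, remainder): records is a list of
    (timestamp, type, subtype, body) tuples, and remainder holds the bytes of a
    trailing record that has not been fully written yet.
    """
    records = []
    pos = 0
    while len(buf) - pos >= MRT_HEADER.size:
        ts, mrt_type, subtype, length = MRT_HEADER.unpack_from(buf, pos)
        end = pos + MRT_HEADER.size + length
        if end > len(buf):
            break  # header is present but the body is still incomplete
        records.append((ts, mrt_type, subtype, bytes(buf[pos + MRT_HEADER.size:end])))
        pos = end
    return records, bytes(buf[pos:])
```

The caller would prepend the returned remainder to the next chunk it reads, so no record is ever emitted half-written.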

kurtrwall commented 5 years ago

@alistairking btw thanks for your contributions to this library and your attention. It seems like people drop an issue and then ghost on you. I promise not to.

The MRT dump file I'm dealing with is constantly growing. You said BGPCorsaro would process the file to completion, then check again periodically. Is that controlled via the interval constants set here? If so, then it does seem to try to check the file for additional data, but there's a conditional here that makes me think that if it's the same file, it will never update. I'm not sure why you'd want to ensure the headers of the files are not the same before proceeding there; it seems like that's what kicks off adding data to the input queue. Are there any side effects to removing the header check?

Thanks again.

alistairking commented 5 years ago

Hey @kurtrwall, thanks for the messages, and my apologies for not getting back to your original message.

So the usual way this is used is that you periodically create a new dump file in the same place (normally by updating a symlink to the latest dump file). In this use case, the header check is a cheap way to block until we see new data is present in the file.
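The header check can be pictured as comparing the first few bytes of whatever file the path (or symlink) points to with the bytes seen last time. The Python below is a hypothetical illustration under that assumption, not BGPStream's code, and the 64-byte window is arbitrary. It also shows why a single file that only grows, as in the BIRD setup above, never passes the check: appending data leaves the leading bytes unchanged.

```python
def read_header(path, nbytes=64):
    """Return the first nbytes of whatever file path currently resolves to."""
    # open() follows symlinks, so re-pointing the link at a new dump
    # changes what this returns even though path itself stays the same.
    with open(path, "rb") as f:
        return f.read(nbytes)


def dump_changed(path, last_header, nbytes=64):
    """Return (changed, header); changed is True when the leading bytes differ."""
    header = read_header(path, nbytes)
    return header != last_header, header
```

Comparing a few bytes is cheap, which fits the "replace the dump, update the symlink" workflow, but it cannot see data appended to the same file.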

We don't currently support something like tail -f unfortunately.

kurtrwall commented 5 years ago

> We don't currently support something like tail -f unfortunately.

That's too bad, but your workaround sounds doable. Do you know what might be involved in supporting that? I might consider giving it a try.

> periodically create a new dump file in the same place (normally by updating a symlink to the latest dump file). In this use case, the header check is a cheap way to block until we see new data is present in the file.

Okay, just so I understand right: once the reader gets to the end of the file, it will just keep hitting this function until it's been longer than the interval and the file header changed (symlink updated). ~Disregarding the interval constants in that file, are you aware of any embedded latency between the reader reaching the end of the file and when it checks it again?~ Answered in comment below.

I'm trying to get as close to real time as possible, so I'm considering rebuilding the library with those intervals set to 0. I'm also considering removing the header check so that it always looks for new data in the file, regardless of whether the header has changed. I'm interested in seeing the effects of these changes.

kurtrwall commented 5 years ago

> Disregarding the interval constants in that file, are you aware of any embedded latency between the reader reaching the end of the file and when it checks it again?

Found the answer in the constant DATASOURCE_BLOCKING_MIN_WAIT. Looks like the minimum is 30 seconds. In my experiment, I may reduce that as well.

kurtrwall commented 5 years ago

After some testing, it seems that reducing those constants' values makes it behave virtually in real time. I am seeing some batching, but it's probably mostly due to how BIRD populates that dump file.

Maybe these intervals can be passed into the stream object on creation so this effect can be achieved without rebuilding the library? What are your thoughts?
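One way to get this effect without rebuilding is to make the waits ordinary parameters supplied at creation time. The Python below sketches that design, assuming a simple "check, then sleep min_wait if nothing new" loop. PollConfig and poll are hypothetical names, and the real logic around DATASOURCE_BLOCKING_MIN_WAIT (reported above as 30 seconds) may be structured differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PollConfig:
    """Hypothetical runtime settings standing in for compile-time wait constants."""
    min_wait: float = 30.0            # seconds to sleep when no new data is found
    max_checks: Optional[int] = None  # stop after this many checks (None = run forever)


def poll(check: Callable[[], bool], on_new: Callable[[], None],
         cfg: PollConfig, sleep: Callable[[float], None] = time.sleep) -> int:
    """Call check() repeatedly and run on_new() whenever it reports new data.

    Returns the number of checks performed (only reached when max_checks is set).
    """
    checks = 0
    while cfg.max_checks is None or checks < cfg.max_checks:
        checks += 1
        if check():
            on_new()
        else:
            sleep(cfg.min_wait)  # back off only when there was nothing to read
    return checks
```

Injecting sleep keeps the loop testable without real delays. Setting min_wait=0 mirrors the experiment above but turns the loop into a busy poll, so a small positive value is usually a better trade-off.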

kurtrwall commented 5 years ago

Huge caveat that I didn't mention: the position in the file is lost with every data grab, so the whole file is output again each time, which is less than ideal. I'm currently testing other options.
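Keeping the position is essentially what tail -f does: remember how many bytes have been consumed and seek past them on the next poll. A minimal Python sketch, assuming a POSIX-like filesystem where st_ino identifies a file; FileFollower is a hypothetical helper, not BGPStream code. The returned bytes can end mid-record, so a real consumer would still buffer them until a complete MRT record is available.

```python
import os


class FileFollower:
    """Minimal tail -f style reader that remembers its byte offset between polls."""

    def __init__(self, path):
        self.path = path
        self.inode = None  # identity of the file last read
        self.offset = 0    # bytes already consumed from that file

    def read_new(self):
        """Return the bytes appended since the previous call (b"" if none)."""
        st = os.stat(self.path)  # follows symlinks, like open()
        if st.st_ino != self.inode or st.st_size < self.offset:
            # A different file now sits behind the path, or it was truncated:
            # start again from the beginning of the current file.
            self.inode = st.st_ino
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        self.offset += len(data)
        return data
```

Because only new bytes are returned, each poll costs time proportional to what was appended rather than to the size of the whole dump.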