CAIDA / pybgpstream

Python bindings for BGPStream
https://bgpstream.caida.org
BSD 2-Clause "Simplified" License

Drop in messages received from V1 #10

Open LukeFenton opened 5 years ago

LukeFenton commented 5 years ago

Hi,

Something to note: I have previously been using PyBGPStream to get all update messages from all of the Routeviews collectors.

More recently I have noticed a large drop in messages, from 25m per hour to around 3m per hour.

Does this have anything to do with the recent move to version 2?

Thanks

L

alistairking commented 5 years ago

Can you clarify if you are noticing a drop compared to V1, or if you have noticed a drop recently in V2?

LukeFenton commented 5 years ago

I am noticing a drop compared to V2.

I was receiving 25m+ per hour on V1 up until the 20th Feb, and now on V1 I am only receiving about 3m per hour.

When using v2 I am seeing 25m per hour once again.

alistairking commented 5 years ago

I checked our continuous processing (which runs V1) and things seem to have been pretty stable for us. Any chance you can narrow the difference down to certain collectors?

LukeFenton commented 5 years ago

I don't currently have a breakdown from February, though I do have a 15-minute chunk from 28th January. I don't know if it is useful, but it shows a breakdown by collector at that time. During this time we were getting peaks of 25m per hour.

I can see that v2 is pulling from more locations, and I don't know whether something has changed in v1 since.

    Update messages per collector, 08:45 - 09:00, 28/01/19

    Collector               Version 1    Version 2
    route-views2              547,817      547,817
    route-views3                           334,852
    route-views4              399,996      399,996
    route-views6               50,623       50,623
    route-views.eqix                       195,118
    route-views.isc            87,520       87,520
    route-views.kixp              206          206
    route-views.jinx           27,543       27,543
    route-views.linx                     2,576,186
    route-views.telxatl        91,227       91,227
    route-views.wide           13,936       13,936
    route-views.sydney         95,496       95,496
    route-views.saopaulo      210,833      210,833
    route-views.nwax           50,929       50,929
    route-views.perth             552          552
    route-views.sg            134,977      134,977
    route-views.sfmix                       36,629
    route-views.soxrs           1,928        1,928
    route-views.chicago                    127,786
    route-views.napafrica                  152,593
    route-views.flix                       255,555
    route-views.chile                       18,974
    Other - r-v.wide
    Total                   1,713,583    5,411,276

LukeFenton commented 5 years ago

I will get a breakdown of the number of messages per collector for an hour before and after the drop in messages. I can provide this breakdown when I get the data.
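
As a quick cross-check of the 28th January table earlier in this thread, here is a minimal Python sketch that lists the collectors present only in the Version 2 column and how many messages they account for. The counts are copied from that table; the dictionary and function names are made up for illustration and are not part of PyBGPStream.

```python
# Per-collector counts copied from the 08:45 - 09:00, 28/01/19 table in this thread.
v1_counts = {
    "route-views2": 547817,
    "route-views4": 399996,
    "route-views6": 50623,
    "route-views.isc": 87520,
    "route-views.kixp": 206,
    "route-views.jinx": 27543,
    "route-views.telxatl": 91227,
    "route-views.wide": 13936,
    "route-views.sydney": 95496,
    "route-views.saopaulo": 210833,
    "route-views.nwax": 50929,
    "route-views.perth": 552,
    "route-views.sg": 134977,
    "route-views.soxrs": 1928,
}

# The table shows identical counts for every collector present in both columns,
# so the V2 column is the V1 column plus the collectors that only V2 returned.
v2_counts = {
    **v1_counts,
    "route-views3": 334852,
    "route-views.eqix": 195118,
    "route-views.linx": 2576186,
    "route-views.sfmix": 36629,
    "route-views.chicago": 127786,
    "route-views.napafrica": 152593,
    "route-views.flix": 255555,
    "route-views.chile": 18974,
}


def compare(v1, v2):
    """Return (collectors only in v2, their message total, their share of the v2 total)."""
    missing = sorted(set(v2) - set(v1))
    missing_total = sum(v2[name] for name in missing)
    return missing, missing_total, missing_total / sum(v2.values())


missing, missing_total, share = compare(v1_counts, v2_counts)
print("V1 total:", sum(v1_counts.values()))
print("V2 total:", sum(v2_counts.values()))
for name in missing:
    print(f"{name:24s}{v2_counts[name]:>12,}")
print(f"Only in V2: {missing_total:,} messages ({share:.1%} of the V2 total)")
```

The same comparison could be run on the hourly before/after breakdown once it is available, to see whether the drop is confined to particular collectors.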

LukeFenton commented 5 years ago

@alistairking I have compared the result of bgpreader, run for 20th Feb 01:00-02:00, to the output of v1 PyBGPStream using this code.

#!/usr/bin/env python

from _pybgpstream import BGPStream, BGPRecord, BGPElem

# Create a new bgpstream instance and a reusable bgprecord instance
stream = BGPStream()
rec = BGPRecord()

message_count = 0

# Consider this time interval:
# Wed Feb 20 01:00:00 - 02:00:00 UTC 2019
stream.add_filter('project','routeviews')
stream.add_filter('record-type', 'updates')

stream.add_interval_filter(1550624400,1550628000)

# Start the stream
stream.start()

# Get next record
while(stream.get_next_record(rec)):
    # Print the record information only if it is not a valid record
    if rec.status != "valid":
        print rec.project, rec.collector, rec.type, rec.time, rec.status
    else:
        elem = rec.get_next_elem()
        while(elem):
            #print rec.project, rec.collector, rec.type, rec.time, rec.status,
            #print elem.type, elem.peer_address, elem.peer_asn, elem.fields
            message_count += 1
            elem = rec.get_next_elem()

print 'number of messages = ', message_count
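
For reference, a small standard-library sketch (Python 3, independent of PyBGPStream) that converts the two epoch values passed to add_interval_filter into UTC, to confirm that they cover 20th Feb 2019, 01:00-02:00. The helper name to_utc is just for illustration.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


start, end = 1550624400, 1550628000
print(to_utc(start).isoformat())  # 2019-02-20T01:00:00+00:00
print(to_utc(end).isoformat())    # 2019-02-20T02:00:00+00:00
print("window length in seconds:", end - start)  # 3600
```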

The Python script reported number of messages = 3,201,436

However, when I ran the same timeframe with bgpreader:

bgpreader -p routeviews - t updates -w 1550624400,1550628000 | wc -l 

I got a result of 12,156,311

There seems to be something weird going on with PyBGPStream, maybe?

Thanks

L

alistairking commented 5 years ago

Thanks a lot for doing that. What version of BGPStream are you using? We released 1.2.2 (https://github.com/CAIDA/bgpstream/releases/tag/v1.2.2) a few weeks back which fixes a bug in PyBGPStream -- I wonder if that is the same thing you're running into?

LukeFenton commented 5 years ago

I can get the version and try this tomorrow and see whether this makes a difference. Is that v1.2.2 of libbgpstream?

alistairking commented 5 years ago

It's both libbgpstream and pybgpstream. If you go to that link you'll find tarballs for both packages.

LukeFenton commented 5 years ago

Oh okay I’ll have a look tomorrow and get back to you. Thanks

LukeFenton commented 5 years ago

Hi @alistairking, I have updated to 1.2.2 and run the same code and have gotten the same results.

BGPReader - 12,156,311
PyBGPStream - 3,201,508

LukeFenton commented 5 years ago

Turns out an additional space in the -t flag (typed as "- t") caused the discrepancy between PyBGPStream and bgpreader.
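
To illustrate why a stray space matters, this sketch tokenizes the bgpreader command as posted earlier in the thread, the way a POSIX shell would split it, using Python's shlex module. It only shows the argument splitting. How bgpreader itself handled the leftover "-" and "t" arguments is not confirmed in this thread; presumably the record-type filter was simply not applied. The option_flags helper is hypothetical.

```python
import shlex

# Command as posted in this thread (note the space in "- t"), and the intended form.
posted = "bgpreader -p routeviews - t updates -w 1550624400,1550628000"
intended = "bgpreader -p routeviews -t updates -w 1550624400,1550628000"


def option_flags(command):
    """Return the tokens that look like option flags ('-' followed by a name)."""
    return [tok for tok in shlex.split(command) if tok.startswith("-") and len(tok) > 1]


print(shlex.split(posted))
print(option_flags(posted))    # ['-p', '-w']
print(option_flags(intended))  # ['-p', '-t', '-w']
```

With the space, the program receives "-", "t" and "updates" as three separate arguments, so no -t option is present on the command line at all.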

This doesn't explain the drop in messages from 05:00 on Feb 20th

LukeFenton commented 5 years ago

Okay so after investigation, I have pulled data using pybgpreader and documented the number of update messages per hour for 06/03/2019 and 30/10/2018. As you can see there is a very large difference between the number of messages.

Is there anything on your end which could explain this drop?

Thanks

L

RVMessageComparion.xlsx

alistairking commented 5 years ago

@LukeFenton we haven't forgotten about this and are still looking into the problem. Sorry for the delay.

alistairking commented 5 years ago

A small update.

It seems that this data has the "Extended Time" (ET) MRT format (https://tools.ietf.org/html/rfc6396#section-3), which the version of libbgpdump that we're using in v1 does not support. I'll look into how hard it will be to add/backport support for this.
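
For readers unfamiliar with the ET variant, here is a rough sketch of how the MRT common header differs, based on the layout in RFC 6396: for the *_ET types a 4-byte microsecond field follows the common header and is counted in the Length field. This is an illustrative standalone parser, not the libbgpdump code, and the function name iter_mrt_records is made up.

```python
import struct

# MRT type codes from RFC 6396.
BGP4MP = 16
BGP4MP_ET = 17
ET_TYPES = {17, 33, 49}  # BGP4MP_ET, ISIS_ET, OSPFv3_ET


def iter_mrt_records(data):
    """Yield (seconds, microseconds, type, subtype, body) for each MRT record in data."""
    offset = 0
    while offset + 12 <= len(data):
        # Common header: timestamp (4 bytes), type (2), subtype (2), length (4).
        seconds, mrt_type, subtype, length = struct.unpack_from("!IHHI", data, offset)
        offset += 12
        payload = data[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated MRT record")
        offset += length
        if mrt_type in ET_TYPES:
            if length < 4:
                raise ValueError("ET record too short for microsecond field")
            # Extended Timestamp: the first 4 payload bytes are microseconds.
            (microseconds,) = struct.unpack_from("!I", payload, 0)
            body = payload[4:]
        else:
            microseconds, body = 0, payload
        yield seconds, microseconds, mrt_type, subtype, body


# Two synthetic records: one BGP4MP_ET and one classic BGP4MP (bodies are dummy bytes).
et_record = struct.pack("!IHHI", 1548665100, BGP4MP_ET, 4, 8) + struct.pack("!I", 250000) + b"abcd"
classic_record = struct.pack("!IHHI", 1548665101, BGP4MP, 4, 3) + b"xyz"

for record in iter_mrt_records(et_record + classic_record):
    print(record)
```

A parser that does not know the ET types would either skip such records or misread the microsecond field as the start of the BGP message, which would be consistent with the missing messages described here. For a real archive file, the bytes would first need to be decompressed, for example with Python's bz2 module.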

Here is a bgpreader config that can be used to reproduce this problem:

bgpreader -d singlefile -o upd-file,http://bgp-archive.caida.org/routeviews/route-views3/updates/2019/01/28/routeviews.route-views3.updates.1548665100.bz2

alistairking commented 5 years ago

Here's the commit that added ET support to libbgpdump: https://bitbucket.org/ripencc/bgpdump/commits/a3bb2234bbba62f09209238e9c32bde4d08647a1

We will need to backport this to the version embedded in libbgpstream: https://github.com/CAIDA/bgpstream/tree/master/lib/bgpdump

@LukeFenton (and others interested) I'm not sure we'll have time to work on this in the very near future, so if you want to take a stab at a PR, that would be much appreciated.

LukeFenton commented 5 years ago

Hi @alistairking,

Thanks very much for the reply. Is this something that should be transferred into an issue in libbgpstream?

Luke

alistairking commented 5 years ago

No, this is an issue that only affects V1 (this repo) and not V2 (the libbgpstream repo).

The repo structure is a bit confusing because we split the libbgpstream and pybgpstream repos for V2.