InternetHealthReport / as-hegemony

Compute network dependencies

Undefined symbol error #2

Closed · sim-azan closed this issue 3 years ago

sim-azan commented 3 years ago

Hi, I have been trying to run the docker-compose.yml, but had to change librdkafka to the latest v1.6.1 and confluent-kafka to v1.6.0. However, I keep getting the same error, whether I build the packages from source or install them from apt-get. The error reads:

```
File "/usr/local/lib/python3.7/dist-packages/confluent_kafka/deserializing_consumer.py", line 19, in <module>
bgpstream-update-rrc10_1 |     from confluent_kafka.cimpl import Consumer as _ConsumerImpl
bgpstream-update-rrc10_1 | ImportError: /usr/local/lib/python3.7/dist-packages/confluent_kafka/cimpl.cpython-37m-x86_64-linux-gnu.so: undefined symbol: rd_kafka_consumer_group_metadata_write
```

Could it be that the code only works with older librdkafka versions? If so, could you please mention which confluent-kafka and librdkafka versions this code was tested on? Thank you
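A quick way to narrow down this kind of undefined-symbol error is to compare the confluent-kafka wheel version with the librdkafka version actually loaded at runtime; the error typically means the loaded librdkafka is older than the one the wheel was built against. A minimal diagnostic sketch using confluent-kafka's own version helpers:

```python
# Minimal diagnostic sketch: print the confluent-kafka wheel version
# and the librdkafka version it is linked against at runtime.
# An undefined symbol like rd_kafka_consumer_group_metadata_write
# usually means the loaded librdkafka is older than the wheel expects.
import confluent_kafka

print("confluent-kafka:", confluent_kafka.version())
print("librdkafka:", confluent_kafka.libversion())
```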

romain-fontugne commented 3 years ago

Hi, yes, it seems there are some incompatibilities with Kafka. I don't use the Docker image; I run the Python code directly, as mentioned in the README. Have you tried that?

romain-fontugne commented 3 years ago

The version of Kafka we use is 2.2.0.

sim-azan commented 3 years ago

> Hi, yes, it seems there are some incompatibilities with Kafka. I don't use the Docker image; I run the Python code directly, as mentioned in the README. Have you tried that?

I tried running the code directly, e.g. produce_bgpdata.py, but I get the same error as with the Docker image, as mentioned in the issue above. I use the latest librdkafka + confluent-kafka packages built from source. I also assumed this code was tested on librdkafka v1.5.0 and tried installing that, but the same errors result. Could you please mention which librdkafka + confluent-kafka versions you are using? Thanks

romain-fontugne commented 3 years ago

I use Confluent Community software 5.2.1, which comes with Kafka 2.2.0-cp2. I have run this code on several versions of librdkafka.

But I think the problem comes from your install; the error happens in confluent_kafka/deserializing_consumer.py. By the way, to run the code you need BGP data in your Kafka cluster. The code we use to fetch BGP data and push it to Kafka is here: https://github.com/InternetHealthReport/kafka-toolbox/blob/master/bgp/producers/bgpstream2.py

I should add that to the readme!

sim-azan commented 3 years ago

Hi, thanks for this. I am struggling to understand the run order of the code. My goal is to compute AS hegemony, so I start with produce_bgpdata.py to get data for a certain collector and timestamp. How should I proceed from there to get AS hegemony?

romain-fontugne commented 3 years ago

If you are only interested in AS hegemony results, and you are not planning to change the way it is computed, then you can get our results via this API: https://ihr.iijlab.net/ihr/en-us/api

Here is an example of AS hegemony for AS2497 (IIJ): https://ihr.iijlab.net/ihr/api/hegemony/?timebin=2021-03-01T00%3A00&af=4&originasn=2497
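For illustration, a minimal sketch of querying this endpoint from Python with the requests library, using the same parameters as the example URL; the `results` key assumes the API's usual paginated JSON response:

```python
# Sketch: fetch AS hegemony scores for AS2497 (IIJ) from the IHR API.
# Assumes the endpoint returns paginated JSON with a 'results' list.
import requests

resp = requests.get(
    "https://ihr.iijlab.net/ihr/api/hegemony/",
    params={"timebin": "2021-03-01T00:00", "af": 4, "originasn": 2497},
)
resp.raise_for_status()
for entry in resp.json().get("results", []):
    print(entry)
```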

sim-azan commented 3 years ago

> If you are only interested in AS hegemony results, and you are not planning to change the way it is computed, then you can get our results via this API: https://ihr.iijlab.net/ihr/en-us/api
>
> Here is an example of AS hegemony for AS2497 (IIJ): https://ihr.iijlab.net/ihr/api/hegemony/?timebin=2021-03-01T00%3A00&af=4&originasn=2497

Thanks Romain. I am an avid user of the API; however, I understand that the API does not include any BGP data prior to 2015, and I am not sure why this is. I want hegemony scores for much older dates, like the early 2000s, and using the API I could not get any scores. What would you suggest I do? Thanks

romain-fontugne commented 3 years ago

> Thanks, but I think if I could find a way to get hegemony values per originASN for the entire week, every month, for 2010 and after, it would align well with my study. I was wondering how I could go about this: the new repo 'as-hegemony' or the older 'as-hash'? For the as-hegemony repo I managed to run Kafka and produce_bgpdata.py to get data into my Kafka cluster, but I am clueless on how to proceed after that. I am thinking I need to run the code in a particular order for the producer and consumer to work. Thanks for your help.

Up to this month, we have always used only four collectors: rrc00, rrc10, route-views2, and route-views.linx. So for consistency you want to get data from these four collectors. For each day we get one RIB at midnight and then updates until the end of the day.

Then you can use the ihr/daily-run.py script to compute AS hegemony. Here is an example: `python3 ihr/daily-run.py all 2021-02-25T00:00`

That would compute AS hegemony every 15 minutes for Feb 25th. This command should be executed from the folder where you have your produce_*.py scripts. All results are in the ihr_hegemony topic. Naming is a bit different from the website: scope means originasn, and if you use the 'global graph' it is referred to as `scope="-1"` (but I think there is some problem with this code and the global graph; that's something I'm investigating now).

One more thing: make sure the topics are created by the scripts. Don't create them manually; the code assumes a certain number of partitions.
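For reference, a minimal sketch of reading the results back from the ihr_hegemony topic with confluent-kafka; the broker address and group id are placeholders, and values are left as raw bytes since the serializer used by the pipeline is not shown in this thread:

```python
# Sketch: consume AS hegemony results from the ihr_hegemony topic.
# Broker address and group id are placeholders; values are printed as
# raw bytes because the topic's serialization format is an assumption.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "hegemony-reader",      # arbitrary group id
    "auto.offset.reset": "earliest",    # start from the oldest result
})
consumer.subscribe(["ihr_hegemony"])

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue
        if msg.error():
            print(msg.error())
            continue
        print(msg.key(), msg.value()[:120])
finally:
    consumer.close()
```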

sim-azan commented 3 years ago

> > Thanks, but I think if I could find a way to get hegemony values per originASN for the entire week, every month, for 2010 and after, it would align well with my study. I was wondering how I could go about this: the new repo 'as-hegemony' or the older 'as-hash'? For the as-hegemony repo I managed to run Kafka and produce_bgpdata.py to get data into my Kafka cluster, but I am clueless on how to proceed after that. I am thinking I need to run the code in a particular order for the producer and consumer to work. Thanks for your help.
>
> Up to this month, we have always used only four collectors: rrc00, rrc10, route-views2, and route-views.linx. So for consistency you want to get data from these four collectors. For each day we get one RIB at midnight and then updates until the end of the day.
>
> Then you can use the ihr/daily-run.py script to compute AS hegemony. Here is an example: `python3 ihr/daily-run.py all 2021-02-25T00:00`
>
> That would compute AS hegemony every 15 minutes for Feb 25th. This command should be executed from the folder where you have your produce_*.py scripts. All results are in the ihr_hegemony topic. Naming is a bit different from the website: scope means originasn, and if you use the 'global graph' it is referred to as `scope="-1"` (but I think there is some problem with this code and the global graph; that's something I'm investigating now).
>
> One more thing: make sure the topics are created by the scripts. Don't create them manually; the code assumes a certain number of partitions.

Thanks for the details. Here is how I went about it:

My config.json is the same as yours.

First, get the BGP data for a collector and date:

```
python3 produce_bgpdata.py -t ribs --collector rrc00 --startTime 2010-01-01T00:00:00 --endTime 2010-01-01T01:00:00
python3 produce_bgpdata.py -t updates --collector rrc00 --startTime 2010-01-01T00:00:00 --endTime 2010-01-01T23:59:00
```

This creates the Kafka topics for the above collector and types:

```
$ kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
ihr_bgp_rrc00_ribs
ihr_bgp_rrc00_updates
```

Next I run daily-run.py: `python3 -m ihr.daily-run.py all 2010-01-01T00:00`

However, I get an error at line 48:

```
Selecting collectors with updated data...
Traceback (most recent call last):
  File "ihr/daily-run.py", line 102, in <module>
    selected_collectors = select_collectors(start_time)
  File "ihr/daily-run.py", line 48, in select_collectors
    partition = TopicPartition(topic, 0, start_threshold.timestamp*1000)
TypeError: unsupported operand type(s) for *: 'method' and 'int'
```

Also, I have 4 Kafka brokers running on localhost:9092,9093,9094,9095. Maybe that is relevant?

romain-fontugne commented 3 years ago

Ah, sorry, I just fixed that problem. The arrow module changed recently. Please pull the latest code and try again; I think you're very close to getting it working :)
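For context, a small illustration of the arrow change behind the TypeError above: `.timestamp` was a property (an int) before arrow 1.0 and became a method afterwards, so the old code ends up multiplying a bound method by 1000. A sketch, with `start_threshold` standing in for the variable in daily-run.py:

```python
# Illustration of the arrow >= 1.0 API change that causes
# "unsupported operand type(s) for *: 'method' and 'int'".
import arrow

start_threshold = arrow.get("2010-01-01T00:00")

# Pre-1.0 style (fails on arrow >= 1.0, where .timestamp is a method):
# partition_ts = start_threshold.timestamp * 1000

# Working equivalents on recent arrow versions:
partition_ts = start_threshold.int_timestamp * 1000
# or: partition_ts = int(start_threshold.timestamp() * 1000)
print(partition_ts)
```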

sim-azan commented 3 years ago

> Ah, sorry, I just fixed that problem. The arrow module changed recently. Please pull the latest code and try again; I think you're very close to getting it working :)

Thanks, I actually tried the same fix as you, but ended up with another error :D

```
start: 2010-01-01T00:00:00 end: 2010-01-02T00:00:00
Selecting collectors with updated data...
Traceback (most recent call last):
  File "ihr/daily-run.py", line 101, in <module>
    selected_collectors = select_collectors(start_time)
  File "ihr/daily-run.py", line 52, in select_collectors
    time_offset = consumer.offsets_for_times( [partition] )
cimpl.KafkaException: KafkaError{code=_UNKNOWN_PARTITION,val=-190,str="Failed to get offsets: Local: Unknown partition"}
```

sim-azan commented 3 years ago

> > Ah, sorry, I just fixed that problem. The arrow module changed recently. Please pull the latest code and try again; I think you're very close to getting it working :)
>
> Thanks, I actually tried the same fix as you, but ended up with another error :D
>
> ```
> start: 2010-01-01T00:00:00 end: 2010-01-02T00:00:00
> Selecting collectors with updated data...
> Traceback (most recent call last):
>   File "ihr/daily-run.py", line 101, in <module>
>     selected_collectors = select_collectors(start_time)
>   File "ihr/daily-run.py", line 52, in select_collectors
>     time_offset = consumer.offsets_for_times( [partition] )
> cimpl.KafkaException: KafkaError{code=_UNKNOWN_PARTITION,val=-190,str="Failed to get offsets: Local: Unknown partition"}
> ```

It seems this error happens because I do not have all collectors' dumps in my Kafka. I changed the configuration to the only collector for which I have data, i.e. rrc00, and the error goes away. Now another one comes up in produce_bgpdata.py, related to origin_as:

```
File "/home/sim/Downloads/Pycharm/as-hegemony-main/hege/bgpatom/bgpatom_peer.py", line 67, in update_announcement_message
    origin_asn = non_prepended_aspath[-1]
IndexError: list index out of range
```

I will keep debugging and see how it goes from here!
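As an aside, the earlier _UNKNOWN_PARTITION error can be confirmed by asking the brokers which topics they actually know about before requesting offsets; a hypothetical check using confluent-kafka's metadata API:

```python
# Hypothetical check for the _UNKNOWN_PARTITION error: list the
# ihr_bgp_* topics the cluster knows about, with their partitions.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "group.id": "topic-check",
})
metadata = consumer.list_topics(timeout=10)
for name, topic in sorted(metadata.topics.items()):
    if name.startswith("ihr_bgp_"):
        print(name, "partitions:", sorted(topic.partitions))
consumer.close()
```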

romain-fontugne commented 3 years ago

This means you have an empty AS path, which is not possible. Are you sure you have data in your BGP topics (ihr_bgp_rrc00_ribs, ihr_bgp_rrc00_updates)? Using this script https://github.com/InternetHealthReport/kafka-toolbox/blob/master/handy/tail.py you can check the last message in the topics, e.g. `python3 tail.py -s localhost:9092 -t ihr_bgp_rrc00_updates`
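If empty AS paths do occur in the input, a defensive guard around the failing line would be a straightforward local workaround; a hypothetical helper mirroring the traceback's names, not code from the repository:

```python
def origin_asn_or_none(non_prepended_aspath):
    """Return the origin ASN, or None when the AS path is empty.

    Hypothetical guard mirroring the failing line in
    hege/bgpatom/bgpatom_peer.py (update_announcement_message).
    """
    if not non_prepended_aspath:
        return None  # avoids IndexError: list index out of range
    return non_prepended_aspath[-1]

print(origin_asn_or_none([2914, 2497]))  # -> 2497
print(origin_asn_or_none([]))            # -> None
```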

sim-azan commented 3 years ago

> This means you have an empty AS path, which is not possible. Are you sure you have data in your BGP topics (ihr_bgp_rrc00_ribs, ihr_bgp_rrc00_updates)? Using this script https://github.com/InternetHealthReport/kafka-toolbox/blob/master/handy/tail.py you can check the last message in the topics, e.g. `python3 tail.py -s localhost:9092 -t ihr_bgp_rrc00_updates`

I do have data in the updates topic:

```
$ python3 tail.py -s localhost:9092 -t ihr_bgp_rrc00_updates
0 521088 521087
{'topic': 'ihr_bgp_rrc00_updates', 'partition': 0, 'key': None, 'timestamp': (1, 1262390341000), 'headers': None, 'value': {'rec': {'project': 'ris', 'collector': 'rrc00', 'type': 'update', 'dump_time': 1262390100, 'time': 1262390341.0, 'status': 'unknown', 'dump_position': 'middle'}, 'elements': []}}
```

For the ribs topic:

```
0 16239 16238
```

romain-fontugne commented 3 years ago

Hmmm, can you try a different day? Something a bit more recent.


sim-azan commented 3 years ago

Thanks for your help, Romain.