InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0

RoVista RoV detection #83

Closed: romain-fontugne closed this issue 9 months ago

romain-fontugne commented 11 months ago

Get RoVista data from their API: https://api.rovista.netsecurelab.org/rovista/api/overview?offset=0&sortBy=rank&sortOrder=asc&count=1000&search=
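
The query string in the link above can be rebuilt for any page from its parameters. This is a minimal standard-library sketch. The helper name `overview_url` is hypothetical, and the parameter names are only those visible in the link.

```python
from urllib.parse import urlencode

# Base endpoint taken from the link above
BASE_URL = "https://api.rovista.netsecurelab.org/rovista/api/overview"


def overview_url(offset: int, count: int = 1000) -> str:
    """Build the overview URL for one page (hypothetical helper)."""
    params = {
        "offset": offset,   # where the page starts
        "sortBy": "rank",
        "sortOrder": "asc",
        "count": count,     # page size
        "search": "",
    }
    # Dict order is preserved, so the query matches the link parameter by parameter
    return f"{BASE_URL}?{urlencode(params)}"
```

`overview_url(0)` reproduces the link above. The meaning of `offset` (an entry index or a page index) is not stated in the issue and should be checked against the API.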

This endpoint is paginated, so we should query it multiple times. All ASes with a ratio higher than 0.5 are categorized as 'Validating RPKI ROV'; all others are categorized as 'Not Validating RPKI ROV'.
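
The threshold rule fits in a single function. This sketch assumes the ratio is a fraction between 0 and 1, as the 0.5 threshold suggests. The name `rov_tag` is hypothetical.

```python
VALIDATING = "Validating RPKI ROV"
NOT_VALIDATING = "Not Validating RPKI ROV"


def rov_tag(ratio: float) -> str:
    """Map a RoVista ratio to a Tag label (hypothetical helper)."""
    # "Higher than 0.5" is strict, so exactly 0.5 is Not Validating
    return VALIDATING if ratio > 0.5 else NOT_VALIDATING
```

Because the comparison is strict, an AS with a ratio of exactly 0.5 is tagged 'Not Validating RPKI ROV'.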

So the added relationships are:

(:AS)-[:CATEGORIZED {ratio: 1}]->(:Tag {label: 'Validating RPKI ROV'})

and:

(:AS)-[:CATEGORIZED {ratio: 0}]->(:Tag {label: 'Not Validating RPKI ROV'})
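
Each AS therefore gets exactly one CATEGORIZED link, and the link stores the ratio. The links can be grouped by target Tag so they can be written in one batch per tag. This is a sketch only. The entry keys `asn` and `ratio` are assumptions about the API's JSON, borrowed from the snippet later in this thread, and `build_links` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_links(entries: Iterable[dict]) -> Dict[str, List[Tuple[int, dict]]]:
    """Group (asn, link properties) pairs by the Tag label they point to."""
    links: Dict[str, List[Tuple[int, dict]]] = defaultdict(list)
    for entry in entries:
        ratio = entry["ratio"]  # assumed key in the API response
        label = "Validating RPKI ROV" if ratio > 0.5 else "Not Validating RPKI ROV"
        # The ratio is stored on the CATEGORIZED relationship, not on the AS node
        links[label].append((entry["asn"], {"ratio": ratio}))
    return dict(links)
```
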

MAVRICK-1 commented 9 months ago

Below is a rough code snippet that fetches data from the RoVista API, categorizes ASes based on their ratio, and stores the ratio on the corresponding relationship:

import requests
from neo4j import GraphDatabase

# Neo4j connection parameters
uri = "neo4j://localhost:7687"
username = "neo4j"
password = "password"

# Categorize an AS based on its ratio and create the relationship in Neo4j
def categorize_as(asn, ratio):
    if ratio > 0.5:
        label = "Validating RPKI ROV"
    else:
        label = "Not Validating RPKI ROV"
    # MERGE creates the nodes and the relationship if they do not exist yet
    # (a plain MATCH would only update relationships that are already there).
    # The AS node is bound to `a` to avoid clashing with the Cypher keyword AS.
    query = """
        MERGE (a:AS {asn: $asn})
        MERGE (t:Tag {label: $label})
        MERGE (a)-[c:CATEGORIZED]->(t)
        SET c.ratio = $ratio
    """
    with driver.session() as session:
        session.run(query, asn=asn, label=label, ratio=ratio)

# Fetch data from the RoVista API and categorize ASes
def fetch_and_categorize():
    url = "https://api.rovista.netsecurelab.org/rovista/api/overview"
    params = {
        'offset': 0,
        'sortBy': 'rank',
        'sortOrder': 'asc',
        'count': 1000,
        'search': ''
    }
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # fail loudly on HTTP errors
    data = response.json().get('data', [])
    for entry in data:
        asn = entry['asn']
        ratio = entry['ratio']
        categorize_as(asn, ratio)

# Connect to Neo4j
driver = GraphDatabase.driver(uri, auth=(username, password))

# Call the function to fetch and categorize ASes
fetch_and_categorize()

# Close the Neo4j driver
driver.close()

@romain-fontugne @m-appel Is my approach right? Can you assign this issue to me?

m-appel commented 9 months ago

Hey, please take a look at this crawler for a simple example of how we usually write crawlers. There are already functions to create nodes and links in batches.

We are going to write/update some guidelines on how to write new crawlers soon™, where this will be described in more detail.

Your categorization is correct, but note that the example link only returns a single page of 1000 entries, i.e., you will need to follow the pagination and fetch the remaining entries as well.
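
Following the pagination comes down to requesting pages until one comes back shorter than the page size. This is a standard-library sketch under two assumptions. First, the JSON body has a top-level `data` list, as in the snippet above. Second, `offset` advances by one page size per request. If the API counts pages instead, pass `offset_step=1`. The names `rovista_page` and `fetch_all` are hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

Entry = Dict[str, object]
URL = "https://api.rovista.netsecurelab.org/rovista/api/overview"


def rovista_page(offset: int, count: int) -> List[Entry]:
    """Fetch one page; assumes the JSON body has a top-level "data" list."""
    query = urlencode({"offset": offset, "sortBy": "rank", "sortOrder": "asc",
                       "count": count, "search": ""})
    with urlopen(f"{URL}?{query}", timeout=60) as resp:
        return json.load(resp).get("data", [])


def fetch_all(fetch_page: Callable[[int, int], List[Entry]],
              page_size: int = 1000,
              offset_step: Optional[int] = None) -> List[Entry]:
    """Keep requesting pages until one comes back shorter than page_size.

    offset_step is how far `offset` advances per page: page_size (the default)
    if the API counts entries, 1 if it counts pages. Which one applies is an
    assumption to verify against the live API.
    """
    step = page_size if offset_step is None else offset_step
    entries: List[Entry] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        entries.extend(page)
        if len(page) < page_size:  # short or empty page: nothing left
            break
        offset += step
    return entries
```

`entries = fetch_all(rovista_page)` would then collect every page over the network. The page fetcher is passed in as a function so the loop can be checked offline against a fake list of entries.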

MAVRICK-1 commented 9 months ago

There was no existing function to set a property on a relationship, so I created one: https://github.com/MAVRICK-1/internet-yellow-pages/blob/RovDetection%2383/iyp/__init__.py#L562-L581
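
For reference, Cypher's `UNWIND` can set a relationship property on many links in a single query. This is a generic sketch, not the helper in the linked branch. `SET_RATIO_QUERY` and `ratio_rows` are hypothetical names. The query uses `MATCH`, so it only updates CATEGORIZED links that already exist.

```python
from typing import Dict, List, Tuple

# UNWIND turns the $rows list parameter into one row per link, so a whole
# batch is updated in a single round trip.
SET_RATIO_QUERY = """
UNWIND $rows AS row
MATCH (a:AS {asn: row.asn})-[c:CATEGORIZED]->(t:Tag {label: row.label})
SET c.ratio = row.ratio
"""


def ratio_rows(links: List[Tuple[int, str, float]]) -> List[Dict[str, object]]:
    """Turn (asn, tag label, ratio) tuples into the $rows query parameter."""
    return [{"asn": asn, "label": label, "ratio": ratio} for asn, label, ratio in links]
```

With the official Neo4j Python driver, a batch would be sent as `session.run(SET_RATIO_QUERY, rows=ratio_rows(links))`.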