InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0

Refactor Citizen Lab crawler #135

Closed: m-appel closed this pull request 7 months ago

m-appel commented 8 months ago

Description

This crawler did not handle the header row of the country-code CSV files correctly. It treated the header as a data row and created a URL node and a Tag node from the header fields. This refactor uses pandas to parse the CSV files and also removes logging to stderr.
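A minimal sketch of the header problem, assuming the country-code files start with a header row such as url,category_code,category_description,... (the column names and sample rows below are illustrative assumptions, and naive_pairs / pandas_pairs are hypothetical helpers, not the crawler's actual code). Reading the file row by row with csv.reader returns the header as if it were data, which yields a ("url", "category_description") pair; pandas.read_csv uses the first line as column names by default.

```python
import csv
import io

import pandas as pd

# Hypothetical sample in the assumed layout of a Citizen Lab country-code CSV.
# Column names and rows are assumptions for illustration only.
SAMPLE_CSV = """url,category_code,category_description,date_added,source,notes
https://example.com/,NEWS,News Media,2017-04-12,,
https://example.org/,HUMR,Human Rights Issues,2017-04-12,,
"""


def naive_pairs(text: str) -> list:
    """Buggy approach: every row, including the header, becomes a (URL, tag) pair."""
    rows = csv.reader(io.StringIO(text))
    # Column 0 is the URL, column 2 the category description (assumed layout).
    return [(row[0], row[2]) for row in rows]


def pandas_pairs(text: str) -> list:
    """Fixed approach: pandas consumes the first line as the header row."""
    df = pd.read_csv(io.StringIO(text))
    # Select columns by name, so the header itself never becomes a data value.
    return list(zip(df["url"], df["category_description"]))
```

With this sample, naive_pairs returns three pairs, the first being ("url", "category_description"), which matches the spurious URL and Tag nodes described above; pandas_pairs returns only the two real entries.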

How Has This Been Tested?

Reran the crawler.

Screenshots (if appropriate):

MATCH p = (:URL)-[:CATEGORIZED {reference_name: 'citizenlab.urldb'}]->(:Tag {label: 'category_description'})
RETURN p
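The query above can also be run from Python to check a rebuilt graph. This is a sketch assuming the official neo4j Python driver (5.x, which provides Driver.execute_query) and a reachable Neo4j instance; count_header_artifacts is a hypothetical helper, and any URI or credentials passed to it are placeholders. After the fix, the count would presumably be zero.

```python
from typing import Optional, Tuple

# Query from this PR: URL nodes linked to a Tag whose label is a CSV header field.
HEADER_ARTIFACT_QUERY = """
MATCH p = (:URL)-[:CATEGORIZED {reference_name: 'citizenlab.urldb'}]->(:Tag {label: 'category_description'})
RETURN p
"""


def count_header_artifacts(uri: str, auth: Optional[Tuple[str, str]] = None) -> int:
    """Return how many header-derived paths exist (assumes neo4j driver 5.x)."""
    # Imported lazily so this module loads even where the driver is not installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        # execute_query returns (records, summary, keys); only records are needed.
        records, _summary, _keys = driver.execute_query(HEADER_ARTIFACT_QUERY)
    return len(records)
```

For example, count_header_artifacts("bolt://localhost:7687", ("neo4j", "password")) against a local instance, where both the URI and the credentials are placeholders.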

