InternetHealthReport / internet-yellow-pages

A knowledge graph for Internet resources
GNU General Public License v3.0

Improve performance of MANRS crawler #100

Closed m-appel closed 8 months ago

m-appel commented 9 months ago

Crawler slow. Now crawler fast.

Description

The crawler was unreasonably slow for how little data it actually pushed. It is one of the first crawlers and therefore used only single get functions, which get slow when called inside nested loops. It now uses batch functions.
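To illustrate why single gets inside a loop are slow, here is a minimal sketch. It assumes a hypothetical in-memory `FakeGraph` client that only counts simulated round trips. `get_node`, `batch_get_nodes`, and the sample records are invented for illustration and are not the crawler's actual code or the project's API.

```python
class FakeGraph:
    """Hypothetical in-memory stand-in for a graph database client."""

    def __init__(self):
        self.ids = {}          # value -> node ID
        self.round_trips = 0   # number of simulated database calls

    def get_node(self, value):
        """Single get: one round trip per value."""
        self.round_trips += 1
        return self.ids.setdefault(value, len(self.ids))

    def batch_get_nodes(self, values):
        """Batch get: one round trip for the whole collection of values."""
        self.round_trips += 1
        return {v: self.ids.setdefault(v, len(self.ids)) for v in values}


# Hypothetical input: (ASN, country code) pairs.
records = [(2497, 'JP'), (2500, 'JP'), (2497, 'US')]

# Single gets inside the loop: two round trips per record.
slow = FakeGraph()
slow_links = [(slow.get_node(asn), slow.get_node(cc)) for asn, cc in records]

# Batch gets: collect the unique values first, then one round trip per node type.
fast = FakeGraph()
asn_ids = fast.batch_get_nodes({asn for asn, _ in records})
cc_ids = fast.batch_get_nodes({cc for _, cc in records})
fast_links = [(asn_ids[asn], cc_ids[cc]) for asn, cc in records]

print(slow.round_trips, fast.round_trips)
```

In this sketch the number of round trips for the single-get version grows with the number of records, while the batch version needs one call per node type regardless of input size.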

I also removed the sys.exit call that was used on failure and replaced it with an exception.
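A minimal sketch of the sys.exit-to-exception pattern, assuming a hypothetical `CrawlerFetchError` class and hypothetical `check_status` and `run_crawlers` helpers (none of these names come from the PR). One plausible benefit is that a caller running several crawlers can catch the error and continue instead of having the whole process terminated.

```python
class CrawlerFetchError(Exception):
    """Hypothetical exception raised when fetching source data fails."""


def check_status(status_code, url):
    """Raise instead of calling sys.exit() so the caller decides what happens next."""
    if status_code != 200:
        raise CrawlerFetchError(f'Fetching {url} failed with status {status_code}')


def run_crawlers(crawlers):
    """Run each crawler in turn; a failure in one does not stop the others."""
    failed = []
    for name, run in crawlers:
        try:
            run()
        except CrawlerFetchError as e:
            failed.append((name, str(e)))
    return failed


# Hypothetical crawlers: the first simulates a failed request, the second succeeds.
failed = run_crawlers([
    ('broken', lambda: check_status(500, 'https://example.org/data')),
    ('working', lambda: check_status(200, 'https://example.org/data')),
])
print(failed)
```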

How Has This Been Tested?

I verified that the nodes and relationships created by the new crawler are the same as those created by the old crawler.
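One way such a comparison could be done is with set differences, sketched below. This assumes each crawler's output has been exported as a list of hashable tuples. The `diff_graph_output` helper and the sample relationships are hypothetical, not the procedure actually used for this PR.

```python
def diff_graph_output(old, new):
    """Return (items only in the old output, items only in the new output)."""
    old_set, new_set = set(old), set(new)
    return old_set - new_set, new_set - old_set


# Hypothetical sample data: same relationships, different creation order.
old_rels = [(2497, 'MEMBER_OF', 'MANRS'), (2500, 'MEMBER_OF', 'MANRS')]
new_rels = [(2500, 'MEMBER_OF', 'MANRS'), (2497, 'MEMBER_OF', 'MANRS')]

missing, extra = diff_graph_output(old_rels, new_rels)
print(missing, extra)
```

Two empty sets mean both crawlers produced the same relationships regardless of the order in which they were created. Note that converting to sets ignores duplicates, so a check on the list lengths would be needed to catch those.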

romain-fontugne commented 8 months ago

but now it is conflicting with recent changes :P