InternetHealthReport / internet-yellow-pages

A knowledge graph for the Internet
https://iyp.iijlab.net
GNU General Public License v3.0
43 stars 18 forks source link

Better formatting checks for IP, prefixes, and countries #112

Closed romain-fontugne closed 5 months ago

romain-fontugne commented 9 months ago

Currently we make sure that all IPs and prefixes are in lower case. And also that country codes are in upper case. We still need the crawlers to take care of IPv6 compression and we hope they give well-formed IP, prefixes, and country codes but unexpected things can still happen.

https://github.com/InternetHealthReport/internet-yellow-pages/blob/98f970100ec30ea61a0eaff8258629c3af0b279c/iyp/__init__.py#L15-L23

Why not adding the checks in the iyp library too? For example:

prop_formatters = { 
     # asn is stored as an int 
     'asn': int, 
     # ipv6 is stored in lowercase 
     'ip': lambda s: ipaddress.ip_address(s).compressed, 
     'prefix': lambda s: ipaddress.ip_network(s).compressed, 
     'country_code': lambda s: iso3166.countries.get(s).alpha2
 }

Yes, it is redundant, the crawlers still have to do that too, but at least we ensure that no weird node are created. @m-appel did we have any good reason for not adding these checks in iyp/init.py?

m-appel commented 9 months ago

Nope, this is actually just this issue #87, which we have not implemented yet :) The fix is fast, but needs some careful checking if affected crawlers still work correctly.

romain-fontugne commented 9 months ago

Thanks, I'll try a full run with the code mentioned above and see if things break.

romain-fontugne commented 9 months ago

ok, for the records, that broke quite a few crawlers. This is the log: iyp-2024-01-17.log

m-appel commented 9 months ago

Good thing we tested it :) For the ones that break due to the IP address/network, I think they should be fixed (i.e., probably ignore the invalid addresses/prefixes), but I think forcing an ISO3166 country code might be a bit much? I know ZZ for example is technically a correct "Reserved" country code...

romain-fontugne commented 9 months ago

Agreed, we should tolerate also the exotic country codes (EU, AP). But no weird IP should go through these checks.

Forchapeatl commented 8 months ago

I would love to work on this issue.

romain-fontugne commented 8 months ago

How would you fix that issue?

On Wed, Feb 28, 2024, 05:40 FORCHA PEARL @.***> wrote:

I would love to work on this issue.

— Reply to this email directly, view it on GitHub https://github.com/InternetHealthReport/internet-yellow-pages/issues/112#issuecomment-1967552424, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKPNXNQ7DGEEUY6RQL3EWDYVZADXAVCNFSM6AAAAABB544OPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRXGU2TENBSGQ . You are receiving this because you authored the thread.Message ID: @.*** com>