Closed romain-fontugne closed 5 months ago
Nope, this is actually just this issue #87, which we have not implemented yet :) The fix is fast, but needs some careful checking if affected crawlers still work correctly.
Thanks, I'll try a full run with the code mentioned above and see if things break.
ok, for the records, that broke quite a few crawlers. This is the log: iyp-2024-01-17.log
Good thing we tested it :) For the ones that break due to the IP address/network, I think they should be fixed (i.e., probably ignore the invalid addresses/prefixes), but I think forcing an ISO3166 country code might be a bit much? I know ZZ
for example is technically a correct "Reserved" country code...
Agreed, we should tolerate also the exotic country codes (EU, AP). But no weird IP should go through these checks.
I would love to work on this issue.
How would you fix that issue?
On Wed, Feb 28, 2024, 05:40 FORCHA PEARL @.***> wrote:
I would love to work on this issue.
— Reply to this email directly, view it on GitHub https://github.com/InternetHealthReport/internet-yellow-pages/issues/112#issuecomment-1967552424, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKPNXNQ7DGEEUY6RQL3EWDYVZADXAVCNFSM6AAAAABB544OPKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRXGU2TENBSGQ . You are receiving this because you authored the thread.Message ID: @.*** com>
Currently we make sure that all IPs and prefixes are in lower case. And also that country codes are in upper case. We still need the crawlers to take care of IPv6 compression and we hope they give well-formed IP, prefixes, and country codes but unexpected things can still happen.
https://github.com/InternetHealthReport/internet-yellow-pages/blob/98f970100ec30ea61a0eaff8258629c3af0b279c/iyp/__init__.py#L15-L23
Why not adding the checks in the iyp library too? For example:
Yes, it is redundant, the crawlers still have to do that too, but at least we ensure that no weird node are created. @m-appel did we have any good reason for not adding these checks in iyp/init.py?