TurnerSoftware / InfinityCrawler

A simple but powerful web crawler library for .NET
MIT License
245 stars, 36 forks

Exception thrown on non-http/https scheme #132

Closed (novastream closed this issue 2 years ago)

novastream commented 2 years ago

When crawling, an exception is thrown when processing links whose scheme is not http or https, for example mailto links.

This can be worked around by implementing your own RequestProcessor, based on the default one: https://github.com/TurnerSoftware/InfinityCrawler/blob/main/src/InfinityCrawler/Processing/Requests/DefaultRequestProcessor.cs

In its Add method, check the URI scheme before queuing the request, like this: https://stackoverflow.com/a/7581824
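For readers who want the check spelled out, here is a minimal sketch of the kind of scheme test described above. The class name `UriSchemeFilter` and method name `IsHttpOrHttps` are hypothetical and not part of InfinityCrawler; only standard `System.Uri` members are used. It is an illustration of the workaround under those assumptions, not the fix that shipped in the library.

```csharp
using System;

// Hypothetical helper for illustration; not part of InfinityCrawler.
public static class UriSchemeFilter
{
    // Returns true only for absolute URIs whose scheme is http or https.
    // Relative URIs are rejected first, because Uri.Scheme throws
    // InvalidOperationException when read on a relative URI.
    public static bool IsHttpOrHttps(Uri uri)
    {
        if (uri is null || !uri.IsAbsoluteUri)
        {
            return false;
        }

        return uri.Scheme == Uri.UriSchemeHttp
            || uri.Scheme == Uri.UriSchemeHttps;
    }
}
```

Assuming the custom request processor's Add method receives a `Uri`, it could return early when `UriSchemeFilter.IsHttpOrHttps(requestUri)` is false, so that mailto and similar links are never queued.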

Turnerj commented 2 years ago

Hey @novastream - thanks for raising this issue! Yeah, it definitely should be checking that.

Turnerj commented 2 years ago

I've released a new version (0.5.0) which contains the fix for this!