Introduce SameDomainFilter

elixir-crawly / crawly

Crawly, a high-level web crawling & scraping framework for Elixir.

https://hexdocs.pm/crawly

Apache License 2.0

Introduce SameDomainFilter #248

Closed serpent213 closed 1 year ago

serpent213 commented 1 year ago

My application requires only one or two different Spiders, but they run over many different domains (passed as an option to start_spider). Setting the base_url therefore turned out to be difficult, so I came up with this solution.
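For readers without the diff at hand, the idea of a same-domain check can be sketched roughly as follows. This is only an illustration of the concept, not the code from this PR: the module name SameDomainSketch, the function same_domain?/2, and the notion of a "seed URL" passed in at start time are all hypothetical, and how such a check would be wired into a Crawly middleware is not shown.

```elixir
defmodule SameDomainSketch do
  @moduledoc """
  Hypothetical helper, not part of Crawly: decides whether a request URL
  stays on the same host as a seed URL supplied when the spider is started.
  """

  # Returns true only when both URLs parse to the same, non-nil host.
  def same_domain?(request_url, seed_url) do
    request_host = host(request_url)
    request_host != nil and request_host == host(seed_url)
  end

  # Extracts the lower-cased host, or nil for relative or host-less URLs.
  defp host(url) do
    case URI.parse(url) do
      %URI{host: host} when is_binary(host) and host != "" -> String.downcase(host)
      _ -> nil
    end
  end
end
```

Under these assumptions, SameDomainSketch.same_domain?("https://example.com/page", "https://example.com/") keeps the request, while a URL on another host, or a relative URL with no host, is rejected. The comparison is an exact host match, so subdomains would count as different domains in this sketch.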

Thoughts?

oltarasenko commented 1 year ago

@serpent213 The code looks quite nice. It would be great if you could write an article about your use case!

serpent213 commented 1 year ago

> @serpent213 The code looks quite nice. It would be great if you could write an article about your use case!

You mean like a blog article? Good idea.

In one sentence: I'm building a general-purpose search engine (think DuckDuckGo) for preselected sites.

oltarasenko commented 1 year ago

That sounds like a super big and interesting project! I think it would be nice to hear the story!