coherentdigital / coherencebot

Apache Nutch is an extensible and scalable web crawler
https://nutch.apache.org/
Apache License 2.0

Allow CDN domain in seed configuration #15

Open PeterCiuffetti opened 2 years ago

PeterCiuffetti commented 2 years ago

Related to https://github.com/coherentdigital/commons/issues/2173: accept links to a CDN domain for sites that host their PDFs on a domain other than the seed's.
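
A minimal sketch of the behavior being requested, in Python for brevity. It is not coherencebot or Nutch code: the function names, the `cdn_domains` parameter, and the example hosts are all hypothetical, and the real seed configuration format is not shown in this issue. The idea is only that a link is accepted if it is on the seed's own host or on a CDN domain listed for that seed.

```python
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """Return True if host equals domain or is a subdomain of it."""
    host = host.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def is_link_allowed(link: str, seed_url: str, cdn_domains=()) -> bool:
    """Accept a link on the seed's host or on a configured CDN domain.

    cdn_domains is a hypothetical per-seed setting, not an existing option.
    """
    link_host = urlsplit(link).hostname
    seed_host = urlsplit(seed_url).hostname
    if not link_host or not seed_host:
        # Links without a host (e.g. mailto:) are never accepted.
        return False
    if host_matches(link_host, seed_host):
        return True
    return any(host_matches(link_host, d) for d in cdn_domains)


# Hypothetical seed whose PDFs live on a separate CDN domain.
seed = "https://www.example.org/"
pdf = "https://cdn.example.net/docs/report.pdf"
print(is_link_allowed(pdf, seed))                   # CDN not configured
print(is_link_allowed(pdf, seed, ["example.net"]))  # CDN configured
```

Matching on a leading-dot suffix rather than a plain substring keeps an unrelated host such as `evilexample.net` from passing as `example.net`.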