Open eklem opened 10 years ago
I guess there are two things to check for. 1: The user agent, and whether robots.txt has a group that names it specifically or only the * group applies. 2: Build an array of the parts of the site not to follow, and check each link the crawler wants to follow against this array.
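A rough sketch of those two checks, in case it helps the discussion. The function names `disallowedPaths` and `canFollow` are made up for illustration and are not part of norch-fetch. It assumes plain prefix matching on `Disallow` lines only (no `Allow`, no wildcards inside paths) and that a group naming the crawler takes precedence over the `*` group:

```typescript
// Hypothetical helpers, not part of norch-fetch.

/** Check 1: pick the rules for our user agent, falling back to the "*" group. */
function disallowedPaths(robotsTxt: string, userAgent: string): string[] {
  const ua = userAgent.toLowerCase();
  const specific: string[] = []; // Disallow prefixes from groups naming our agent
  const wildcard: string[] = []; // Disallow prefixes from the "*" group
  let matchedSpecific = false;
  let groupAgents: string[] = [];
  let inAgentRun = false; // true while reading consecutive User-agent lines

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      if (!inAgentRun) groupAgents = []; // a new group starts here
      const agent = value.toLowerCase();
      groupAgents.push(agent);
      if (agent !== "*" && agent !== "" && ua.indexOf(agent) !== -1) {
        matchedSpecific = true;
      }
      inAgentRun = true;
      continue;
    }
    inAgentRun = false;
    if (field !== "disallow" || value === "") continue; // empty Disallow allows all

    for (const agent of groupAgents) {
      if (agent === "*") wildcard.push(value);
      else if (agent !== "" && ua.indexOf(agent) !== -1) specific.push(value);
    }
  }
  return matchedSpecific ? specific : wildcard;
}

/** Check 2: test a link against the array of disallowed path prefixes. */
function canFollow(link: string, disallowed: string[]): boolean {
  // Strip scheme and host so only the path (and query) is compared.
  const path = link.replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/]*/i, "") || "/";
  return !disallowed.some((prefix) => path.indexOf(prefix) === 0);
}
```

The array would be built once per host with `disallowedPaths` and then every candidate link passed through `canFollow` before it is queued. When the option is set to "no", the crawler would simply skip both steps.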
And default to "yes". The user-agent string option is related to this, but it isn't needed in order to develop this one. https://github.com/fergiemcdowall/norch-fetch/issues/10
-f --followrobotstxt <yes/no> if you want your fetcher to play nice or not