What steps will reproduce the problem?
1. Only crawl pages with the prefix http://fano.ics.uci.edu/
2. Have robotstxtConfig enabled
3. Crawl from the seed http://fano.ics.uci.edu/ (a minimal setup sketch follows this list)
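For reference, roughly the setup used, sketched against the crawler4j 3.5 API. The class name ReproCrawler, the storage folder, and the thread count are illustrative and not taken from the original configuration:

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

public class ReproCrawler extends WebCrawler {

    // Step 1: only follow links under the fano.ics.uci.edu prefix.
    @Override
    public boolean shouldVisit(WebURL url) {
        return url.getURL().startsWith("http://fano.ics.uci.edu/");
    }

    @Override
    public void visit(Page page) {
        System.out.println("URL: " + page.getWebURL().getURL());
    }

    public static void main(String[] args) throws Exception {
        CrawlConfig config = new CrawlConfig();
        config.setCrawlStorageFolder("/tmp/crawl"); // illustrative folder

        PageFetcher pageFetcher = new PageFetcher(config);

        // Step 2: robots.txt handling enabled.
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        robotstxtConfig.setEnabled(true);
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);

        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);
        controller.addSeed("http://fano.ics.uci.edu/"); // Step 3: seed URL
        controller.start(ReproCrawler.class, 1);
    }
}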
What is the expected output? What do you see instead?
The site's robots.txt contains:

# fano.ics.uci.edu
User-Agent: *
Disallow: /ca/rules/

So nothing under /ca/rules/ should be crawled. Instead, these URLs are being crawled:
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g1.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g2.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g3.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g4.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g5.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g6.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g7.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g8.html
URL: http://fano.ics.uci.edu/ca/rules/b3s23/g9.html
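Since Disallow: /ca/rules/ is a path-prefix rule, every one of the URLs above falls under it and should have been skipped. A trivial check against one of the reported URLs (class name is illustrative):

import java.net.URI;

public class DisallowCheck {
    public static void main(String[] args) {
        // All reported URLs share this path prefix, so a crawler honoring
        // "Disallow: /ca/rules/" should have skipped them.
        String disallowedPrefix = "/ca/rules/";
        String sample = "http://fano.ics.uci.edu/ca/rules/b3s23/g1.html";
        String path = URI.create(sample).getPath();
        System.out.println(sample + " -> disallowed: " + path.startsWith(disallowedPrefix));
    }
}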
What version of the product are you using?
3.5
Please provide any additional information below.
Original issue reported on code.google.com by Dave.Hir...@gmail.com on 21 Jan 2015 at 9:06