amir-jakoby / crawler-commons

Automatically exported from code.google.com/p/crawler-commons

Sitemap URLs in robots.txt are unnecessarily lowercased #28

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
This is a code snippet, lines 381-... of SimpleRobotsRuleParser:
{code}
            line = line.trim().toLowerCase();
            if (line.length() == 0) {
                continue;
            }

            RobotToken token = tokenize(line);
{code}

Use case: parsing does not work for the sitemaps listed at 
http://www.tripadvisor.com/robots.txt, because the whole line, including the sitemap URL, is lowercased before it is tokenized.
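
One possible direction, shown only as a minimal sketch and not as the parser's actual code: lowercase just the directive name and leave the value untouched, so a sitemap URL keeps its original case. The class `DirectiveCaseSketch` and the helper `splitDirective` are hypothetical names made up for this illustration, the example URL is a placeholder, and the sketch assumes a simple `name: value` line split at the first colon.

```java
import java.util.Locale;

public class DirectiveCaseSketch {

    /**
     * Hypothetical helper (not part of crawler-commons): splits a robots.txt
     * line at the first colon, lowercases only the directive name, and keeps
     * the value exactly as written apart from surrounding whitespace.
     * Returns {name, value}; value is "" when the line has no colon.
     */
    static String[] splitDirective(String rawLine) {
        String line = rawLine.trim();
        int colon = line.indexOf(':');
        if (colon < 0) {
            // No colon: treat the whole line as a directive name.
            return new String[] { line.toLowerCase(Locale.ROOT), "" };
        }
        // Directive names are matched case-insensitively, so lowercasing is safe here.
        String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        // The value (for example a sitemap URL) keeps its original case.
        String value = line.substring(colon + 1).trim();
        return new String[] { name, value };
    }

    public static void main(String[] args) {
        String[] parts = splitDirective("  Sitemap: http://www.example.com/Sitemap-Index.XML  ");
        // prints "sitemap -> http://www.example.com/Sitemap-Index.XML"
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

Splitting at the first colon matters because the URL itself contains a colon after the scheme; only the text before the first colon is treated as the directive name.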

Original issue reported on code.google.com by fuad.efe...@tokenizer.ca on 10 Jun 2013 at 7:15

GoogleCodeExporter commented 8 years ago

Original comment by digitalpebble on 10 Jun 2013 at 8:25