Closed iyzana closed 3 years ago
When parsing https://www.deviantart.com/robots.txt
```
User-agent: *
Disallow: /*q=
Disallow: /users/*?
Disallow: /join/*?
Disallow: /morelikethis/
Disallow: /download/
Disallow: /checkout/
Disallow: /global/
Disallow: /api/
Disallow: /critiques/
 
Sitemap: http://sitemaps.deviantart.net/sitemap-index.xml.gz
```
the parser fails with
```
thread 'main' panicked at 'assertion failed: !val.is_empty()', /home/me/.local/share/cargo/registry/src/github.com-1ecc6299db9ec823/robotstxt-0.2.0/src/parser.rs:207:17
```
Reproduction:
```rust
use robotstxt::DefaultMatcher;

fn main() {
    let robots_content = r#"User-agent: *
Disallow: /*q=
Disallow: /users/*?
Disallow: /join/*?
Disallow: /morelikethis/
Disallow: /download/
Disallow: /checkout/
Disallow: /global/
Disallow: /api/
Disallow: /critiques/
 
Sitemap: http://sitemaps.deviantart.net/sitemap-index.xml.gz"#;
    let mut matcher = DefaultMatcher::default();
    matcher.one_agent_allowed_by_robots(&robots_content, "oldnews", "https://www.deviantart.com/");
}
```
I'm assuming it is because of the line between the `Disallow`s and the `Sitemap`, which contains only a single space.
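Until a fix lands, a caller-side workaround might be to drop whitespace-only lines before handing the text to the matcher. `sanitize_robots_txt` below is a hypothetical helper, not part of the robotstxt API:

```rust
// Hypothetical workaround: remove lines that contain only whitespace,
// so the robotstxt 0.2.0 parser never sees a bare-space line.
fn sanitize_robots_txt(content: &str) -> String {
    content
        .lines()
        .filter(|line| !line.trim().is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let raw = "Disallow: /api/\n \nSitemap: http://sitemaps.deviantart.net/sitemap-index.xml.gz";
    // The whitespace-only line is gone; key/value lines are untouched.
    println!("{}", sanitize_robots_txt(raw));
}
```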
It seems someone else patched this bug on their fork: https://github.com/scascketta/robotstxt/commit/ffe972d507a0a30a21b0b164329f9a1fff73ce85
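The general shape of a tolerant fix is to skip blank or whitespace-only lines instead of asserting on a non-empty value. This is a sketch of that idea, not the actual code from the linked commit:

```rust
// Sketch of a tolerant key/value line parser: whitespace-only lines are
// skipped (None) rather than tripping an assertion downstream.
fn parse_line(line: &str) -> Option<(&str, &str)> {
    let line = line.trim();
    if line.is_empty() {
        return None; // blank or whitespace-only line: nothing to parse
    }
    let (key, value) = line.split_once(':')?;
    Some((key.trim(), value.trim()))
}

fn main() {
    // A line holding a single space parses to nothing instead of panicking.
    println!("{:?}", parse_line(" "));
    println!("{:?}", parse_line("Disallow: /api/"));
}
```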
Hi @iyzana. Thanks a lot for your feedback. 👍 This has been fixed via 67475a1.