GateNLP / ultimate-sitemap-parser

Ultimate Website Sitemap Parser
https://mediacloud.org/
Other
181 stars, 64 forks

If `Content-Type` header is set, verify it's the expected one #4

Open pypt opened 6 years ago

pypt commented 6 years ago

For example, if the `Content-Type` for `/robots.txt` is `text/html` (and not `text/plain`), this usually means that the file is missing (and a 404 page was returned instead), so there's no need to attempt to parse it.

Same goes for XML files, plain text sitemaps, and gzipped XML / text sitemaps.
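
A rough sketch of what such a check might look like, in Python. Everything here is hypothetical: `EXPECTED_TYPES`, `content_type_is_expected`, and the `kind` labels are illustrative names and are not part of the library's API, and the lists of accepted media types are assumptions that would need tuning against real servers (some serve gzipped sitemaps as `application/octet-stream`, for instance). When the header is absent the check passes, since there is nothing to verify.

```python
from typing import Optional

# Hypothetical mapping from the kind of file being fetched to the media
# types considered acceptable for it. These sets are assumptions.
EXPECTED_TYPES = {
    "robots": {"text/plain"},
    "xml": {"application/xml", "text/xml"},
    "text": {"text/plain"},
    "gzip": {"application/gzip", "application/x-gzip"},
}


def content_type_is_expected(header: Optional[str], kind: str) -> bool:
    """Return False only when a Content-Type header is set and does not match."""
    if not header:
        # Header missing or empty: nothing to verify, so let parsing proceed.
        return True
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = header.split(";", 1)[0].strip().lower()
    return media_type in EXPECTED_TYPES[kind]
```

With this, a `/robots.txt` response carrying `text/html; charset=utf-8` would be skipped, while one carrying `text/plain` or no `Content-Type` at all would still be parsed.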

tanweeralii commented 3 years ago

Hey @pypt, I am a beginner here and I am looking for a good first issue. This one seems like a good place to start. I have compiled and built the project on my system. Could you explain a bit more about this issue and how to approach solving it? Thanks