GateNLP / ultimate-sitemap-parser

Ultimate Website Sitemap Parser
https://mediacloud.org/

Not able to parse HTML sitemaps #19

Closed: malhotraguy closed this issue 5 years ago

malhotraguy commented 5 years ago

I am not able to parse HTML sitemaps. For example, I tried https://www.axa-im.com/site-map.

[Screenshot from 2019-09-18 16-15-49]

malhotraguy commented 5 years ago

@pypt Linas, have you noticed this bug?

pypt commented 5 years ago

The module is not designed to parse HTML sitemaps, sorry. As per the README, the module's concern is machine-parseable XML sitemaps, which Google commonly uses to find all pages on a website and/or to ingest Google News articles.
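To make the distinction concrete, here is a minimal sketch of what "machine-parseable XML sitemap" means, using only Python's standard library rather than ultimate-sitemap-parser's own API. It assumes the sitemaps.org 0.9 format (a <urlset> root whose <url> entries each carry a <loc>). The helper name parse_xml_sitemap is hypothetical and exists only for illustration. An HTML "site map" page like the one linked above has no such structure, so the sketch rejects it instead of guessing at links.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, version 0.9.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_xml_sitemap(text: str) -> list[str]:
    """Return the <loc> URLs from a sitemaps.org <urlset> document.

    Hypothetical helper for illustration only; it is not part of
    ultimate-sitemap-parser. Raises ValueError when the input is not an
    XML <urlset> sitemap (e.g. an HTML "site map" page).
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        # Typical HTML (unclosed <br>, <p>, etc.) is not well-formed XML.
        raise ValueError("not well-formed XML") from exc
    if root.tag != SITEMAP_NS + "urlset":
        # Well-formed but not a URL sitemap, e.g. <html> or <sitemapindex>.
        raise ValueError(f"unexpected root element: {root.tag}")
    # Collect the text of every namespaced <loc> element under <url> entries.
    return [
        loc.text.strip()
        for loc in root.iter(SITEMAP_NS + "loc")
        if loc.text
    ]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/about</loc></url>"
        "</urlset>"
    )
    print(parse_xml_sitemap(sample))
```

For brevity, this sketch deliberately rejects sitemap index files (<sitemapindex>). A full parser would follow the nested sitemaps those files list.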