Closed by chatelao 5 years ago
Thanks, fixed in develop. Will probably release a new version soon.
So cool, thanks a lot.
0.2 released.
Thanks a lot (I've got 0.3, right?)
Did you see Google's release of the robots.txt parser?
https://opensource.googleblog.com/2019/07/googles-robotstxt-parser-is-now-open.html
Thanks, I'll take a look.
I think implementing a robots.txt parser is easy enough to do on one's own. The main takeaway from Google's implementation is that they tolerate both Sitemap: and Site-map: annotations.
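To make the Sitemap:/Site-map: point concrete, here is a minimal sketch of a tolerant sitemap extractor. It is not code from this library or from Google's parser; `extract_sitemaps` is a hypothetical helper name, and matching the key case-insensitively and stripping `#` comments are assumptions of this sketch.

```python
import re
from typing import List

# Hypothetical helper for illustration only.
# Accepts "Sitemap:" and "Site-map:" in any letter case (an assumption of this sketch).
_SITEMAP_LINE = re.compile(r"^\s*site-?map\s*:\s*(\S+)", re.IGNORECASE)


def extract_sitemaps(robots_txt: str) -> List[str]:
    """Return the sitemap URLs declared in a robots.txt body, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; this also cuts a URL that contains "#".
        line = line.split("#", 1)[0]
        match = _SITEMAP_LINE.match(line)
        if match:
            urls.append(match.group(1))
    return urls
```

Feeding it a robots.txt body that mixes both spellings should return every declared URL, which can then be handed to a job such as wget.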
The new parser works great; my "wget" job is running very well with the extracted data.
Maybe this site has a strange format, or I called something wrong?
The result:
Reading the robots.txt manually, I know there are two layers of sitemap.xml
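If "two layers" here means a sitemap index that points at child sitemaps (an assumption; the thread does not show the files), the nesting could be walked roughly like this. `collect_page_urls` and the `fetch` callback are hypothetical names for this sketch, not part of the parser discussed above, and the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

# XML namespace used by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def collect_page_urls(sitemap_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Return page URLs, following sitemap index files into their children.

    `fetch` is a caller-supplied function (hypothetical) that returns the XML
    text for a URL, so the sketch stays independent of any HTTP client.
    There is no depth limit or cycle check here.
    """
    root = ET.fromstring(fetch(sitemap_url))
    locs = [(loc.text or "").strip() for loc in root.iter(NS + "loc")]
    if root.tag == NS + "sitemapindex":
        # First layer: each <loc> names another sitemap file.
        pages = []
        for child_url in locs:
            pages.extend(collect_page_urls(child_url, fetch))
        return pages
    # Second layer: a <urlset> whose <loc> entries are the pages themselves.
    return locs
```

The sitemap URL taken from robots.txt would be the starting point, with `fetch` wrapping whatever download tool is already in use.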