IMG Data URI and image license

Data URI image links get added to the sitemap, but they should be left out. They are commonly used as placeholders, for example when lazy loading images. The real image URLs are inside NOSCRIPT tags, and those get added correctly.

Running:

python3 main.py --domain https://www.2globalnomads.info --output sitemap.xml --images --report --parserobots

Output:

<image:loc>https://www.2globalnomads.info/data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</image:loc>
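
A minimal sketch of the kind of check that would skip these, assuming the crawler resolves each IMG src against the page URL before writing it out. The function name `resolve_image_url` is hypothetical and not taken from the script:

```python
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url, src):
    """Return an absolute http(s) image URL, or None if src should be skipped."""
    src = src.strip()
    # data: URIs carry the image payload inline, so there is nothing to list.
    if src.lower().startswith("data:"):
        return None
    absolute = urljoin(page_url, src)
    # Drop anything else that is not fetchable over http(s).
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

With a check like this, the placeholder GIF above would be dropped, while the real URLs from the NOSCRIPT tags would still be resolved and listed.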

A few improvement proposals

An image sitemap is the only way to tell search engines the licenses of images. Please consider adding an option to the script for a site-wide license that applies to all images. It could work like this:

python3 main.py --domain https://www.2globalnomads.info --output sitemap.xml --license http://creativecommons.org/publicdomain/zero/1.0/

With the following output added inside <image:image> after <image:loc>:

<image:license>http://creativecommons.org/publicdomain/zero/1.0/</image:license>
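
A rough sketch of how such an option could be wired up, assuming the script uses argparse and builds each image entry as a string. The `--license` flag is only the proposal above, `build_parser` and `image_entry` are hypothetical names, and photo.jpg is a placeholder URL:

```python
import argparse
from xml.sax.saxutils import escape


def build_parser():
    parser = argparse.ArgumentParser()
    # Proposed flag, not an existing option of the script.
    parser.add_argument("--license", dest="license_url", default=None,
                        help="License URL applied to every image in the sitemap")
    return parser


def image_entry(loc, license_url=None):
    """Build one <image:image> element; the license is optional."""
    parts = ["<image:image>",
             "<image:loc>{}</image:loc>".format(escape(loc))]
    if license_url:
        # The license goes inside <image:image>, after <image:loc>.
        parts.append("<image:license>{}</image:license>".format(escape(license_url)))
    parts.append("</image:image>")
    return "".join(parts)


args = build_parser().parse_args(
    ["--license", "http://creativecommons.org/publicdomain/zero/1.0/"])
print(image_entry("https://www.2globalnomads.info/photo.jpg", args.license_url))
```

Without the flag, `license_url` stays None and the output is the same as today.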

You could pretty-print sitemap.xml a bit by adding a newline after every closing tag. That would make it more human readable.
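
One way to get that, sketched with the standard library's xml.dom.minidom as a post-processing step. This is an assumption about where it could be hooked in, not how the script writes its output today, and `pretty_sitemap` is a hypothetical helper:

```python
from xml.dom import minidom


def pretty_sitemap(xml_text):
    """Re-indent a sitemap so that every element starts on its own line."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


compact = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.2globalnomads.info/</loc></url>"
    "</urlset>"
)
print(pretty_sitemap(compact))
```

This parses the whole file into memory, so for very large sitemaps it may be simpler to write a newline after each closing tag while generating the file.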

If you want, you can also take <image:title> from TITLE and/or ALT and <image:caption> from FIGCAPTION tags if they are present.
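
A small sketch of how that metadata could be collected with the standard library's html.parser, assuming the caption of a FIGURE applies to every IMG inside it and that TITLE is preferred over ALT. The class name `ImageMetaParser` is hypothetical and not part of the script:

```python
from html.parser import HTMLParser


class ImageMetaParser(HTMLParser):
    """Collect src, title (TITLE or ALT) and caption (FIGCAPTION) per image."""

    def __init__(self):
        super().__init__()
        self.images = []
        self._figure_images = None  # images of the open <figure>, else None
        self._in_caption = False
        self._caption_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._figure_images = []
            self._caption_parts = []
        elif tag == "img" and attrs.get("src"):
            image = {"src": attrs["src"],
                     "title": attrs.get("title") or attrs.get("alt"),
                     "caption": None}
            self.images.append(image)
            if self._figure_images is not None:
                self._figure_images.append(image)
        elif tag == "figcaption" and self._figure_images is not None:
            self._in_caption = True

    def handle_data(self, data):
        if self._in_caption:
            self._caption_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._figure_images is not None:
            # Attach the caption once the figure closes, so it works whether
            # the FIGCAPTION comes before or after the IMG.
            caption = " ".join("".join(self._caption_parts).split())
            for image in self._figure_images:
                image["caption"] = caption or None
            self._figure_images = None
            self._caption_parts = []
```

After `parser.feed(html)`, each entry in `parser.images` holds the values that would go into the title and caption elements, with None where nothing was found.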

Cheers, Santeri

c4software / python-sitemap

IMG Data URI and image license #26
