Data URI image links get added to the sitemap, but they should be left out. They are commonly used as placeholders, for example for lazy-loading images. The real image URLs are inside NOSCRIPT tags, and those get added correctly.
Running:
python3 main.py --domain https://www.2globalnomads.info --output sitemap.xml --images --report --parserobots
Output:
<image:loc>https://www.2globalnomads.info/data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</image:loc>
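I have not looked at how main.py collects image sources, so this is only a rough sketch of the kind of check I mean. The function names are made up for illustration; the idea is just to skip any src that starts with the data: scheme before it gets joined to the domain.

```python
def is_data_uri(src):
    """Return True if an image source is an inline data: URI."""
    # The scheme is case-insensitive, and attribute values may carry stray whitespace.
    return src.strip().lower().startswith("data:")


def filter_image_sources(sources):
    """Keep only image sources that point at a real, fetchable URL.

    Hypothetical helper: 'sources' is assumed to be the list of raw
    src attribute values found on a page.
    """
    return [src for src in sources if not is_data_uri(src)]
```

For example, filter_image_sources(["data:image/gif;base64,R0lGOD", "/img/a.jpg"]) would keep only "/img/a.jpg".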
A few improvement proposals
An image sitemap is the only way to tell search engines the licenses of images. Please consider adding an option to the script for a site-wide license for all images. It could work like this:
python3 main.py --domain https://www.2globalnomads.info --output sitemap.xml --license http://creativecommons.org/publicdomain/zero/1.0/
With the following output added inside <image:image> after <image:loc>:
<image:license>http://creativecommons.org/publicdomain/zero/1.0/</image:license>
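A minimal sketch of how the proposed --license value could end up in each image entry. The function image_entry and its parameters are hypothetical, and I am assuming the script builds the XML as strings; the real code may well do this differently.

```python
from xml.sax.saxutils import escape


def image_entry(loc, license_url=None):
    """Build one image entry, optionally with a site-wide license.

    Hypothetical helper: 'license_url' would come from the proposed
    --license command-line option and is the same for every image.
    """
    parts = ["<image:image>", "<image:loc>%s</image:loc>" % escape(loc)]
    if license_url:
        # The license element goes after the image location.
        parts.append("<image:license>%s</image:license>" % escape(license_url))
    parts.append("</image:image>")
    return "".join(parts)
```

When no license is given, the entry stays exactly as it is today, so the option would not affect existing users.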
You could also pretty-print sitemap.xml a bit by adding a newline after every closing tag. That would make it more human-readable.
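One possible way to do this with only the standard library is xml.dom.minidom, sketched below. The function name prettify is made up, and I am assuming the finished sitemap is available as one string with its namespaces declared on the root element; writing a newline after each tag while generating the file would work just as well.

```python
from xml.dom import minidom


def prettify(xml_text):
    """Re-indent an XML document so each element starts on its own line.

    Hypothetical helper: 'xml_text' is assumed to be the complete,
    well-formed sitemap as a single string.
    """
    return minidom.parseString(xml_text).toprettyxml(indent="  ")
```

Note that this parses the whole document into memory, which may be slow for very large sitemaps.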
If you want, you could also take <image:title> from the TITLE and/or ALT attributes and <image:caption> from FIGCAPTION tags, if they are present.
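Here is a rough sketch of how the title and caption could be picked up with the standard html.parser module. Everything in it is an assumption on my part: the class name, the rule that TITLE wins over ALT, and the rule that a FIGCAPTION applies to every image inside the same FIGURE. The collected values would then go into image:title and image:caption.

```python
from html.parser import HTMLParser


class ImageMetaParser(HTMLParser):
    """Collect src, title and caption for each <img> on a page (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.images = []            # one dict per image: src, title, caption
        self._figure_start = None   # index of the first image in the open <figure>
        self._in_caption = False
        self._caption_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self._figure_start = len(self.images)
            self._caption_parts = []
        elif tag == "figcaption":
            self._in_caption = True
        elif tag == "img" and attrs.get("src"):
            self.images.append({
                "src": attrs["src"],
                # Assumption: prefer TITLE, fall back to ALT, ignore empty values.
                "title": attrs.get("title") or attrs.get("alt") or None,
                "caption": None,
            })

    def handle_data(self, data):
        if self._in_caption:
            self._caption_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "figcaption":
            self._in_caption = False
        elif tag == "figure" and self._figure_start is not None:
            # Collapse whitespace and attach the caption to the figure's images.
            caption = " ".join("".join(self._caption_parts).split()) or None
            for image in self.images[self._figure_start:]:
                image["caption"] = caption
            self._figure_start = None
            self._caption_parts = []


def extract_image_meta(html):
    """Return the list of image dicts found in an HTML string."""
    parser = ImageMetaParser()
    parser.feed(html)
    return parser.images
```

Images outside a FIGURE simply get no caption, so nothing would be written for them.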
Cheers, Santeri