Geta / geta-optimizely-sitemaps

Search engine sitemaps.xml for Optimizely CMS 12 and Commerce 14
Apache License 2.0
10 stars 14 forks source link

[Change Request] Add the ability to create an Image XML Sitemap #81

Open GeekInTheNorth opened 1 year ago

GeekInTheNorth commented 1 year ago

We are using the Geta Sitemaps module on a number of clients. One has recently requested the addition of an image xml sitemap.

Based on: https://www.holisticseo.digital/technical-seo/image-sitemap/

An Image only sitemap may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <image:image>
        <image:loc>https://www.holisticseo.digital/image-1.jpg</image:loc>
        <image:caption>Everything you need to know about Images</image:caption>
        <image:geo_location>London, United Kingdom</image:geo_location>
        <image:title>Image Sitemap Example</image:title>
        <image:license>Creator:HolisticSEO.Digital, Credit Line: HolisticSEO, Copyright Notice: Free to Use</image:licence>
    </image:image>
</urlset>

An Images within a regular sitemap may look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
        <loc>https://www.holisticseo.digital/image-sitemap</loc>
        <image:image>
            <image:loc>https://www.holisticseo.digital/image-1.jpg</image:loc>
        </image:image>
        <image:image>
            <image:loc>https://www.holisticseo.digital/image-2.jpg</image:loc>
        </image:image>
    </url>
</urlset> 

Do you have any plans to expand your functionality into including an Image Sitemap? are you open to collaborations if this is not on your roadmap?

GeekInTheNorth commented 1 year ago

@marisks I managed to prototype this for our client build and we will probably solve it this way. The way you have structured the XML builders here to be so extensible is brilliant as this allowed me to leverage a lot of the functionality you've created.

public class StandardAndImageSitemapXmlGenerator : SitemapXmlGenerator, IStandardSitemapXmlGenerator
{
    private readonly ILogger<StandardSitemapXmlGenerator> _logger;

    private readonly XNamespace _imageNamespace;

    private readonly XAttribute _imageAttribute;

    public StandardAndImageSitemapXmlGenerator(
        ISitemapRepository sitemapRepository,
        IContentRepository contentRepository,
        IUrlResolver urlResolver,
        ISiteDefinitionRepository siteDefinitionRepository,
        ILanguageBranchRepository languageBranchRepository,
        IContentFilter contentFilter,
        IUriAugmenterService uriAugmenterService,
        ISynchronizedObjectInstanceCache objectCache,
        IMemoryCache cache,
        ILogger<StandardSitemapXmlGenerator> logger)
        : base(
            sitemapRepository,
            contentRepository,
            urlResolver,
            siteDefinitionRepository,
            languageBranchRepository,
            contentFilter,
            uriAugmenterService,
            objectCache,
            cache,
            logger)
    {
        _logger = logger;

        _imageNamespace = XNamespace.Get("http://www.google.com/schemas/sitemap-image/1.1");
        _imageAttribute = new XAttribute(XNamespace.Xmlns + "image", _imageNamespace.NamespaceName);
    }

    protected override XElement GenerateRootElement()
    {
        XElement rootElement = new XElement(SitemapXmlGenerator.SitemapXmlNamespace + "urlset", _imageAttribute);

        if (this.SitemapData.IncludeAlternateLanguagePages)
            rootElement.Add((object)new XAttribute(XNamespace.Xmlns + "xhtml", (object)SitemapXmlGenerator.SitemapXhtmlNamespace));
        return rootElement;
    }

    protected override XElement GenerateSiteElement(IContent contentData, string url)
    {
        var element = base.GenerateSiteElement(contentData, url);
        var urlResolverArgs = new UrlResolverArguments { ForceAbsolute = true };

        if (contentData is SitePageData sitePageData && !sitePageData.TeaserImage.IsNullOrEmpty())
        {
            try
            {
                var imageUrl = UrlResolver.GetUrl(sitePageData.TeaserImage, "en", urlResolverArgs);
                var image = new XElement(_imageNamespace + "loc", (object)imageUrl);
                var imageParent = new XElement(_imageNamespace + "image", image);

                element.Add(imageParent);
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Oh No!");
            }
        }

        return element;
    }
}

I may make this my next blog subject if you're okay with that :)

I was thinking the more long term solution would be an interface a page could implement?

public interface IPageWithSitemapImages
{
   IList<ContentReference> SitemapImages { get; }
}
marisks commented 1 year ago

Looks good and a blog post is welcome.

If this is some standard or common feature, we can add it to the sitemaps and make it configurable. I haven't heard about images in sitemaps before.

GeekInTheNorth commented 1 year ago

Normally our clients are happy with standard XML Sitemaps, this is literally the first time I've had a requirement for it.

It should be noted that the following nodes are deprecated as of May 2022 by google:

I find this quite odd because then you have an image without context, unless they are more interested in the version where images are child nodes of the url elements e.g. tied to pages.

GeekInTheNorth commented 1 year ago

And here is the announcement: https://developers.google.com/search/blog/2022/05/spring-cleaning-sitemap-extensions

GeekInTheNorth commented 1 year ago

@marisks Is this something you'd be open to a PR on? Though I suspect that has a greater implementation concern, how it should fit into the existing module.

valdisiljuconoks commented 1 year ago

We are definitely open for PRs to make the package more flexible and feature-rich. Let's exchange some ideas here about how you might see this as part of the library.. (that would save time spent already on implementing something that needs to be changed later on).

marisks commented 1 year ago

@GeekInTheNorth Your implementation of generator looks good. A question is if it should be a default one and should replace the current generator or if it should be an additional generator. And then how it would work with other generators - commerce one for example.

GeekInTheNorth commented 1 year ago

You could make it so you have interfaces for both Image Sitemap enabled content and Video Sitemap enabled content. Your standard and commerce builders could then Consume those common properties. Then it would be up to the individual implementation to decide whether it wants to provide that data or not.

The challenge would be to decide whether you need the additional namespaces or not. But then maybe that could be additional conditions on the Sitemap configuration?

E.g. You choose standard or commerce etc, then tick "include images" and/or "include videos"

There's a very definite product decision to be made either way. And maybe that decision is to do nothing or query the community to see how much of a real demand there is. Because if it's a one off, then that build can override the functionality itself.

valdisiljuconoks commented 1 year ago

thanks @GeekInTheNorth for some insights/ideas.. @marisks it could be also something between the lines how I refactored Google ProductFeed - with "production pipeline" consisting of extractors, enrichers, transformer and what not. not necessary to be that complex as Google Feed (it had it's own complexity and performance penalty to extract Commerce data more than once for different feeds).. but just as an idea that the whole sitemap generation process is transparent, flexible and extendable to suit implementing projects.

more details on the refactoring - https://blog.tech-fellow.net/2022/04/21/googleproductfeed-for-optimizely-reloaded/

marisks commented 1 year ago

I just looked more at what is needed, and the best solution would be to include image and video element generation as base functionality.