Bioconductor / bioconductor.org

Source code for the Bioconductor website

https://bioconductor.org/

broken sitemap.xml #57

Open sneumann opened 4 years ago

sneumann commented 4 years ago

https://www.bioconductor.org/sitemap.xml gives

XML Parsing Error: not well-formed
Location: https://www.bioconductor.org/sitemap.xml
Line Number 1, Column 2:
<%= xml_sitemap %>
-^

from https://github.com/Bioconductor/bioconductor.org/blob/master/content/sitemap.xml

There was a suggestion in a discussion with @egonw about adding a sitemap.xml that summarises site content for crawlers, including Google et al. and TeSS.

Yours, Steffen
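The parsing error can be reproduced offline. The following is a minimal sketch, assuming only that the served file contains the unrendered template line quoted in the error message; `is_well_formed` is a hypothetical helper name, not part of the site's code.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The unrendered template line quoted in the error message.
print(is_well_formed("<%= xml_sitemap %>"))  # False

# An empty but well-formed sitemap root element, for comparison.
print(is_well_formed('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'))  # True
```

A check like this only tests well-formedness; it does not validate the document against the sitemap schema.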

mtmorgan commented 4 years ago

Since this has been there, unchanged, since March 15 2010 without comment, maybe the most expeditious solution is to simply remove it?

egonw commented 4 years ago

I suppose something is supposed to replace the placeholder with content. Yes, would be awesome if it contained a list of all vignettes (HTML) webpages and/or all packages. Indeed, that sitemap.xml can then be used by ELIXIR services to pick up content, e.g. ELIXIR TeSS but also BioSchemas (cc @AlasdairGray).
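As a rough illustration of what replacing the placeholder could produce, here is a minimal sketch that builds a sitemap from a list of page URLs. The function name `build_sitemap` and the example.org URLs are made up for illustration; how the site would actually enumerate its vignette and package pages is not covered in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a string) with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' when serialising.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs, not real site pages.
pages = [
    "https://example.org/packages/pkgA.html",
    "https://example.org/packages/pkgA/vignette.html",
]
print(build_sitemap(pages))
```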

mtmorgan commented 4 years ago

The site is more than the repository of packages, so sitemap.xml doesn't sound appropriate for this purpose.

For what it's worth package metadata is already available in machine-readable format as https://bioconductor.org/packages/3.12/bioc/VIEWS and presumably also on individual pages if this https://github.com/Bioconductor/bioconductor.org/pull/25 were completed. I can't see the need for a third source of this information.
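For readers who want to consume that metadata, here is a minimal sketch of a record parser. It assumes the VIEWS file uses a DCF-style layout ("Field: value" lines, indented continuation lines, blank lines between records); the sample text and the `parse_records` name are invented for illustration and are not taken from the real file.

```python
def parse_records(text):
    """Parse DCF-style text into a list of {field: value} dicts."""
    records = []
    current = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0].isspace() and last_key is not None:
            # An indented line continues the previous field.
            current[last_key] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Made-up sample in the assumed layout.
sample = """Package: pkgA
Version: 1.0.0
Description: first line
  continued

Package: pkgB
Version: 2.1.0
"""
for record in parse_records(sample):
    print(record["Package"], record["Version"])
```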

egonw commented 4 years ago

The sitemap.xml is not critical, I agree. (Any sitemap.xml has redundant information.)

sneumann commented 4 years ago

It is a way of search engine optimisation. OTOH all content on BioC can be considered well-linked, we don't have dynamically generated content, and no dark corners of non-linked stuff we'd want to be found. In that case, removal of a broken sitemap.* is not a loss.

https://support.google.com/webmasters/answer/156184?hl=en&topic=8476&ctx=topic has more information on when a sitemap is needed.

Yours, Steffen

AlasdairGray commented 4 years ago

While a sitemap is not necessarily essential for the likes of Google who have "unlimited" resources to follow links and hopefully traverse a whole site, it is more difficult for others to do the same. For example, we have started scraping Bioschemas content but do not have the resource to do a full web crawl for it so are reliant on sitemaps.
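To make the consumer side concrete, here is a minimal sketch of how a scraper with limited resources might read page URLs from a sitemap instead of crawling links. The `sitemap_urls` name and the sample document are hypothetical; fetching the file over HTTP and handling sitemap index files are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


# Made-up sitemap content for illustration.
sample = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/packages/pkgA.html</loc></url>
  <url><loc>https://example.org/packages/pkgB.html</loc></url>
</urlset>"""
print(sitemap_urls(sample))
```

Each returned URL can then be fetched individually and checked for embedded markup, which bounds the work to the number of listed pages.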