MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

SEO: Provide a sitemap.xml for a list of valid URLs #307

Closed: newgene closed this issue 3 years ago

newgene commented 3 years ago

As part of the BioSchemas and Data Discovery Engine (DDE) efforts, we would like to crawl the resource pages provided by MassBank and validate that their JSON-LD metadata complies with schema.org, e.g., a page like this:

https://massbank.eu/MassBank/RecordDisplay?id=LQB00001&dsn=RIKEN_IMS

Providing a sitemap of valid URLs is generally good SEO practice, and it would also help general search engines such as Google and Bing index and rank MassBank's resource pages.
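The validation step described in the request can be sketched with the Python standard library alone: pull every `<script type="application/ld+json">` block out of a record page's HTML and check whether its `@context` points at schema.org. This is a minimal, first-pass sketch, not a full schema.org or BioSchemas validator; the names `JsonLdExtractor`, `extract_jsonld` and `uses_schema_org` are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects and parses the bodies of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents arrive unparsed here, possibly split across several calls.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return the parsed JSON-LD values embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


def uses_schema_org(block):
    """True if a JSON-LD object declares schema.org (http or https) as its @context."""
    if not isinstance(block, dict):
        return False
    context = block.get("@context")
    contexts = context if isinstance(context, list) else [context]
    return any(
        isinstance(c, str) and c.rstrip("/") in ("https://schema.org", "http://schema.org")
        for c in contexts
    )
```

In practice the HTML passed to `extract_jsonld` would be the body fetched from a record page such as the RecordDisplay URL above.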

meier-rene commented 3 years ago

You are welcome to use our sitemap index file at https://massbank.eu/MassBank/sitemapindex.xml. Does this suit your needs?
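A crawler consuming that index works in two steps: read the `<loc>` of every child sitemap listed in the `<sitemapindex>`, then read the `<loc>` of every page listed in each child `<urlset>`. A minimal standard-library sketch; the function names are hypothetical, and it assumes the child sitemaps are served uncompressed (the protocol also allows gzip-compressed files, which this sketch does not handle).

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def _locs(xml_data, path):
    """Return the stripped text of every <loc> matched by an ElementPath expression."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    return [loc.text.strip() for loc in root.findall(path, NS) if loc.text]


def child_sitemaps(index_xml):
    """URLs of the sitemaps listed in a <sitemapindex> document."""
    return _locs(index_xml, "sm:sitemap/sm:loc")


def page_urls(urlset_xml):
    """URLs of the pages listed in a <urlset> sitemap document."""
    return _locs(urlset_xml, "sm:url/sm:loc")


def iter_page_urls(index_url):
    """Fetch a sitemap index over HTTP and yield every page URL from its child sitemaps."""
    with urlopen(index_url) as resp:
        sitemaps = child_sitemaps(resp.read())
    for sitemap_url in sitemaps:
        with urlopen(sitemap_url) as resp:
            yield from page_urls(resp.read())
```

Usage would be `for url in iter_page_urls("https://massbank.eu/MassBank/sitemapindex.xml"): ...`, which issues one HTTP request for the index plus one per child sitemap.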

tsufz commented 3 years ago

@newgene, thanks a lot for your request. The schema.org markup is already implemented (see view-source:https://massbank.eu/MassBank/RecordDisplay?id=LQB00001&dsn=RIKEN_IMS) and has been validated with Google's SEO tooling. Google warns about the missing creator tag. Unfortunately, this is a bigger issue that cannot be resolved with a few code snippets: so far, the metadata in MassBank records does not include schema-ready creator information, and curating the datasets is quite tedious because the author tag does not follow any schema.
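The creator warning can be reproduced locally before pages reach Google. A minimal sketch, assuming a page's JSON-LD has already been parsed into a Python dict: it reports a missing `creator` and flags values that are not structured schema.org `Person` or `Organization` objects, which is the kind of gap a free-text author field leaves. The function name and its rules (for example, treating a plain string as unstructured) are assumptions for illustration, not Google's actual checks.

```python
STRUCTURED_CREATOR_TYPES = {"Person", "Organization"}


def creator_issues(block):
    """List problems with the schema.org 'creator' property of a JSON-LD object."""
    creator = block.get("creator")
    if creator is None:
        return ["missing 'creator' property"]
    creators = creator if isinstance(creator, list) else [creator]
    issues = []
    for i, entry in enumerate(creators):
        if isinstance(entry, str):
            # Free text such as "Smith J, Doe A" carries no machine-readable structure.
            issues.append(f"creator[{i}] is a plain string, not a Person/Organization object")
            continue
        if not isinstance(entry, dict):
            issues.append(f"creator[{i}] has unexpected type {type(entry).__name__}")
            continue
        # JSON-LD allows @type to be a single string or a list of strings.
        types = entry.get("@type")
        types = types if isinstance(types, list) else [types]
        if not any(t in STRUCTURED_CREATOR_TYPES for t in types if isinstance(t, str)):
            issues.append(f"creator[{i}] is not typed as Person or Organization")
        elif not entry.get("name"):
            issues.append(f"creator[{i}] has no name")
    return issues
```

Mapping existing author strings onto such objects is the curation work described above; the check only shows where it is still missing.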

newgene commented 3 years ago

@meier-rene Yes, that's what we need for crawling purposes. It's enough for us to proceed on our end.

For general SEO purposes, I believe placing sitemap.xml at the root URL is more common. I'm also not sure sitemapindex.xml is a standard name; such a file is typically just called sitemap.xml.
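As we read the sitemaps.org protocol, it does not require a particular filename; what matters is where the file lives and how crawlers find it. A sitemap may only list URLs on the same scheme and host and under the directory it is served from (so a file under `/MassBank/` can cover the `/MassBank/RecordDisplay` pages but not the rest of the host), and a site can advertise a sitemap at any path with a `Sitemap:` line in `/robots.txt`. A minimal sketch of both checks; the function names and the sample `robots.txt` are hypothetical, the scope check ignores the protocol's cross-submission rules, and `RobotFileParser.site_maps()` needs Python 3.8 or later.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def sitemaps_declared_in_robots(robots_txt):
    """Return the URLs given on 'Sitemap:' lines of a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []  # site_maps() returns None when there are none


def within_sitemap_scope(sitemap_url, page_url):
    """Default sitemaps.org location rule: same scheme and host, and the page path
    must start with the directory that contains the sitemap file."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    directory = sitemap.path[: sitemap.path.rfind("/") + 1]
    same_site = (sitemap.scheme, sitemap.netloc) == (page.scheme, page.netloc)
    return same_site and page.path.startswith(directory)
```

For example, a `robots.txt` containing `Sitemap: https://example.org/MassBank/sitemapindex.xml` makes that index discoverable even though it is not at the root.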

tsufz commented 3 years ago

Hi, we strictly follow the sitemaps.org protocol: https://www.sitemaps.org/de/protocol.html

The crawlers index all our pages, so no worries.

Yours, Tobias
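For reference, a protocol-conformant layout for a large collection can be generated with the standard library alone. The protocol limits a single sitemap to 50,000 URLs and 50 MB uncompressed, which is one common reason a large site publishes a sitemap index pointing at several sitemap files instead of a single sitemap.xml. A minimal sketch that only enforces the URL-count limit (not the size limit); the function names and example.org URLs are hypothetical and do not describe how MassBank generates its files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit from the sitemaps.org protocol


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of page URLs into consecutive groups of at most `size`."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def _document(root_tag, entry_tag, locs):
    """Build <root_tag xmlns=...><entry_tag><loc>...</loc></entry_tag>...</root_tag>."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for loc in locs:
        ET.SubElement(ET.SubElement(root, entry_tag), "loc").text = loc
    # UTF-8 bytes with an XML declaration; ElementTree escapes '&' as '&amp;'.
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def build_urlset(page_urls):
    """One sitemap file listing page URLs."""
    return _document("urlset", "url", page_urls)


def build_sitemap_index(sitemap_urls):
    """An index file listing the public URLs of the sitemap files."""
    return _document("sitemapindex", "sitemap", sitemap_urls)
```

Each `build_urlset(chunk)` result would be written to its own file (for instance a hypothetical `sitemap-1.xml`), and the public URLs of those files passed to `build_sitemap_index`. The automatic escaping matters here because record URLs like `RecordDisplay?id=...&dsn=...` contain `&`, which the protocol requires to be entity-escaped.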