CatalogueOfLife / checklistbank

UI for checklistbank.org
https://www.checklistbank.org/

Enable SEO-friendly pages #1086

Open mdoering opened 2 years ago

mdoering commented 2 years ago

For search engines and other machine clients, the site needs to serve some HTML content in addition to the JavaScript-rendered content. We should consider the various options available, including server-side rendering of React.

DOI landing pages (dataset/{key}) in particular should also provide schema.org metadata so that Google recognises them as dataset pages; see https://support.datacite.org/docs/how-do-i-expose-my-datasets-to-google-dataset-search
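A minimal sketch of what the schema.org Dataset JSON-LD for a dataset/{key} page could look like. The `DatasetMeta` shape, its field names and the base URL default are assumptions for illustration, not the actual CLB API model; creators are simplified to person names. Google's Dataset Search guidance treats `name` and `description` as required, so pages without a description may not be picked up.

```typescript
// Hypothetical shape of the dataset metadata the UI already loads; field names are assumptions.
interface DatasetMeta {
  key: number | string;
  title: string;
  description?: string;
  doi?: string;        // bare DOI, e.g. "10.1234/abc"
  license?: string;    // license URL
  issued?: string;     // ISO 8601 date
  creators?: string[]; // simplified: person names only
}

// Build a schema.org Dataset object for a dataset landing page.
function datasetJsonLd(
  d: DatasetMeta,
  baseUrl = "https://www.checklistbank.org",
): Record<string, unknown> {
  const ld: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    name: d.title,
    url: `${baseUrl}/dataset/${d.key}`,
  };
  if (d.description) ld.description = d.description;
  if (d.doi) ld.identifier = `https://doi.org/${d.doi}`;
  if (d.license) ld.license = d.license;
  if (d.issued) ld.datePublished = d.issued;
  if (d.creators && d.creators.length > 0) {
    ld.creator = d.creators.map((name) => ({ "@type": "Person", name }));
  }
  return ld;
}

// Serialise into the <script> tag crawlers read. Every "<" is escaped as \u003c
// (still valid JSON) so a value containing "</script>" cannot close the tag early.
function jsonLdScriptTag(ld: Record<string, unknown>): string {
  const json = JSON.stringify(ld).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}
```

The resulting tag would go into the `<head>` of the served HTML, which is exactly why it needs to be present in the initial HTML response rather than only added by client-side JavaScript.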

timrobertson100 commented 2 years ago

It's not entirely clear to me, at least, why we'd want checklistbank pages indexed by search engines. Aren't the COL and GBIF websites where we'd expect to drive traffic?

mdoering commented 2 years ago

The COL portal only shows the very latest version of the COL Checklist, not any other dataset or release, such as the previous or even the current annual checklist (the new monthly release hides that). I would expect all public datasets to be findable by Google if we advertise CLB as a repository. Currently the COL Checklist can only be found via GBIF: https://datasetsearch.research.google.com/search?src=2&query=Catalogue%20of%20Life%20Checklist&docid=L2cvMTFqOWJ2Zjd5cw%3D%3D

The main problem here is that the COL portal does not even show up. I hope I have already fixed that by adding schema.org dataset metadata to the metadata page, which is also the landing page for the current DOI.

mdoering commented 2 years ago

Tools and help texts (e.g. how to publish) in CLB would also benefit from being searchable. Indexing individual taxa, references and other data is probably unnecessary and would confuse Google more than it helps.

mdoering commented 2 years ago

Having a dataset DOI that resolves to ChecklistBank (as all non-current COL releases do), where the page is opaque to Google and lacks schema.org dataset metadata, is not great.

mdoering commented 1 year ago

We would like at least the static pages (about, the tools) and probably also the dataset search and dataset detail pages to rank highly in Google and other search engines. We advertise CLB as a product in its own right and even have a dedicated domain.

mdoering commented 1 year ago

The robots.txt file currently blocks all crawlers. I suspect that is the first thing to remove, and the build should be made conditional so that the reject-all robots.txt is only included on dev.
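A minimal sketch of the conditional build step, assuming a Node-based build that can run a small TypeScript script before bundling. The environment value `"prod"`, the `writeRobotsTxt` helper and the output directory are hypothetical; the actual CLB build setup may differ.

```typescript
import { writeFileSync } from "node:fs";
import { join } from "node:path";

// Returns the robots.txt body for a given environment. Only production is
// opened to crawlers; any other value (dev, test, empty) rejects all robots,
// so a misconfigured build fails closed instead of exposing a dev site.
function robotsTxt(env: string): string {
  return env === "prod"
    ? "User-agent: *\nAllow: /\n"
    : "User-agent: *\nDisallow: /\n";
}

// Hypothetical build step: writes robots.txt into the directory that is
// copied into the deployed site (e.g. "public").
function writeRobotsTxt(outDir: string, env: string): void {
  writeFileSync(join(outDir, "robots.txt"), robotsTxt(env));
}
```

A build script would then call something like `writeRobotsTxt("public", process.env.CLB_ENV ?? "dev")`, where `CLB_ENV` is a made-up variable name. Defaulting to the blocking variant means only an explicit production build is ever indexable.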

mdoering commented 9 months ago

We could use the same method for CLB as on the COL portal. The index.html can be served on a per-route basis with injected SEO tags, JSON-LD, etc. for datasets, about pages and so on, by simply replacing the placeholder comment with the desired data. httpd could be configured to fetch index.html from the backend only for specific routes and otherwise serve the static file as is.
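A minimal sketch of the backend side of that approach: take the static index.html, find a placeholder comment and replace it with route-specific tags. The marker `<!-- SEO -->`, the `SeoTags` shape and the function names are hypothetical; the sketch also assumes the placeholder sits in `<head>` and the template has no other `<title>`.

```typescript
// Hypothetical placeholder comment in index.html; the real marker may differ.
const SEO_PLACEHOLDER = "<!-- SEO -->";

interface SeoTags {
  title: string;
  description?: string;
  canonicalUrl?: string;
  jsonLd?: Record<string, unknown>; // e.g. a schema.org Dataset object
}

// Minimal escaping for text placed inside <title> and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render the tags that replace the placeholder.
function renderSeoBlock(tags: SeoTags): string {
  const parts = [`<title>${escapeHtml(tags.title)}</title>`];
  if (tags.description) {
    parts.push(`<meta name="description" content="${escapeHtml(tags.description)}">`);
  }
  if (tags.canonicalUrl) {
    parts.push(`<link rel="canonical" href="${escapeHtml(tags.canonicalUrl)}">`);
  }
  if (tags.jsonLd) {
    // Escape "<" so JSON values cannot terminate the script element.
    const json = JSON.stringify(tags.jsonLd).replace(/</g, "\\u003c");
    parts.push(`<script type="application/ld+json">${json}</script>`);
  }
  return parts.join("\n");
}

// Replace the placeholder; if it is missing, the page is returned unchanged.
// A replacer function is used so "$" sequences in the data are not treated
// as special replacement patterns by String.prototype.replace.
function injectSeo(indexHtml: string, tags: SeoTags): string {
  return indexHtml.replace(SEO_PLACEHOLDER, () => renderSeoBlock(tags));
}
```

Because the untouched index.html is still a valid SPA shell, routes that httpd does not forward to the backend keep working exactly as they do today; only the selected routes (e.g. dataset/{key} and the about pages) get the enriched head.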