Open mdoering opened 2 years ago
It's not entirely clear, to me at least, why we'd want ChecklistBank pages indexed by search engines. Aren't the COL and GBIF websites the places we'd expect to drive traffic to?
COL only shows the very latest version of the COL Checklist, but no other dataset or release, such as the previous or even the current annual checklist (the new monthly one hides that). I would expect all public datasets to be findable by Google if we advertise CLB as a repository. Currently you can find the COL Checklist only under GBIF: https://datasetsearch.research.google.com/search?src=2&query=Catalogue%20of%20Life%20Checklist&docid=L2cvMTFqOWJ2Zjd5cw%3D%3D
The main thing wrong here is that the COL portal does not even show up. I hope to have changed that already by adding schema.org dataset metadata to the metadata page, which is also the landing page for the current DOI.
Tools and help texts (e.g. how to publish) in CLB would also benefit from being searchable. Individual taxa, references and other data probably do not need to be indexed and would confuse Google more than help.
Having a dataset DOI that points to ChecklistBank (which all non-current COL releases do), which is then opaque to Google and does not contain schema.org dataset metadata, is not great.
We would like to see at least static pages like about, the tools, and probably also the dataset search and details, ranked highly in Google and other search engines. We advertise CLB as a product on its own and even have a dedicated domain.
The robots.txt file currently blocks all crawlers. That's the first thing to remove, I suspect, and the build should be made conditional so that the reject-all robots.txt is included on dev only.
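A minimal sketch of how the build could pick the robots.txt body per environment. The `DeployEnv` type, the `robotsTxt` helper and the optional sitemap URL are all hypothetical names for illustration, not part of the current CLB build:

```typescript
// Hypothetical build helper: returns the robots.txt body for a deployment environment.
type DeployEnv = "dev" | "prod";

function robotsTxt(env: DeployEnv, sitemapUrl?: string): string {
  if (env === "dev") {
    // Reject all crawlers on dev only.
    return "User-agent: *\nDisallow: /\n";
  }
  // An empty Disallow rule permits crawling of everything.
  const lines = ["User-agent: *", "Disallow:"];
  if (sitemapUrl) {
    lines.push(`Sitemap: ${sitemapUrl}`);
  }
  return lines.join("\n") + "\n";
}
```

The build script would write the returned string to the robots.txt in the output directory. If individual taxon or reference pages should stay out of the index, extra `Disallow` lines for those paths could be added to the prod branch.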
We could use the same method as on the COL portal for CLB.
The index.html can be served on a per-route basis with injected SEO tags, JSON-LD etc. for datasets, about pages and so on, by simply replacing the comment with the desired data.
httpd could be configured to fetch the index.html file from the backend only for specific routes, and otherwise just serve it as is.
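The comment replacement described above could look roughly like this on the backend. The marker text `<!-- SEO -->` and both function names are assumptions for illustration; the actual placeholder used on the COL portal may differ:

```typescript
// Hypothetical placeholder comment inside the <head> of index.html.
const SEO_PLACEHOLDER = "<!-- SEO -->";

// Wraps arbitrary data in a JSON-LD script tag.
function jsonLdScript(data: unknown): string {
  // Escape "<" so a value containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Returns the template with the placeholder swapped for route-specific head HTML.
function injectSeo(template: string, headHtml: string): string {
  if (!template.includes(SEO_PLACEHOLDER)) {
    return template; // nothing to replace, serve the file as is
  }
  // A replacer function avoids special handling of "$" sequences in headHtml.
  return template.replace(SEO_PLACEHOLDER, () => headHtml);
}
```

Routes that have no metadata to inject would simply get the unmodified file, which matches the idea of letting httpd serve the static index.html everywhere else.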
For search engines and other machine clients the site needs to provide some HTML content in addition to the JavaScript content. Consider the various options out there, including server-side rendering of React.
DOI landing pages dataset/{key} in particular should also provide schema.org metadata so that they are recognised by Google as dataset pages, see https://support.datacite.org/docs/how-do-i-expose-my-datasets-to-google-dataset-search
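As a rough sketch of what such metadata could contain for a dataset/{key} page: the `DatasetMeta` shape and the `datasetJsonLd` helper are hypothetical, and only a handful of schema.org Dataset properties are shown. The base URL is passed in rather than assumed:

```typescript
// Hypothetical subset of the dataset metadata a landing page has available.
interface DatasetMeta {
  key: number;
  title: string;
  description: string;
  doi?: string;     // bare DOI, e.g. "10.1234/abc"
  license?: string; // license URL
}

// Builds a schema.org Dataset object suitable for embedding as JSON-LD.
function datasetJsonLd(meta: DatasetMeta, baseUrl: string): Record<string, unknown> {
  const ld: Record<string, unknown> = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    name: meta.title,
    description: meta.description,
    // Strip trailing slashes so the landing page URL is well formed.
    url: `${baseUrl.replace(/\/+$/, "")}/dataset/${meta.key}`,
  };
  // Optional fields are left out entirely when not known.
  if (meta.doi) ld["identifier"] = `https://doi.org/${meta.doi}`;
  if (meta.license) ld["license"] = meta.license;
  return ld;
}
```

The resulting object would be serialised with `JSON.stringify` into a `<script type="application/ld+json">` element in the page head. Further properties such as creators or version could be added once it is clear which fields the dataset metadata reliably provides.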