Closed by gdower 2 years ago
@mdoering, I tested the top 1000 IDs in the sitemap_ba.txt file against the https://api.catalogueoflife.org/dataset/3LR/taxon/ endpoint and got 404 errors for 37% of them. Can the sitemap files be regenerated on deployment? I could open a separate issue for that.
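For anyone who wants to repeat this kind of check, here is a rough sketch of how it could be scripted. It is not the script I actually ran. It assumes each line of the sitemap file is a URL whose last path segment is the taxon ID, and the helper names (`id_from_sitemap_line`, `http_status`, `share_not_found`) are made up for illustration.

```python
from typing import Callable, Iterable
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint quoted in this thread.
API_BASE = "https://api.catalogueoflife.org/dataset/3LR/taxon/"


def id_from_sitemap_line(line: str) -> str:
    """Assumption: a sitemap line is a URL ending in the taxon ID."""
    return line.strip().rstrip("/").rsplit("/", 1)[-1]


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request, including error codes."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status
    except HTTPError as err:
        return err.code


def share_not_found(
    ids: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> float:
    """Fraction of IDs for which the taxon endpoint answers 404."""
    id_list = list(ids)
    if not id_list:
        return 0.0
    missing = sum(
        1 for taxon_id in id_list
        if status_of(API_BASE + quote(taxon_id, safe="")) == 404
    )
    return missing / len(id_list)
```

Usage would be something like reading the first 1000 lines of the file, mapping them through `id_from_sitemap_line`, and passing the result to `share_not_found`. The `status_of` parameter is only there so the counting logic can be tried without hitting the live API.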
They were created rather manually then. We should create them when a release goes public I reckon?
37% is a lot. We should investigate why that is.
The portal also calls the https://api.catalogueoflife.org/dataset/3LR/synonym/ endpoint and that always returned a 404 for the top 1000 IDs, but maybe that's just an artifact of taking the IDs from the sitemap?
no idea, @thomasstjerne it seems the portal always calls taxon, taxon/info and synonym for any ID?
portal now returns 404 status codes and a new Dodo 404 page: https://www.catalogueoflife.org/data/taxon/Dodo
Describe the bug
There are a few issues with the 404 handling on the portal. The main one is that the overall HTTP status code returned for non-existent taxon pages is 200, not 404. An output message, probably put there for debugging purposes, also gets printed at the top of the page. I think the lack of a 404 status might penalize SEO, because it will result in a lot of duplicate pages with exactly the same "404" content while the search engine crawler won't know they are 404 errors.
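A minimal sketch of how the difference between a real 404 and a "soft" 404 (status 200 with not-found content) could be checked from a script. The `marker` string is an assumption about what the error page contains, and `fetch` and `classify` are hypothetical helper names. If the portal renders its error page client-side, the marker may not appear in the raw HTML at all, so treat the result as a hint only.

```python
from typing import Tuple
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url: str) -> Tuple[int, str]:
    """Return (status code, body text) for a GET request, even on HTTP errors."""
    try:
        with urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def classify(status: int, body: str, marker: str = "404") -> str:
    """Label a response as a hard 404, a soft 404, a normal page, or other."""
    if status == 404:
        return "hard 404"
    if status == 200 and marker in body:
        # The page says "not found" but the crawler sees a success status.
        return "soft 404"
    if status == 200:
        return "ok"
    return f"other ({status})"
```

For example, `classify(*fetch("https://www.catalogueoflife.org/data/taxon/Dodo"))` would be expected to report a hard 404 once the portal returns the proper status code for that page.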
Regenerating the sitemap on deployment becomes even more important if we can't return a 404 status code.