CatalogueOfLife / portal

The public-facing website and dynamic portal for the CoL
https://www.catalogueoflife.org

404 handling on taxon pages #169

Closed gdower closed 2 years ago

gdower commented 2 years ago

Describe the bug: There are a few issues with the 404 handling on the portal. Most importantly, non-existent taxon pages return an overall HTTP status code of 200, not 404. An output message that was probably put there for debugging also gets printed at the top of the page. I think the missing 404 status might penalize SEO: it results in a lot of duplicate pages with exactly the same "404" content, and a search engine crawler has no way of knowing they are error pages.

Regenerating the sitemap on deployment becomes even more important if we can't return a 404 status code.

To reproduce:

  1. Go to https://www.catalogueoflife.org/data/taxon/3GUJ3
  2. Look at the HTTP status code returned for the page (200 instead of 404)
  3. Check the banner at the top of the page for the debug message
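
Step 2 can also be checked from a script instead of the browser's network tab. A minimal Python sketch using only the standard library; `status_of` and its `opener` parameter are hypothetical names introduced here for illustration, not part of the portal:

```python
import urllib.error
import urllib.request


def status_of(url, opener=urllib.request.urlopen):
    """Return the HTTP status code that `url` responds with.

    urlopen raises HTTPError for 4xx/5xx responses, so in that case
    the code is read from the exception instead of the response.
    `opener` is injectable (hypothetical parameter) so the function
    can be exercised without network access.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `status_of("https://www.catalogueoflife.org/data/taxon/3GUJ3")` reports the status for the page from step 1. Note that `urlopen` follows redirects, so the value is the status of the final response.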

Screenshots: [image]

Browser:

gdower commented 2 years ago

@mdoering, I tested the top 1000 IDs in the sitemap_ba.txt file against the https://api.catalogueoflife.org/dataset/3LR/taxon/ endpoint, and 37% of them returned a 404. Can the sitemap files be regenerated on deployment? I could open a separate issue for that.
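
For reference, a check like this can be sketched in Python roughly as follows. This makes two assumptions. The first is that the sitemap file lists one taxon page URL per line, with the ID as the last path segment (the file layout is not shown in this thread). The second is that `ids_from_sitemap`, `api_status` and `not_found_rate` are hypothetical helper names, not part of any CoL tooling.

```python
import urllib.error
import urllib.parse
import urllib.request

# Endpoint quoted in the comment above.
API_BASE = "https://api.catalogueoflife.org/dataset/3LR/taxon/"


def ids_from_sitemap(lines, limit=1000):
    """Collect the last path segment of the first `limit` non-empty lines.

    Assumption: one URL per line, ending in the taxon ID.
    """
    ids = []
    for line in lines:
        if len(ids) >= limit:
            break
        line = line.strip()
        if not line:
            continue
        ids.append(line.rstrip("/").rsplit("/", 1)[-1])
    return ids


def api_status(taxon_id):
    """Return the HTTP status code the API gives for one taxon ID."""
    url = API_BASE + urllib.parse.quote(taxon_id, safe="")
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses surface as exceptions in urllib.
        return err.code


def not_found_rate(ids, fetch_status=api_status):
    """Fraction of `ids` for which `fetch_status` returns 404."""
    if not ids:
        return 0.0
    missing = sum(1 for taxon_id in ids if fetch_status(taxon_id) == 404)
    return missing / len(ids)
```

With a local copy of the file, `not_found_rate(ids_from_sitemap(open("sitemap_ba.txt")))` gives the share of IDs that return a 404. Passing a stub as `fetch_status` allows a dry run without hitting the API.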

mdoering commented 2 years ago

They were created rather manually back then. We should generate them whenever a release goes public, I reckon?

mdoering commented 2 years ago

37% is a lot. We should investigate why that is.

gdower commented 2 years ago

The portal also calls the https://api.catalogueoflife.org/dataset/3LR/synonym/ endpoint, and that always returned a 404 for the top 1000 IDs. Maybe that's just an artifact of taking the IDs from the sitemap?

mdoering commented 2 years ago

No idea. @thomasstjerne, it seems the portal always calls taxon, taxon/info and synonym for any ID?

mdoering commented 2 years ago

The portal now returns 404 status codes and has a new Dodo 404 page: https://www.catalogueoflife.org/data/taxon/Dodo