cu-mkp / m-k-manuscript-data

Text of BnF Ms Fr 640 in multiple formats, metadata about the manuscript, and derived data

SEO and indexing of M&K publications: DCE, RTC, EditionCrafter #2102

Open njr2128 opened 6 months ago

njr2128 commented 6 months ago

From Terry, 5/3/2024: @SJPitman brought up an issue about the DCE essays not being indexable or findable on Google. I've been looking into it and think we need to address it. It has to do with React, the AWS infrastructure, or both. Possibly relevant for EC as well.

[4:13 PM, 5/3/2024] Terry Catapano: Serving your static site from S3 through CloudFront is a great approach! In this case, search engine indexing is still possible, but there are some additional considerations:

  1. CloudFront distribution: Make sure your CloudFront distribution is configured to allow search engine crawlers to access your site. You can do this by whitelisting the user agents of popular search engines like Googlebot, Bingbot, and Yandex.
  2. S3 bucket permissions: Ensure that your S3 bucket permissions allow search engines to access your site's content. You can do this by setting the bucket policy to allow GET requests from search engine crawlers.
  3. Cache control: CloudFront's caching mechanism can sometimes interfere with search engine crawling. Make sure to set appropriate cache control headers (e.g., Cache-Control: public, max-age=0) to ensure that search engines can crawl your site regularly.
  4. SSL/TLS encryption: Since CloudFront supports SSL/TLS, ensure that your site is served over HTTPS. Google treats HTTPS as a ranking signal, although it is not strictly required for indexing.
  5. Canonical URLs: If you're using CloudFront's domain (e.g., (link unavailable)), make sure to set canonical URLs in your HTML headers or meta tags to point to your original domain (e.g., (link unavailable)). This helps search engines understand the original source of your content.
  6. Robots.txt: Ensure that your robots.txt file is accessible and allows search engines to crawl your site.
  7. Sitemap submission: Submit your sitemap to Google Search Console and Bing Webmaster Tools to help search engines discover and crawl your site.

By addressing these points, you can ensure that your static site served from S3 through CloudFront is properly indexed by search engines.

[4:13 PM, 5/3/2024] Terry Catapano: From Llama 3, re: SEO for static sites.
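
Points 6 and 7 of the checklist (robots.txt and sitemap) can be checked locally before touching any CloudFront settings. The following is a minimal sketch using only the Python standard library. The helper names `crawler_allowed` and `build_sitemap` and the `example.org` URLs are hypothetical placeholders, not part of the M&K codebase or its deployment.

```python
from urllib.robotparser import RobotFileParser
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)


def build_sitemap(urls: list[str]) -> str:
    """Build a bare-bones sitemap document listing one <url><loc> per page."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical robots.txt content and URLs, for illustration only.
    robots = "User-agent: *\nDisallow: /private/\n"
    print(crawler_allowed(robots, "Googlebot", "https://example.org/essays/a"))
    print(build_sitemap(["https://example.org/essays/a"]))
```

A generated sitemap only helps if every listed URL returns meaningful HTML when fetched directly, so this check is a complement to, not a substitute for, inspecting what the pages actually serve.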

njr2128 commented 3 months ago

Currently, a Google search for any text that is not on the site's home page, even a verbatim quote, does not return the site in the results.

For example, here only the sandbox page is returned: image
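
One way to narrow this down is to test whether a quoted phrase is present in the HTML the server returns before any JavaScript runs. If it is absent, client-side rendering in the React app would be a plausible cause, though that is an assumption and not something confirmed in this thread. The sketch below uses only the Python standard library; `phrase_in_raw_html` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_text(html: str) -> str:
    """Return the document's text content with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def phrase_in_raw_html(html: str, phrase: str) -> bool:
    """Case-insensitive check that phrase appears in the un-rendered HTML text."""
    needle = " ".join(phrase.split()).lower()
    return needle in visible_text(html).lower()


if __name__ == "__main__":
    # Hypothetical markup: an empty app shell versus a prerendered page.
    shell = '<html><body><div id="root"></div><script>render("a verbatim quote")</script></body></html>'
    prerendered = "<html><body><p>A  verbatim\nquote from an essay</p></body></html>"
    print(phrase_in_raw_html(shell, "a verbatim quote"))
    print(phrase_in_raw_html(prerendered, "a verbatim quote"))
```

To apply it to the live site, the `html` argument would be the body of a plain HTTP GET of an essay URL (for example via `urllib.request.urlopen`), which is left out here so the sketch runs without network access.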