CorrelAid / correlaid_website

Source code for the CorrelAid website
https://correlaid.org

Create forwards from old correlaid website to the new website. #334

Closed: KonradUdoHannes closed this issue 11 months ago

KonradUdoHannes commented 11 months ago

This is mostly a release issue, as the redirects should be created once the new website is live and not sooner, but since there are numerous blog posts etc., this might need to be prepared in advance. The URLs of the old website should then respond with forwards and HTTP status code 301 (Moved Permanently). This is probably the most important SEO issue, as it converts any backlinks to the old site into traffic to the new site.

@friep I think in some discussion you mentioned you would take care of this task. Do I remember this correctly? Otherwise it would be important to plan for it so that somebody with the correct access rights does it during launch.

jstet commented 11 months ago

I think this is a bit of a clone of #206

friep commented 11 months ago

on it @KonradUdoHannes. i'll prepare a google sheet so that i can then import it via csv: https://developers.cloudflare.com/rules/url-forwarding/bulk-redirects/reference/csv-file-format/

friep commented 11 months ago

is there a way to get all URLs from the svelte app? 😬 otherwise i'll have to check manually ^^

jstet commented 11 months ago

I found something like this: https://github.com/bartholomej/svelte-sitemap

Also, maybe take a look at how Konrad collected the slug routes here: https://github.com/CorrelAid/correlaid_website/blob/main/svelte.config.js

KonradUdoHannes commented 11 months ago

I've already sent @friep a list of the routes as they come out of the static build. For reference, here is how I collected them (Unix systems only, unfortunately):

  1. Create a static build by setting the env vars as described in the README and running

    npm run build
  2. Go to the root folder of the static build

    cd .svelte-kit/cloudflare
  3. Run find to list the directories, excluding the _app paths, which are not routes

    find . -type d -not -path "./_app" -not -path "./_app/*"

I'm pretty sure PowerShell has some find equivalent that could be used on native Windows, but I'm not so fluent in it. The first two steps would be the same, however.
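The directory listing from step 3 can be turned into URL paths in the same shell session. A minimal sketch, assuming the build output lives in .svelte-kit/cloudflare as above; list_routes is a hypothetical helper, not part of the repository:

```shell
#!/bin/sh
# list_routes DIR: print every route of a static build in DIR as a URL path
# with a trailing slash, skipping the _app asset directory.
# (Hypothetical helper for illustration.)
list_routes() {
  (
    cd "$1" || exit 1
    find . -type d -not -path "./_app" -not -path "./_app/*" \
      | sed -e 's|^\.||' -e 's|/*$|/|' \
      | LC_ALL=C sort
  )
}

# Example: list_routes .svelte-kit/cloudflare > routes.txt
```

The build root "." becomes "/", and "./about" becomes "/about/", so the output can be pasted directly next to the old URLs in the sheet.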

friep commented 11 months ago

thank you two :) i already did it manually now (just sent you an invite to the google sheet). Rows that are empty except for the old URL (source URL) do not need a rule, because they are either the same on the new website or they are captured by a route with subpath matching, e.g. we can redirect all correlaidx/[city-name] URLs to community/correlaidx/[city-name] with one rule.
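Converting such a sheet into the CSV that Cloudflare imports could look roughly like this. A sketch under stated assumptions: the sheet is exported as old_url,new_url rows with no header row and no commas inside the URLs, and the output uses only the first three columns of the bulk-redirect CSV format linked earlier (source URL, target URL, status code), which should be double-checked against that reference. sheet_to_bulk_csv and the file names are hypothetical. Rows without a target are skipped, matching the rule described above.

```shell
#!/bin/sh
# sheet_to_bulk_csv [FILE...]: turn "old_url,new_url" rows into redirect rows.
# ASSUMPTION: output columns are source URL, target URL, status code, as in the
# linked Cloudflare bulk-redirect CSV reference; optional columns are omitted.
sheet_to_bulk_csv() {
  awk -F',' '
    $2 == "" { next }              # no target: URL unchanged or covered by a subpath rule
    { printf "%s,%s,301\n", $1, $2 }
  ' "$@"
}

# Example: sheet_to_bulk_csv redirects_sheet.csv > bulk_redirects.csv
```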

friep commented 11 months ago

there is a max of 20 redirects i can add via cloudflare.. sigh. currently have over 400 in the list..

source for limits: https://developers.cloudflare.com/rules/url-forwarding/#availability

friep commented 11 months ago

i'll prioritize the important ones for now ..

KonradUdoHannes commented 11 months ago

@friep if we don't want people to end up on non-existent pages, we can also add a service worker to the old site that forwards visitors to the new one. This has the advantage that there is no limit on what we can forward, but search engines will probably not pick it up as a redirect, so it would not help with SEO backlinks.

friep commented 11 months ago

@KonradUdoHannes i am not sure i understand this correctly. :/ the old page is hosted on netlify, but we will remove the URL correlaid.org from it, won't we? will this approach then work if pages are indexed under correlaid.org at google?

KonradUdoHannes commented 11 months ago

@friep you are right. What I described does not make sense, but it could be done the other way around to some extent. We could add the functionality to the service worker of our new website, which takes over the old URL. The service worker could then potentially identify requests that were intended for the old site rather than the new one and forward them accordingly. Another issue with this workaround, however, is when service workers start handling requests: by default only after the first reload of the page, which would defeat the purpose a bit. There are ways to activate them right away, but that makes the workaround implementation even more complex.

friep commented 11 months ago

@KonradUdoHannes we had to switch to vercel for the production deployment, as cloudflare did not allow configuring the apex domain with external DNS (in our case hetzner). so we can now use vercel's approach, which is adding the redirects to a json file (see PR)
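A redirect in vercel.json is a small object in the file's redirects array with a source, a destination and a permanent flag; per Vercel's documentation, permanent: true is answered with 308 (the method-preserving counterpart of 301) and permanent: false with 307. The hypothetical helper below prints such entries and turns the subpath idea from the sheet (correlaidx/[city-name] to community/correlaidx/[city-name]) into a single rule with a named :city parameter. The paths are illustrative, not necessarily the ones used in the PR.

```shell
#!/bin/sh
# redirect_entry SOURCE DESTINATION PERMANENT
# Prints one object for the "redirects" array in vercel.json.
# PERMANENT must be the JSON literal true (308) or false (307).
# Hypothetical helper: paths must not contain double quotes (no JSON escaping).
redirect_entry() {
  printf '{ "source": "%s", "destination": "%s", "permanent": %s }\n' "$1" "$2" "$3"
}

# One rule covers every city page via the :city parameter (illustrative paths).
redirect_entry '/correlaidx/:city' '/community/correlaidx/:city' true
```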

KonradUdoHannes commented 11 months ago

I merged the PR all the way to production and tried a few redirects that are set in the json file. The redirects that I tried work, but they seem to be strict about the trailing slash. For instance the route from the old website /en/nonprofits/experts/ gets redirected as expected, but /en/nonprofits/experts , i.e. without the trailing slash, runs into our 404 page.

Not sure how strict the old site was about trailing slashes. If the old site itself was also strict, then we would probably not lose any old backlinks, because they would have needed the trailing slash to work in the first place.

KonradUdoHannes commented 11 months ago

@friep what are your thoughts regarding the previous comment? If there is no further to-do here, we could close this issue and the SEO issue.

friep commented 11 months ago

the old website did allow for both, e.g. https://correlaid.netlify.app/about/codeofconduct/ and https://correlaid.netlify.app/about/codeofconduct both point to the same page. only the former gets redirected now..

i found this stackoverflow answer: https://stackoverflow.com/questions/4007302/regex-how-to-match-an-optional-character however i'm not sure whether it's worth the effort. I will check out the search console for any hints whether we have a problem there.
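The linked answer boils down to the ? quantifier, which makes the preceding character optional. The sketch below shows it with grep -E, plus a regex-free alternative that simply lists both spellings of each old path so that each can get its own redirect entry. Whether Vercel's source patterns accept the same optional-slash syntax is not checked here; both function names are hypothetical.

```shell
#!/bin/sh
# matches_optional_slash PATH: succeeds for the path with or without the
# trailing slash, because "/?" makes the final slash optional.
matches_optional_slash() {
  printf '%s\n' "$1" | grep -Eq '^/about/codeofconduct/?$'
}

# both_variants PATH...: print each path without and with a trailing slash
# (meant for non-root paths; "/" would yield an empty line).
both_variants() {
  for p in "$@"; do
    base=${p%/}              # strip one trailing slash if present
    printf '%s\n%s/\n' "$base" "$base"
  done
}

# Example: both_variants /en/nonprofits/experts/ /about/codeofconduct
```

Listing both variants doubles the number of entries, but it keeps every rule a plain string match.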

friep commented 11 months ago

maybe someone else can also check the google search console, but as far as i understand there are not a lot of 404s (a lot of pages are not indexed because of redirects, but that's not a problem as far as i understand). that would indicate that we don't have the problem of "old" links lying around that don't get redirected properly..!?

KonradUdoHannes commented 11 months ago

I checked all the external links from the google search console and identified the following issues.

https://www.correlaid.org/about/ (404)
https://www.correlaid.org/material/datenreifegradmodell.pdf (404)
https://www.correlaid.org/blog/r-lernen-kurs/ (404)
https://www.correlaid.org/blog/r-lernen-kurs (404)
https://www.correlaid.org/blog/gender_bias_and_mobility/ (forwards to the blog post overview page; does the post not exist anymore?)

The following pages did not return a 404 but also did not load any content, so these might be bugs unrelated to this issue. If it becomes certain that they are not related to forwarding, a separate issue should be opened.

https://www.correlaid.org/daten-nutzen/projekte/2020-03-ERL (empty page)
https://www.correlaid.org/daten-nutzen/projekte/2022-04-LAU (empty page)
https://correlaid.org/projects/2022-04-lau/ (also empty, as above, but the slug gets converted to upper case)
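Checking a list of old URLs like the ones above can be scripted with curl. A rough sketch, assuming one URL per line on stdin; without -L, curl does not follow redirects, so each line shows the first response (e.g. 301, 308 or 404) and its redirect target. audit_urls, flag_broken and old_urls.txt are hypothetical names.

```shell
#!/bin/sh
# audit_urls: for each URL on stdin print "<url> <status> <redirect target>".
# curl runs without -L, so redirects are reported rather than followed.
audit_urls() {
  while IFS= read -r url; do
    result=$(curl -s -o /dev/null -w '%{http_code} %{redirect_url}' "$url")
    printf '%s %s\n' "$url" "$result"
  done
}

# flag_broken: keep only the URLs whose status in audit_urls output is 404.
flag_broken() {
  awk '$2 == 404 { print $1 }'
}

# Example: audit_urls < old_urls.txt | tee audit.txt | flag_broken
```

This only catches hard errors: a redirect to an overview page, or an empty page that still returns 200, has to be spotted by looking at the full audit output.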

friep commented 11 months ago

thank you @KonradUdoHannes. going through them and making PR #459

https://www.correlaid.org/about/ (404) --> added to vercel.json
https://www.correlaid.org/material/datenreifegradmodell.pdf (404) --> added the pdf to the cms and then linked it in vercel.json
https://www.correlaid.org/blog/r-lernen-kurs/ (404) --> added to vercel.json, redirect to education subpage
https://www.correlaid.org/blog/r-lernen-kurs (404) --> added to vercel.json, redirect to education subpage
https://www.correlaid.org/blog/gender_bias_and_mobility/ (forwards to the blog post overview page; does the post not exist anymore?) --> there were so, so many blog posts and I just couldn't adapt all of them; especially the ones with many pictures took so long (alt text..). hence quite a few are not online (yet). those redirect to the blog landing page with a non-permanent redirect.

re the projects: the capitalization is the redirect. Empty sites are a bug imo --> #458

KonradUdoHannes commented 11 months ago

The changes are merged into main and the CD step ran successfully, which should have triggered a build on Vercel itself. However, I don't see any changes to the forwarding on the production site. Maybe there was an issue with the build step on Vercel, or our CD workflow does not trigger everything that is required to adjust the forwarding.

@friep @jstet as I don't have access to vercel, can you have a look to see whether anything stands out?

KonradUdoHannes commented 11 months ago

Not sure what happened with the build process earlier, but after deploying a security fix, everything built successfully and the forwarding changes are included as well. This closes this issue.

It might be that the previous build step got ignored because the PR still followed our <issue branch> -> preview -> main deployment process, while the target deployment process has now been simplified again to <issue branch> -> main.