DDMAL / CantusDB

A new site for Cantus Database running under Django.
https://cantusdatabase.org
MIT License

Update Staging server OS #1303

Open dchiller opened 5 months ago

dchiller commented 5 months ago

In light of #1214, we should update the staging server OS.

Steps (may be added to):

ahankinson commented 5 months ago

There is a staging server already… it’s provisioned on Arbitus and is running Fedora.

dchiller commented 5 months ago

There is a staging server already… it’s provisioned on Arbitus and is running Fedora.

Yes, this is what I've been running the ansible playbooks against. I misremembered our conversation about this and thought the plan had been to decommission that one once we were ready to deploy... but of course we were actually going to decommission the current staging machine.

dchiller commented 4 months ago

Relevant changes to the staging branch have been made in a new branch -- staging-ansible -- which can be merged to staging when we decommission the old staging server.

dchiller commented 4 months ago

@jacobdgm @lucasmarchd01 The new staging server has been deployed! You can see it at staging.new.cantusdatabase.org.

I'd like to make sure you have ssh access. Can you add your username and public key by opening a pull request to this file:

https://github.com/dact-chant/ansible.cantus-db/blob/main/roles/users/vars/main.yml

dchiller commented 4 months ago

I'd like to make sure you have ssh access.

It occurs to me that you already have ssh access by virtue of downloading the ansible ssh keys...but, you should still do this!

jacobdgm commented 4 months ago

Can you add your username and public key by opening a pull request to this file:

https://github.com/dact-chant/ansible.cantus-db/blob/main/roles/users/vars/main.yml

Done! Thank you for setting this up!

dchiller commented 2 months ago

@lucasmarchd01 was able to update the staging server yesterday, so I think we can go ahead and complete the rest of these tasks.

Today, I'm going to switch the DNS settings, do a final update of the data, and open a PR to the DDMAL operations repo, so I think we could remove the old staging server on DRAC by tomorrow. @ahankinson Anything else you can think of that we need to do before we go ahead with that step?

ahankinson commented 2 months ago

Have you tested it by changing your hosts file locally?

dchiller commented 2 months ago

Have you tested it by changing your hosts file locally?

With an alias hostname registered with our DNS provider, yes. (Our current ansible set-up doesn't work without a registered hostname, but I am thinking as I write this that that might be something to reconfigure).

ahankinson commented 2 months ago

I mean, changing your local machine's hosts file to the new URL you want it to work as, and checking it. When you roll out a DNS change it's always good to test the new URL before making the change in DNS.

https://kinsta.com/knowledgebase/edit-mac-hosts-file/

ahankinson commented 2 months ago

With this you can check "staging.cantusdatabase.org" without making the change "live" by updating the DNS. Likewise, when you get the production site ready you can change your hosts file to point "cantusdatabase.org" to the new production server and test it, then change the DNS entry.
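As a rough illustration of the hosts-file override described above, here is a minimal Python sketch that parses hosts-format text and reports which IP a hostname would resolve to locally. The IP address `203.0.113.10` is a placeholder from the documentation-reserved range, not the real server address, and `parse_hosts` is a hypothetical helper written for this example.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname in hosts-file text to the first IP listed for it."""
    mapping: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # The first matching entry wins, as it does in a real hosts file.
            mapping.setdefault(name.lower(), ip)
    return mapping


# Placeholder address for the new server (assumption, not the real IP).
NEW_SERVER_IP = "203.0.113.10"

SAMPLE_HOSTS = f"""\
127.0.0.1   localhost
# Temporary override: remove once the DNS change is live.
{NEW_SERVER_IP}   staging.cantusdatabase.org
"""

overrides = parse_hosts(SAMPLE_HOSTS)
print(overrides.get("staging.cantusdatabase.org"))
```

With an entry like the one in `SAMPLE_HOSTS` in the local hosts file, a browser on that machine reaches the new server under the final hostname while everyone else still gets the old DNS answer.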

PS: Make sure you update the value in the Django "sites" table...

dchiller commented 2 months ago

Right. What I'm saying is that because our ansible set-up checks for and obtains certificates, you can't currently deploy with a hostname that is not registered with a public DNS provider (if you do, the certificate challenge will fail and the deployment as a whole will fail).
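One way to guard against that failure would be a small pre-flight check that confirms the hostname already resolves to the target server before the certificate step runs. The sketch below is only an assumption about how such a check could look; it is not part of the existing ansible set-up, and the function names and the example IP are hypothetical.

```python
import socket
from typing import Callable


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the system resolver reports for hostname."""
    try:
        infos = socket.getaddrinfo(
            hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return set()  # hostname is not registered (or the resolver is unreachable)
    return {info[4][0] for info in infos}


def ready_for_certificate(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], set[str]] = resolve_ipv4,
) -> bool:
    """True if hostname resolves to expected_ip, so a challenge can reach the server."""
    return expected_ip in resolver(hostname)


# Example call (placeholder IP; performs a real lookup when run):
# ready_for_certificate("staging.cantusdatabase.org", "203.0.113.10")
```

Note that the system resolver also honours local hosts-file overrides, so a check like this only says something about public DNS when it is run from a machine without such an override.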

ahankinson commented 2 months ago

Ah yes. True. That's also going to be a problem with the live site. What I've done in the past is manually copy the certs from the old server to the new server, at which point you will already have the SSL certs. Then you can do the testing, then change the DNS entry. After this you can run Let's Encrypt and get the new certs.

dchiller commented 2 months ago

Ok.... the server is up and working at staging.cantusdatabase.org, so I think we can go ahead and shut down the old one.

We'll do the certificate and hosts file testing bit with production.

ahankinson commented 2 months ago

Looks good! Congrats.

BTW, I noticed when looking at the Google search console the other day that Google knows about the staging server. You might want to set a simple username / password on staging so that it (and other bots) won’t start indexing it.

dchiller commented 2 months ago

You might want to set a simple username / password on staging so that it (and other bots) won’t start indexing it.

When we last discussed this, Ich didn't want there to be a password...is this something (at least for search results) we could do with robots.txt, though?

fujinaga commented 2 months ago

I don’t want a password, so that a few people outside of McGill can easily access the staging for testing, e.g., Debra and Gen. But if there’s no other way, I will concede.

On May 4, 2024, at 8:30 AM, Dylan Hillerbrand @.***> wrote:

You might want to set a simple username / password on staging so that it (and other bots) won’t start indexing it.

When we last discussed this, Ich didn't want there to be a password...is this something (at least for search results) we could do with robots.txt, though?


ahankinson commented 2 months ago

It could be as simple as cantus / cantus. The purpose isn’t security, it’s just to prevent bots.
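For reference, a shared username and password like this is usually implemented as HTTP Basic authentication. The following is a minimal sketch of what the check amounts to, using the `cantus` / `cantus` pair suggested above; the helper names are hypothetical, and in practice this would more likely be configured in the web server or a middleware than written by hand.

```python
import base64
import hmac
from typing import Optional


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value a browser sends for Basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def is_authorized(header_value: Optional[str], username: str, password: str) -> bool:
    """Check a request's Authorization header against the shared credentials."""
    if not header_value:
        return False  # no header: respond 401 so the browser prompts for credentials
    expected = basic_auth_header(username, password)
    # Constant-time comparison; not essential here, but a good habit.
    return hmac.compare_digest(header_value.encode("utf-8"), expected.encode("utf-8"))


header = basic_auth_header("cantus", "cantus")
print(is_authorized(header, "cantus", "cantus"))
```

Crawlers that do not send the header receive a 401 response and have nothing to index, while testers only need to enter the shared credentials once per browser session.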

ahankinson commented 2 months ago

You can manage it with robots.txt, but some crawlers (especially the ones that are a bit shady) don't necessarily follow it.

You can also use noindex on the pages, but that still counts towards your "crawl budget" for the site, and is generally discouraged as a means of crawler access control on large sites. See: https://developers.google.com/search/docs/crawling-indexing/large-site-managing-crawl-budget#best_practices (The whole page is worth a read, though...)
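To make the robots.txt option concrete, here is a small sketch of a staging-only robots.txt that disallows all crawling, checked with Python's standard-library `urllib.robotparser`. The file contents and the `is_crawl_allowed` helper are assumptions for illustration, not something currently deployed, and the check only models crawlers that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the staging host: ask every crawler to stay out.
STAGING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether a well-behaved crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawl_allowed(STAGING_ROBOTS_TXT, "Googlebot", "https://staging.cantusdatabase.org/"))
```

This file would be served only on staging; serving the same rules on production would ask search engines to stop crawling the live site.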

dchiller commented 2 months ago

It could be as simple as cantus / cantus. The purpose isn’t security, it’s just to prevent bots.

@fujinaga what do you think of this?

You can manage it with robots.txt, but some crawlers (especially the ones that are a bit shady) don't necessarily follow it.

Ok, so the second-best option... but maybe something to have anyway. When I was running this basic crawler the other day on Old Cantus, there were a bunch of URLs being crawled that were useless, but not sure if there are any in New Cantus.

ahankinson commented 2 months ago

but not sure if there are any in New Cantus.

Oh, there's a lot. It matters less what URLs you publish now, and more what URLs have been published before. Since Google keeps a log of the URLs it knows about, it will keep re-checking them until it eventually drops them. Also, if others have linked to CD, that URL will persist on the web.

For example, there are currently ~172,000 URLs that Google has identified as having a redirect. These are URLs like:

The first one returns a 302, which means Google will keep trying to index that.

There are also ~127,000 404s, which means that Google is re-reading a URL that it knows about, but that is no longer there.

Google will eventually sort these things out, but the effect of these is that your "crawl budget" gets exhausted by looking for bad URLs, and that leaves less for indexing the good ones in a timely manner.

dchiller commented 1 month ago

I think we are about ready to close this. I just added #1474 to discuss/track the addition of a simple password to staging to deal with the bot/indexing issue discussed in more detail here.

@ahankinson Otherwise, the last step is to turn off the old staging server. Our new server has been up for over a month now, and given that it is staging, I feel confident about removing it.

ahankinson commented 1 month ago

Sounds good. Could you just double-check and confirm that there are no other web sites running on it? 🥹

dchiller commented 3 days ago

@ahankinson Confirmed that there's nothing we need on the staging server. Thanks!