geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

SSL certs for GO services (HTTPS) #53

Closed cmungall closed 11 months ago

cmungall commented 9 years ago

We need SSL certificates for geneontology.org

kltm commented 9 years ago

The idea would be to get a blanket cert for geneontology.org and all subdomains. More expensive, but future-proof.

kltm commented 9 years ago

This would be necessary to support sites and services that require GO data using https. Current (and future versions) of some web browsers prevent mixing http and https.

stuartmiyasato commented 9 years ago

I am planning to move forward with this now. As I mentioned in a previous email, we can get a three-year AlphaSSL wildcard cert for geneontology.org and *.geneontology.org for $120. This is the cert we are using for the ENCODE project and it seems to be working well. If we want subdomains of geneontology.org (e.g. www.subdomain.geneontology.org) we will need to get a separate cert for that subdomain.

Do we want to discuss specifics about implementation (e.g. redirecting HTTP to HTTPS, specific services, etc.) in this ticket or open new tickets for each topic?

kltm commented 9 years ago

Just to clarify: does the wildcard not allow for subdomains, or are you referring to sub-subdomains with your example?

Ugh, this is going to be messy. Essentially, because there can be no mixing for services (http/s considered x-domain), the only way to do this that I am aware of would be to make everything go over to one or the other.

I guess the first thing would be to:

stuartmiyasato commented 9 years ago

Hopefully the following will clear up the cert issue for subdomains. The cert I would get would cover these two cases:

  1. geneontology.org
  2. *.geneontology.org, where * is a value that doesn't include a dot. :)

For values of * that do include a dot, we'd need to get a separate cert. If we have two subdomains, one called sd1 and one called sd2, we'd need the following.

  1. for sd1.geneontology.org and *.sd1.geneontology.org, we'd need a separate cert.
  2. for sd2.geneontology.org and *.sd2.geneontology.org, we'd need a separate cert.

Again where * does not contain a dot.

Hopefully this clears up the subdomain issue. I'm not sure that I'm following the rest of the conversation just yet, but I think that will come in time, perhaps with more specific examples.
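The single-label wildcard rule described above (standardized in RFC 6125) can be sketched in a few lines; this is an illustrative toy matcher, not the actual certificate validation code any library uses:

```python
# Sketch of the single-label wildcard rule for TLS certs (RFC 6125):
# "*" matches exactly one DNS label, so "*.geneontology.org" covers
# sd1.geneontology.org but NOT a.sd1.geneontology.org, nor the apex itself.
def wildcard_matches(pattern: str, hostname: str) -> bool:
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False  # "*" cannot absorb extra labels
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))

print(wildcard_matches("*.geneontology.org", "sd1.geneontology.org"))    # True
print(wildcard_matches("*.geneontology.org", "a.sd1.geneontology.org"))  # False
print(wildcard_matches("*.geneontology.org", "geneontology.org"))        # False
```

This is why a separate cert per first-level subdomain is needed to cover its own sub-subdomains.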

kltm commented 9 years ago

Okay, it looks like first-level subdomains are fine then.

For the rest, it looks like some of it can be wheeled out a bit at a time (https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy), with only minor hiccups in debuggability, etc. I think there are a couple of cases that will run afoul of the x-domain stuff, but as long as everything is set up to work either way, we can probably fix them on a case-by-case basis.

stuartmiyasato commented 9 years ago

I have the cert for geneontology.org and *.geneontology.org in hand. I got the AlphaSSL wildcard SSL cert from www.ssl2buy.com. Cost was $120 for three years. The private key and CSR are in user ubuntu's home directory on geneontology.org.

kltm commented 9 years ago

Okay, great--cheaper than I remember. I'm going to be tied down for the next little bit, but you should go ahead with any experiments that don't require any coordination.

stuartmiyasato commented 9 years ago

I set up an AWS instance to test the SSL cert. The URL is https://www-oregon.geneontology.org/. I basically cloned the existing geneontology.org site and moved the clone to a different region (US-West-2 Oregon). I changed the Apache configuration to use HTTPS. I also set up a virtual host on port 80 that simply redirects the same query to port 443. It seems to work transparently from my testing. @kltm , do you have a test suite that we can point at this server?

At the least, this shows that the cert is valid so we don't have to worry about that.
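The setup described above can be sketched roughly as follows. This is a hypothetical minimal Apache configuration, not the actual server config; hostnames, paths, and cert filenames are placeholders:

```apache
# Port-80 vhost that redirects every request to the HTTPS vhost.
<VirtualHost *:80>
    ServerName www-oregon.geneontology.org
    Redirect permanent / https://www-oregon.geneontology.org/
</VirtualHost>

# Port-443 vhost serving the cloned site over HTTPS with the wildcard cert.
<VirtualHost *:443>
    ServerName www-oregon.geneontology.org
    SSLEngine on
    SSLCertificateFile      /etc/ssl/certs/geneontology.org.crt
    SSLCertificateKeyFile   /etc/ssl/private/geneontology.org.key
    SSLCertificateChainFile /etc/ssl/certs/alphassl-intermediate.crt
    DocumentRoot /var/www/geneontology
</VirtualHost>
```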

kltm commented 9 years ago

Great.

We have no unit tests (yet) for the main site besides the spider. However, the likely problem areas are few. For example:

https://www-oregon.geneontology.org/page/download-annotations

Is failing with:

Blocked loading mixed active content "http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/gene-associations/go_annotation_metadata.all.js"

Which is pretty much as we expected--we'll have to get viewvc over to https there for this ugly hack to work. But it does illustrate the type of failures I'd expect for the various AmiGO bits as well.

stuartmiyasato commented 9 years ago

The various sites are so intertwined on the meatloaf.stanford.edu filesystem that I think this HTTPS transition is likely to be an all-or-nothing proposition. (With the exception of AmiGO and GOlr, which are more like a standalone group.) What do you all think of starting that process of migrating the meatloaf-based services up to AWS? (The services largely consisting of SVN/CVS repos and their viewers, the current non-AWS portion of geneontology.org, the anonymous FTP server, the database archives, and the wiki.)

cmungall commented 9 years ago

Seems reasonable


kltm commented 9 years ago

Is the idea that the configurations of the various systems and how they relate will be simplified by the migration to AWS? Otherwise, it seems like just getting HTTPS available on all of them first would allow us to move forward in a more piecemeal fashion.

stuartmiyasato commented 9 years ago

I would love to simplify the configurations, but I'm not sure if that's really feasible (or even possible) due to the tangled web of symlinks that is the filesystem. But if I can sort it out, I'd love to simplify it.

The main reasons for my wanting to deploy the copy in AWS are:

  1. I want to deploy both the HTTPS site and concurrently have the HTTP site redirect to the HTTPS site. We can't test that with the existing systems without simply cutting over -- we'd be testing in production. I'd like to test in an isolated environment to give us time to debug any issues that crop up, and I think it's a good bet that issues will indeed crop up as part of the migration.
  2. If our long-term goals are to move services into the cloud anyway, this would give us a head start down that road as a by-product.

@kltm , I suspect this doesn't really answer your question. Hopefully it gives you an idea of my current mindset, but please do elaborate on your questions if I didn't give you an answer you are looking for.

kltm commented 9 years ago

I guess my gut feeling would be that adding HTTPS to the current servers and then testing service-by-service would be, while possibly a little wobbly as things cut over and back, probably less work than getting everything in the cloud, testing with different names, then trying to flip the switch and debugging everything at once. At least less stressful from my POV; x-domain resources can go over to HTTPS first as they are pretty much optional and would cause minimal disruption while testing, once cleared the whole domain could forward.

However, YMMV, and you will be doing the heavy lifting in either case.

For immediate use cases, I don't think anything particularly needs to be HTTPS in the foreseeable future with the exception of AmiGO and GOlr, since some institutions that want to use GOlr need to be able to use HTTPS for policy reasons.

kltm commented 9 years ago

From an email with @stuartmiyasato earlier today, a lot of progress has been made on this front.

kltm commented 5 years ago

Now using certbot pretty generally. We have tested it out for the main site and are pretty happy. The work in operations for https://github.com/geneontology/geneontology.github.io/issues/101 has given us most of what we need. I think the easiest might be to support both HTTP and HTTPS for a while, with the latter just a proxied HTTP, without the forced redirect.

kltm commented 3 years ago

Okay, thinking about a bit more of a step-by-step roadmap. For starters:

kltm commented 1 year ago

Tracking some internal parts at: https://github.com/berkeleybop/bbops/issues/27

kltm commented 1 year ago

@abessiari From our conversation on Monday, I wanted to clarify some possible ways forward for the HTTP/HTTPS wrt AmiGO and GOlr.

As we transition, unless we do everything all at once (hard), we'll need to walk services over due to web browsers not allowing mixed URL schemes. Working our way up from our most primitive services, we want to switch over in such a way that allows APIs making direct calls not to error out. A path forward for any given AmiGO instance would be:

  1. amigo and golr are both http
  2. make the solr/golr part of an amigo instance respond to both http and https calls (two vhosts at the proxy)
    this way https web apps can still use Solr in either mode, and AmiGO can continue on
  3. once we're sure that all needed APIs are available on https, switch the amigo instance over to upgrading https--this requires both a proxy change and a config change in amigo.yaml.
  4. at some point in the future, change the solr to upgrading https

For our purposes, I think we can do 1-3 all at once, but if there is something that we need not available on https, we'd have to stop at 2 until we fixed it.

Does this make sense?
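Step 2 above ("two vhosts at the proxy") might look something like this hypothetical Apache sketch; the hostname, backend port, and cert paths are placeholders, not the real GOlr deployment config:

```apache
# Two vhosts forwarding to the same internal Solr/GOlr backend,
# so clients can reach it over either scheme during the transition.
<VirtualHost *:80>
    ServerName golr.geneontology.org
    ProxyPass        /solr http://localhost:8983/solr
    ProxyPassReverse /solr http://localhost:8983/solr
</VirtualHost>

<VirtualHost *:443>
    ServerName golr.geneontology.org
    SSLEngine on
    SSLCertificateFile    /etc/ssl/certs/geneontology.org.crt
    SSLCertificateKeyFile /etc/ssl/private/geneontology.org.key
    ProxyPass        /solr http://localhost:8983/solr
    ProxyPassReverse /solr http://localhost:8983/solr
</VirtualHost>
```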

kltm commented 1 year ago

@abessiari TL;DR: golr: http and https; amigo: http or https

kltm commented 1 year ago

@abessiari Working on AmiGO/Noctua services for this; @kltm looking at downloads #1975 . Will circle back to main website afterwards.

kltm commented 1 year ago

Talking to @sierra-moxon @dustine32 @mugitty today. For next steps we will try:

kltm commented 1 year ago

@sierra-moxon I am trying out the Cloudflare proxy on api.geneontology.org. grill has been removed; the API still seems extant (and no errors are coming in). The HTTPS issue with the docs' location is apparent; the good news is that the HTTP version still seems to be there, so current links should work.

kltm commented 1 year ago

Attempted to convert geneontology.org to HTTPS (trying both Cloudflare and GH Pages "native"). Learned a lot, but had to back out. For next time, we'll need to:

Using Cloudflare would be nicer for this, as we would still get some kind of log.

Thank you to @dustine32 and @suzialeksander for helping test during this trial run.

kltm commented 1 year ago

@balhoff, regarding http://geneontology.org/docs/download-ontology I wanted to confirm our decision here.

My goal, IIRC, is that at the top of the page to ensure that anything that is http://purl[...] stays http://purl[...], even if it means removing the link in the text and putting it to the side, correct?

cmungall commented 1 year ago

See also https://github.com/OBOFoundry/OBOFoundry.github.io/issues/2403

kltm commented 1 year ago

@suzialeksander Noting that we have the following rules baked in on our proxy:

        RewriteRule "^/external2go/(.*)" "http://purl.obolibrary.org/obo/go/external2go/$1" [R]
        RewriteRule "^/external2go" "http://purl.obolibrary.org/obo/go/external2go/" [R]
        RewriteRule "^/gene-associations/(.*)" "http://current.geneontology.org/annotations/$1" [R]
        RewriteRule "^/gene-associations" "http://current.geneontology.org/annotations" [R]
        RewriteRule "^/ontology/gene_ontology.obo" "http://purl.obolibrary.org/obo/go/go-basic.obo" [R]
        RewriteRule "^/ontology/(.*)" "http://purl.obolibrary.org/obo/go/$1" [R]
        RewriteRule "^/ontology" "http://purl.obolibrary.org/obo/go/" [R]
        RewriteRule "^/quality_control/annotation_checks/annotation_qc.xml" "https://s3.amazonaws.com/go-public/metadata/annotation_qc.xml" [R]
        RewriteRule "^/doc/GO.xrf_abbs" "http://current.geneontology.org/metadata/GO.xrf_abbs" [R]
        RewriteRule "^/doc/GO.references" "http://current.geneontology.org/metadata/GO.references" [R]
        RewriteRule "^/formats/oboInOwl.owl" "https://raw.githubusercontent.com/geneontology/go-ontology/master/contrib/oboInOwl.owl" [R]
        RewriteRule "^/formats/oboInOwl.obo" "https://raw.githubusercontent.com/geneontology/go-ontology/master/contrib/oboInOwl.obo" [R]
        RewriteRule "^/formats/oboInOwl" "https://raw.githubusercontent.com/geneontology/go-ontology/master/contrib/oboInOwl" [R]
        RewriteRule "^/go-cam/docs/(.*)" "https://geneontology.cloud/docs/$1" [R]
        RewriteRule "^/go-cam/docs" "https://geneontology.cloud/docs" [R]
        RewriteRule "^/go-cam" "https://geneontology.cloud" [R]
        RewriteRule "^/cam" "/go-cam" [R]
        RewriteRule "^/gocam" "/go-cam" [R]
        RewriteRule "^/page/go-mailing-lists" "http://help.geneontology.org" [R]

Some of these will be easy to simulate with GH pages; some maybe not so much. The problematic ones are the purl.obolibrary.org rules and the download rules. I'm now considering backing out until we confirm these one way or another. I believe that letting a lot of these deprecate is the way forward, but not sure how disruptive it will be...

kltm commented 1 year ago

Noting that I don't believe that there is server-style forwarding and redirects possible with GH pages, so we're going to try and simulate as best we can with browser redirects.
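One common way to simulate a redirect on GH Pages is a static page per legacy path that bounces the browser client-side. This is a hypothetical sketch (the file path and target URL are illustrative, taken from the rewrite table above), not something actually deployed:

```html
<!-- e.g. saved as /ontology/index.html on the GH Pages site -->
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta http-equiv="refresh" content="0; url=http://purl.obolibrary.org/obo/go/">
    <link rel="canonical" href="http://purl.obolibrary.org/obo/go/">
  </head>
  <body>
    <p>This page has moved to
       <a href="http://purl.obolibrary.org/obo/go/">purl.obolibrary.org/obo/go/</a>.</p>
  </body>
</html>
```

Note this only works for browsers; tools like curl, robot, or OWL importers won't follow a meta refresh, which matters for the PURL-style consumers discussed below.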

suzialeksander commented 1 year ago

Starting a list of unhappy pages:

Remaining forwards that can be added from the Apache file:

kltm commented 1 year ago

Looking at the list above, although I started in on them, I'm beginning to wonder if we would not be better off letting them fail. We do not have the option of doing server-side redirects and the like with GH pages; this is an either-or situation.

tgbugs commented 1 year ago

We import http://geneontology.org/formats/oboInOwl in the NIF-Ontology and not having it resolve breaks the import chain since it is imported in nif_backend https://raw.githubusercontent.com/SciCrunch/NIF-Ontology/master/ttl/nif_backend.ttl.

If this is not fixed it means that all old versions of the NIF-Ontology are dead in the water and cannot be loaded in protege without significant effort from the user. (In the immediate moment it is blocking a data release because we cannot run robot on anything in the NIF-Ontology)

cmungall commented 1 year ago

Hi @tgbugs

We will continue to support both

  1. oboInOwl annotation property URIs
  2. resolving the oboInOwl vocabulary (as a whole, and also hash-based annotation property URIs)

Apologies for the temporary downtime. Some temporary workarounds follow. I am sure you know these already and it's just hard slotting these into the existing workflows, but I'm including for reference for anyone who ends up here:

kltm commented 1 year ago

@suzialeksander @cmungall

After a lot of attempts, I'm not seeing a way forward with GH Pages or Cloudflare for supporting HTTPS and forwards/redirects without changing DNS provider for the root domain (which causes other devops problems that I'd rather not deal with right now). The fundamental issue is that AWS Route 53 does not allow CNAMEs/aliases for apex domains (although other providers like Cloudflare are less strict about RFC 1034: https://datatracker.ietf.org/doc/html/rfc1034).

I'm going to revert to the original GO proxy with the result that we cannot do HTTPS through a third party and must provide it through our own proxy: TODO. This will be a little fiddly, but hopefully doable. As part of this, we'll try and reduce the number of redirects that apache uses to the bare minimum, hopefully making this easier during future migrations. This means that we'll keep the changes that we made so far with this attempt and try and continue with other methods.

stuartmiyasato commented 1 year ago

@kltm At the risk of intruding where I don't really belong (having not really followed the conversation to this point), while Route 53 does not support CNAMEs of domain apex records, they do offer their specific ALIAS record type. Perhaps this does not do what you need, but in case it's useful down the line:

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resource-record-sets-choosing-alias-non-alias.html

(Goes back into hiding...)

kltm commented 1 year ago

@stuartmiyasato You are always welcome on anything that you run across here! :)

I started looking into various alias things that could be done in Route 53, but they mostly seem to be geared towards other AWS systems. The one thing that I was hopeful about ("Another Route 53 record of the same type in the same hosted zone") only allowed pointing at other A records, which kind of defeated what I wanted to do. I started exploring a trick where I would point to an S3 bucket and then set up uniform forwarding from that bucket, but it seemed like it might be too far down the rabbit hole. I may have missed something in there, but fundamentally I need the apex domain to point to a named subdomain in Cloudflare; I've not seen anything non-exotic yet that would support that...

kltm commented 1 year ago

As I don't see an alternative way forward, I'm going to proceed with updating the Ansible playbooks for the GO homepage to support some kind of HTTPS. This will likely be with the wildcard certs from Let's Encrypt that @abdellilah put together, but with a custom one-off grabber and cronjob to keep them up-to-date (which will also allow layering over Cloudflare later on if we choose to do so).
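A wildcard cert grabber plus renewal cronjob of the kind described might be sketched like this, assuming certbot with its Route 53 DNS plugin (wildcards require a DNS-01 challenge). The contact email is a placeholder; this is not the actual playbook content:

```shell
# One-off grab of the apex + wildcard cert via DNS-01 challenge.
certbot certonly --dns-route53 \
  -d geneontology.org -d '*.geneontology.org' \
  --non-interactive --agree-tos -m admin@example.org

# crontab entry: attempt renewal twice daily; certbot only renews
# certs that are near expiry, then reloads the proxy on success.
# 0 */12 * * * certbot renew --quiet --deploy-hook "systemctl reload apache2"
```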

kltm commented 1 year ago

After some frustration with versions, have added wildcard cert getter. Noting that currently this is assumed to be in place and certs are available before the proxies are set up.

kltm commented 1 year ago

This should now be safe on first and all runs.

kltm commented 1 year ago

I now have a testing site available at https://test.geneontology.org. This should be more-or-less identical to what we do when we make the final switch.

kltm commented 1 year ago

@suzialeksander @balhoff At our leisure, we can start testing now with https://test.geneontology.org.

kltm commented 1 year ago

From https://github.com/geneontology/go-site/issues/53#issuecomment-1564993348, no longer need to worry about /go-cam, as we are still using proxy for foreseeable future.

kltm commented 1 year ago

@balhoff An attempt to update ontology URLs https://github.com/geneontology/geneontology.github.io/pull/472

kltm commented 1 year ago

Noting work in progress here: https://github.com/berkeleybop/bbops/issues/30

kltm commented 1 year ago

Talking to @suzialeksander , we'll go ahead with transitioning the home site ASAP.

kltm commented 1 year ago

The GO homepage proxy setup has now transitioned over to HTTPS.

kltm commented 1 year ago

Browsing through the GO site, I believe the only remaining items may be Noctua and models.geneontology.org.

These are now dealt with in https://github.com/geneontology/web-gocam/issues/23#issuecomment-1658819119 and https://github.com/berkeleybop/bbops/issues/12 . Overall, we're done here.

kltm commented 11 months ago

        To https://github.com/geneontology/aws-js-s3-explorer.git
           b0e0fee..4150827  master -> master

https://github.com/geneontology/aws-js-s3-explorer/commit/4150827a841b43f557c5e9f8e174dc77daae5283