DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Move anvilprod to anvilproject.org domain #5284

Closed hannes-ucsc closed 3 months ago

hannes-ucsc commented 1 year ago

To skip the discussion and jump to the checklist with specific action items, click here.

Move Azul anvilprod from prod.anvil.gi.ucsc.edu to anvilproject.org or a subdomain thereof. Move the portal and browser to anvilproject.org.

I am merely assuming that this will need to be done, and am planning for it. Leadership needs to make the call here.

[edit @dsotirho-ucsc] https://github.com/DataBiosphere/azul/issues/5284#issuecomment-1744063248

hannes-ucsc commented 1 year ago

@bvizzier-ucsc it is critical that we get clarity here as soon as possible.

bvizzier-ucsc commented 1 year ago

The AnVIL Portal and AnVIL Dataset Catalog will be moved to a completely separate AWS deployment, thus removing them from the security boundary. They will continue to be hosted on anvilproject.org. This has the side benefit of removing the management and updating of this functionality from the Azul team.

The Data Explorer (Data Browser + Azul functionality) will be deployed to a subdomain of anvilproject.org. The suggested name is explore.anvilproject.org. This subdomain will be under the oversight of the Azul team.

Since the AnVIL Portal and AnVIL Dataset Catalog are anonymous use content, do not have access to managed access data, and do not share information with the Data Explorer, this is considered low risk from a security standpoint.

bvizzier-ucsc commented 1 year ago

HCA will have a similar change with the Data Explorer on explore.data.humancellatlas.org.

hannes-ucsc commented 1 year ago

We can't just redraw the authorization boundary and pretend that we've maintained a compliant system. The system architecture that was assessed by the 3PAO, that was recommended for ATO by the 3PAO, and that we received ATO for by the AO, places the portal component firmly within the authorization boundary. To the user, the browser and portal appear as a tightly integrated, unified interface. Internally, they use similar deployment methods (GitLab) and are hosted on the same infrastructure (S3 + CloudFront). It was the most natural architectural decision to collocate them within the boundary.

Any change to the approved architecture needs to be signed off by the AO, and needs to be backed by technically sound reasoning. The PI is of course free to override me on this, but I won't deviate from the approved security architecture without the PI going on the record with that decision, here in this issue.

Regarding the technical reasoning behind deviating from the approved architecture, we should consider three implementation alternatives:

1) The portal, the browser frontend and the Azul backend are hosted on the same domain (anvilproject.org) and operated by UCSC within the security boundary.

2) The portal is hosted on anvilproject.org while browser and backend are hosted on a subdomain of anvilproject.org, say, data.anvilproject.org.

3) The portal is hosted on anvilproject.org while browser and backend are hosted on a separate domain, say, anvil.gi.ucsc.edu.

What @bvizzier-ucsc describes above is alternative 2.

I am leaning towards alternative 1 because it matches the approved architecture and we have already implemented it. Some minor aspects, like establishing redirects from domains for EoL AnVIL projects (like CESR), would still have to be implemented, but they are not complicated to do once we operate anvilproject.org. In conversation, @bvizzier-ucsc claimed that there was more to operating anvilproject.org but, aside from the aforementioned redirects, was not able to provide anything of substance to back that up. He agreed to ask the responsible parties and get back to us with details. Until I hear what exactly it is that I am missing, I will stand by my claim that we're 95% done with implementing alternative 1.

Alternative 2 opens up security concerns. In the JavaScript security model, there is less separation between a subdomain and its parent domain, than there is between two completely different domains. Furthermore, TLS certificates can include subdomains, and cookies can be shared by domain and subdomain. If we go this route, we need to make sure that a compromised portal on anvilproject.org does not also compromise the browser and Azul backend hosted at the subdomain. A compromised portal would be more likely in alternative 2 because it would not be subject to the hardening that full compliance provides in alternative 1.
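
To make the cookie part of that concern concrete, here is a minimal sketch of the domain-matching rule browsers apply to a cookie's Domain attribute (simplified from RFC 6265; it ignores IP addresses and the public suffix list). The function name `domain_match` is hypothetical, and the hostnames are the examples from the alternatives above.

```python
def domain_match(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain-match: would a cookie with this Domain
    attribute be sent to request_host?"""
    host = request_host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Either the host is the cookie domain itself, or it is a subdomain of it.
    return host == domain or host.endswith("." + domain)


# Alternative 2: a cookie scoped to the parent domain also reaches the subdomain.
print(domain_match("data.anvilproject.org", "anvilproject.org"))   # True
# The reverse does not hold: a cookie scoped to the subdomain stays there.
print(domain_match("anvilproject.org", "data.anvilproject.org"))   # False
# Alternative 3: a completely separate domain never sees the cookie.
print(domain_match("anvil.gi.ucsc.edu", "anvilproject.org"))       # False
```

Under this rule, anything running on the parent domain can set a cookie that the subdomain will receive, which is the asymmetry the paragraph above refers to.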

Alternative 3 creates a jarring user experience because navigating between browser and portal would cause a change of the domain name shown in the address bar of the web browser.

Alternatives 2 and 3 also completely decouple the software lifecycle and deployment schedule of the browser and backend from that of the portal. In alternative 1 we have the ability to deploy all three at the same time. This is especially useful when we need to deploy codependent changes like to the look and feel of the navigation bar at the top. In the other two alternatives, deploying these codependent changes requires extra coordination, something we're not particularly good at, as we recently proved with the botched release of dcp29 for HCA.

To summarize, while I am not strongly opposed to alternatives 2 and 3, I would need to be given that directive by the PI, along with an assurance that they are aware of the operational and security caveats described above, and we would need to seek approval by the AO. There is also an AI on @bvizzier-ucsc to provide details on what other operational duties are part of operating anvilproject.org.

NoopDog commented 1 year ago

Thanks for the comments @hannes-ucsc

I have a few responses below.

Any change to the approved architecture needs to be signed off by the AO,

Note that Nneka has discussed this with Sarah T and has reported back that this change would not require any further approvals, so it seems we are ok here.

Alternative 3 creates a jarring user experience because navigating between browser and portal would cause a change of the domain name shown in the address bar of the web browser

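I don't believe the user experience would be so jarring or even that noticeable. This is a common pattern in this space. For example: Dockstore is located on https://dockstore.org, Terra is located on https://anvil.terra.bio, Gen3 (the current data explorer) is located on https://gen3.theanvil.io/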

We could go with something like https://anvilexplorer.org, and I feel the transition between anvilproject.org and anvilexplorer.org would still be relatively seamless as we have control over the UI of both sites and can communicate that the sites are different but belong to the same family.

This approach also gives UCSC the most control over the networking setup, isolates the managed access Explorer from the rest of the AnVIL ecosystem, and avoids the discussion of who owns/controls the anvilproject.org domain.

While I believe stakeholders prefer alternative 2, we have discussed alternative 3 with them previously, and I think they would be ok with this option.

Alternatives 2 and 3 also completely decouple the software lifecycle and deployment schedule of the browser and backend from that of the portal.

To me, this is a goal, not a problem. The Data Explorer front-end remains in the security boundary, and its deployment can be synchronized as needed with changes in the Azul back end.

The AnVIL portal and dataset catalogs currently have no dependency on the Azul back end, so there is no synchronization to try to optimize.

The AnVIL and NCPI dataset catalogs could eventually have a dependency on calling the Azul non-authenticated /datasets endpoint for an open-access view of datasets, but this would be called at build time, not at run time, so the Azul API could change without breaking the dataset catalog deployments.

Alternative 2 opens up security concerns.

Yes, but I believe (and am happy to be corrected) that these security concerns are addressable, and work to address similar concerns needs to be done in any case. To address the security concerns:

We will surely discover other headers we want to set as this work proceeds, but similar work needs to be done in any case regardless of the approach and should mitigate the security concerns arising from pursuing alternative 2.

Summary

I feel it would be in our best interest to pursue alternative 2 (deploying the Explorer to explore.anvilproject.org) during the next testing phase, now that we have achieved ATO, and to fall back to alternative 3 (deploying the Explorer on anvilexplorer.org) in case we discover any problems implementing alternative 2. I feel this gives the best user experience and paves the path to deploying other applications on subdomains of anvilproject.org if we are successful, and it still improves security and provides a reasonable experience if we need to fall back to alternative 3.

hannes-ucsc commented 1 year ago

Any change to the approved architecture needs to be signed off by the AO,

Note that Nneka has discussed this with Sarah T and has reported back that this change would not require any further approvals, so it seems we are ok here.

Noted, I just don't know how relevant that is. I don't think we will be able to move forward with such a significant architectural deviation from the documented, assessed and approved architecture without proper documentation, review and approval.

We have already implemented alternative 1 and it has been assessed and approved. Without substantial evidence as to why we need a more complicated solution that requires 1) more work to keep secure, and 2) more elaborate coordination between teams, I highly doubt the other alternatives are in our best interest.

NoopDog commented 1 year ago

1) more work to keep secure,

I feel it's less work to keep secure and focuses our security work on the Explorer. We have to keep the Explorer secure in any case, and we avoid the work of having to keep the NCPI dataset catalog, the AnVIL dataset catalog, and the AnVIL consortia pages to the higher FedRAMP moderate standard, reviewing every change.

Instead of pulling all applications into the security boundary, the security approach here is to focus on hardening Explorer and isolating it from the other applications. We want to avoid making Explorer security depend on the security of the other non-managed-access applications.

Pulling the non-managed-access applications out of the security domain avoids extra, unnecessary security work and process overhead.

2) more elaborate coordination between teams

I feel it's far less coordination between teams. There is no need, as noted above, to coordinate changes to the other applications as they have no dependencies on Explorer, and as they are outside of the security boundary, we can avoid making the Azul team a bottleneck for reviewing changes. This separation of concerns and making the teams more independent is one of the key goals of this effort.

hannes-ucsc commented 1 year ago

Note: What some comments here call "Explorer" is what the relevant FedRAMP documentation refers to as the "Data Browser". I will use the official term.

Cookies - the Explorer currently sets no cookies

That's a bug, not a feature. It causes the Data Browser to lose the logged in status when a user refreshes the page after they login, forcing them to log in again.

Most other measures you list are general best-practices we already implemented, or will be implementing, regardless of what will be decided here. These types of things have been flagged by the web application security scanner that compliance mandates we run against the portal, browser and service. Who will be responsible for scanning the portal at the required cadence, if the authorization boundary is redrawn to exclude the portal? Who will handle the findings?

There is still increased risk of a cross-domain exploit between domain and subdomain in alternative 2. One aspect of compliance is that it establishes not only concrete technical measures that protect against currently known exploits, but also monitoring and auditing measures that protect against future unknown ones.

Alternative 3 creates a jarring user experience because navigating between browser and portal would cause a change of the domain name shown in the address bar of the web browser

I don't believe the user experience would be so jarring or even that noticeable. This is a common pattern in this space. For example:

Dockstore is located on https://dockstore.org, Terra is located on https://anvil.terra.bio, Gen3 (the current data explorer) is located on https://gen3.theanvil.io/

These are different sites from the portal. The portal and the browser look and feel (for good reason) as one site. As a litmus test, consider what would happen if the links were opened in a new tab. That would be completely fine for links from the portal to Dockstore, Terra or Gen3. It would break the user experience if that were to happen for the link from the portal to the browser, and the one pointing back from the browser to the portal.

There is no need, as noted above, to coordinate changes to the other applications as they have no dependencies on Explorer

You appear to ignore the example I gave above. Since browser and portal share important page elements, they are dependent on each other, and changes to those elements should be deployed together. Ensuring that this happens is easiest if the deployment duty is owned by a single organisation.

we can avoid making the Azul team a bottleneck for reviewing changes

Don't worry. We have the browser and portal deployment process completely automated on GitLab and have a well-oiled SDLC for reviewing and deploying changes. We will only need to review browser and portal changes with respect to compliance and will be able to fast-track many of them. Backend (Azul) changes receive more scrutiny simply because my team is implementing them, not just deploying them.

NoopDog commented 1 year ago

My previous comments may have been a bit misleading. I'll go ahead and summarize more succinctly.

Browsers implement the Same-Origin Policy that restricts applications from accessing resources from other origins.

An origin is a combination of protocol, host, and port.

https://explore.anvilproject.org and https://anvilproject.org are considered different hosts and, thus, different origins because of the subdomain. By default, these different origins are prevented from accessing each other's content (with exceptions for images that can be addressed with a Content Security Policy).
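
As a rough illustration of the origin comparison described here, the sketch below derives the (protocol, host, port) triple from a URL and compares two of them. The helper names `origin` and `same_origin` are hypothetical, and the default-port table only covers http and https.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Default ports, so that https://host and https://host:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (protocol, host, port) triple that identifies an origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if all three components match."""
    return origin(a) == origin(b)


# Same host, path differs: same origin.
print(same_origin("https://anvilproject.org/",
                  "https://anvilproject.org:443/explore/datasets"))   # True
# Subdomain differs from parent: different origins.
print(same_origin("https://anvilproject.org/",
                  "https://explore.anvilproject.org/datasets"))       # False
```

Note that the path plays no role in the comparison, which is why hosting the Explorer under /explore on the portal's host does not give it an origin of its own.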

Because we currently have placed the Explorer (Data Browser), catalogs, and portal on the same host (https://anvilproject.org), the catalogs and portal, and any other application that we would like to host on anvilproject.org in the future, will be able to access the DOM of the Explorer in certain circumstances. This is a security issue that needs to be addressed by placing Explorer on its own origin (https://explore.anvilproject.org).

Hosting both the portal and catalogs on the same origin as Explorer (https://anvilproject.org) prevents browsers from using the Same-Origin policy to restrict access between the applications and forces us to maintain an unnecessary, eternal vigilance on non-managed access applications (the portal and catalogs).

Instead, we need to make Explorer's security stand on its own by placing the Explorer on its own origin (https://explore.anvilproject.org). Once we do this, browsers will restrict access to the Explorer DOM to only mechanisms where Explorer needs to opt in explicitly.

While there are other headers and optimizations to make, placing Explorer on its own subdomain will be a core aspect of securing the Explorer front end.

NoopDog commented 1 year ago

Explorer UX - Transitioning to a subdomain

Regarding the UX of the transition between anvilproject.org and explore.anvilproject.org: there is a concern that the user experience will be "jarring" when transitioning to a subdomain. I have set up the Clever Canary dev site at https://anvilproject.dev.clevercanary.com/ to have its "Datasets" link in the header load https://explore.anvilproject.org/explore/datasets. If you try this out, you will find the transition is quite smooth.

The reverse concern was also raised, that the sites look too similar, and the user will not know they have transitioned to a managed access site (even though they have to accept terms and login).

To address this, we will modify the Datasets drop-down to indicate that the user is leaving for a separate site with the standard indicator.

image

We will also update the header in the AnVIL Explorer to:

  1. Show only Explorer-specific links to guides and ToS, metadata schema, etc.
  2. Add an Explorer "subsite indicator" in the header replacing the slogan.

This has the advantage of locating the ToS more directly with the Explorer App so that it is clear what asset is being addressed by the ToS.

image

The transition between the sites will then be smooth but not too smooth.

NoopDog commented 1 year ago

Security Headers Implementation

On the https://explore.anvilproject.org/explore/datasets site, we have made a first pass at implementing the security headers and CSP to relax constraints imposed by the Same-Origin Policy to:

The CSP needs additional review and further tightening to implement additional best practices but these updates are required no matter what domain the AnVIL Explorer is hosted on.

Additional headers will also be added but again these need to be updated no matter what domain the Explorer is on.

With the headers/CSP added so far, the site still works, as does the Google login. Note that the Clever Canary client ID is not yet whitelisted with Broad for the dev environments, so while Google login works, the granted token is not able to access the mock managed-access data. This will be corrected shortly.

image

hannes-ucsc commented 1 year ago

Cookies - the Explorer currently sets no cookies

That's a bug, not a feature. It causes the Data Browser to lose the logged in status when a user refreshes the page after they login, forcing them to log in again.

Note that the login status is also lost when clicking on a different item in the main menu at the top.

hannes-ucsc commented 1 year ago

I appreciate the quick turn-around on the prototype. We should be careful not to set up too many different prototype sites, so as to not confuse leadership and the user community. This is especially a concern when it comes to hosting development prototypes on a production domain, as appears to have been done here.

And yes, I have no doubt that Clever Canary can address current security concerns by implementing changes and deploying them. Part of compliance, though—the predominant part, actually—is to ensure that we don't regress and that future concerns are addressed as well. We decided early on that Clever Canary would not seek compliance, and that only UCSC would. This decision had obvious advantages, but it comes at a price: Clever Canary is not authorized to operate infrastructure within the authorization boundary. We can redraw the boundary but that would require approval by AO and PI. UCSC could be held liable if they moved forward without those approvals.

On the flip side: the obvious and technically straightforward approach is to host portal and browser on one domain. The distinction between the two is entirely technical; to the user it is irrelevant. The only substantial concern with this solution I have heard so far is the worry that UCSC could be a bottleneck when deploying Clever Canary's contributions to the portal. If that's the only concern, I'm sure it can be put to rest. We review and promote changes to HCA production every week. I think a similar cadence is more than sufficient for AnVIL.

bvizzier-ucsc commented 1 year ago

Benedict has approved the separation of the Explorer from the Portal and Dataset Catalog.

benedictpaten commented 1 year ago

As PI, I'm going on the record to approve the separation of the Explorer from the Portal and Dataset Catalog. I recognize the legitimate arguments on both sides of this and have read everyone's input carefully. I further recognize the issues raised by @hannes-ucsc as legitimate concerns that we will, on balance, choose to disregard by this decision. In the future I expect the Explorer (aka Data Browser) to become an increasingly standalone utility that will serve data from AnVIL and potentially other Terra and NCPI sources. Keeping it separate therefore seems like a sensible path moving forward.

hannes-ucsc commented 1 year ago

We discussed this in stand-up today and @nolunwa-ucsc volunteered to seek AO approval for redrawing the authorization boundary to exclude the portal part of the system.

We also need to decide if we want to go with a subdomain for the browser part, or a standalone domain. As I noted above, this has security implications for the handling of cookies.

nolunwa-ucsc commented 1 year ago

Emailed David B and Sarah T about this. Sarah's response was: "I will take a look. Bernick is out of the office the rest of this week, but we will discuss early next week and get back to you with feedback"

https://docs.google.com/document/d/1P2Ez-ZjzkRC1LZpKfLH_wOPwfHFx-m38vTY_fKDD6Ww/edit

nolunwa-ucsc commented 1 year ago

@hannes-ucsc @NoopDog Question from David B: "My core question here is: what data/users flow through these systems that are asked to be excused? Do they handle any controlled access data or metadata OR do users sign into them (and handle user tokens/credentials)" FYI

hannes-ucsc commented 1 year ago

Since anvilprod—the actual deployment in question here, and the one we were granted ATO for—is now public, the best way to answer that question is as follows:

Go to https://prod.anvil.gi.ucsc.edu/explore/datasets

If you click on Datasets in the top navigation bar, you're visiting the part that is to remain within the authorization boundary. If you click on any other link in that bar, you're visiting the part that David is being asked to excuse.

NoopDog commented 1 year ago

Hi @hannes-ucsc, regarding:

We also need to decide if we want to go with a subdomain for the browser part, or a standalone domain. As I noted above, this has security implications for the handling of cookies.

Could you identify the security implications for the handling of cookies that you are concerned about?

While we do not currently set cookies, we do want to be able to set a session or other cookie in the future if this becomes a design choice.

In researching this, I do not yet see any issues with deploying, as Terra does, the marketing/project site on https://terra.bio and the secure application on https://app.terra.bio.

For AnVIL, I believe it is a stakeholder preference to deploy on https://explore.anvilproject.org, and for HCA, I do not believe we would be able to talk them into deploying on a domain that is not a subdomain of humancellatlas.org as you know they are very brand conscious.

Also, as users, we are taught to ensure we are on a proper root domain like https://secure06ea.chase.com/ rather than https://secure-chase.com, for example, to avoid phishing attacks, etc., so I feel there is a user confidence and security benefit to hosting the Explorer on a subdomain of the project site rather than a standalone domain like https://anvilexplorer.org.

I mention the above to provide motivation for working together to deploy safely on the https://explore.anvilproject.org and https://explore.data.humancellatlas.org subdomains.

Creating a Secure Cookie

From my research, it seems we could create a suitably secure session cookie as follows:

Set-Cookie: cookieName=cookieValue; Domain=service.explore.anvilproject.org; Secure; HttpOnly; SameSite=None; Max-Age=900

  1. The server sets the cookie, perhaps like Terra does with a call to a /setCookie so that it can be made HttpOnly.
  2. The cookie is marked HttpOnly so that it can not be accessed or modified by client-side scripts.
  3. The cookie is marked Secure so that it can only be sent over https.
  4. The Domain of the cookie is scoped so that it is only sent to service.explore.anvilproject.org or subdomains of this host, such as sub1.service.explore.anvilproject.org
  5. SameSite=None allows the cookie to be sent on the cross-domain request (from explore.anvilproject.org to service.explore.anvilproject.org)
  6. Max-Age=900 sets the cookie's lifetime as 900 seconds (15 minutes) or as appropriate to provide for session inactivity timeout.
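
The attributes listed above could be assembled on the server side roughly as follows. This is only a sketch using Python's standard `http.cookies` module; the cookie name and value are the placeholders from the header above, and nothing here reflects how the Azul service would actually issue the cookie.

```python
from http.cookies import SimpleCookie

# Build the session cookie with the attributes discussed in items 1-6.
cookie = SimpleCookie()
cookie["cookieName"] = "cookieValue"            # placeholder name and value
morsel = cookie["cookieName"]
morsel["domain"] = "service.explore.anvilproject.org"
morsel["secure"] = True                         # only sent over https
morsel["httponly"] = True                       # hidden from client-side scripts
morsel["samesite"] = "None"                     # allowed on cross-site requests
morsel["max-age"] = 900                         # 15 minutes

# The value that would follow "Set-Cookie: " in the response.
header_value = morsel.OutputString()
print("Set-Cookie: " + header_value)
```

Two details may be worth double-checking before settling on these values. Browsers decide what counts as "same site" by registrable domain, so a request from explore.anvilproject.org to service.explore.anvilproject.org should already qualify as same-site, in which case SameSite=Lax or Strict might be sufficient. Also, omitting the Domain attribute altogether yields a host-only cookie that is not sent to subdomains of the service host.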

We, of course, will also need to set additional headers to prevent the site from being loaded in an iframe, set a content security policy that allows explore.anvilproject.org to send requests to service.explore.anvilproject.org, etc., and conform to other best practices.
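
A minimal sketch of what those additional headers might look like, assuming the frontend is served from explore.anvilproject.org and the API from service.explore.anvilproject.org as in the example above. The helper names and the exact directive set are hypothetical; a real policy would need further sources, for example for Google login.

```python
from typing import Dict, List


def build_csp(directives: Dict[str, List[str]]) -> str:
    """Serialize CSP directives into a Content-Security-Policy header value."""
    return "; ".join(
        name + " " + " ".join(values) for name, values in directives.items()
    )


def security_headers(api_origin: str) -> Dict[str, str]:
    """Response headers for the frontend: no framing, API calls allowed."""
    csp = build_csp({
        "default-src": ["'self'"],
        # Let the frontend send requests to the service origin.
        "connect-src": ["'self'", api_origin],
        # Forbid loading the site in an iframe on any other page.
        "frame-ancestors": ["'none'"],
    })
    return {
        "Content-Security-Policy": csp,
        # Legacy equivalent of frame-ancestors for older browsers.
        "X-Frame-Options": "DENY",
        "X-Content-Type-Options": "nosniff",
    }


headers = security_headers("https://service.explore.anvilproject.org")
for name, value in headers.items():
    print(name + ": " + value)
```

The same headers would apply unchanged if the Explorer were hosted on a standalone domain; only the origin passed to `security_headers` would differ.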

Do you see any security-related reasons why we would not be able to set a secure session that enabled explore.anvilproject.org to send this cookie to service.explore.anvilproject.org?

If so, how would this be less secure than a cookie sent from, say, anvilexplorer.org to service.anvilexplorer.org, and how does this additional risk compare with the benefit derived from having the Explorer on the same domain as the project/marketing sites?

Thanks for taking a look at this. Cheers, D

benedictpaten commented 1 year ago

Chatting to David B, it sounds like we are fine to take this out of the security boundary. The major risk we now have is not delivering in a timely manner. Let's proceed.

hannes-ucsc commented 1 year ago

This change to the assessed and approved security architecture introduces risks that I am not willing to shoulder. I have given my reasoning above. I insist that we get official, on the record, approval by the AO.

hannes-ucsc commented 1 year ago

Regarding the schedule risk: please note for how long this ticket lay dormant. I reminded my PM verbally multiple times and in writing here.

bvizzier-ucsc commented 1 year ago

Before switching to anvilproject.org, we should bring up the full anvilproject.org site functionality on prod.anvil.gi.ucsc.edu so we can test the site's functionality.

Please see https://ucsc-gi.slack.com/archives/C705Y6G9Z/p1690934786445149

bvizzier-ucsc commented 1 year ago

Regarding the schedule risk: please note for how long this ticket lay dormant. I reminded my PM verbally multiple times and in writing here.

A significant portion of the delay was discussing options and then escalating the approach as requested.

nolunwa-ucsc commented 10 months ago

assignee to determine next steps for implementation

hannes-ucsc commented 10 months ago

Note: I'm assuming that we want the Data Browser to be hosted at the explore.anvilproject.org subdomain

~I'm sure there is portal work. CC to specify the work and propose where it fits in the above timeline.~

dsotirho-ucsc commented 10 months ago

@hannes-ucsc: "@bvizzier-ucsc to do first task list item and @hannes-ucsc to get started on the remaining items assigned to him."

bvizzier-ucsc commented 10 months ago

@hannes-ucsc: "@NoopDog to list necessary portal work along with ticket references. I will then update comment above to include that work."

NoopDog commented 10 months ago

@bvizzier-ucsc @hannes-ucsc I added the tasks above required to make the Explorer accessible from the portal once the Explorer is deployed to the subdomain.

achave11-ucsc commented 10 months ago

@hannes-ucsc to review CC's additions.

bvizzier-ucsc commented 9 months ago

The person responsible for managing the domain name resolution has been put in contact with the Azul team and is waiting on the relevant addresses and domain names.

I consider the first checklist item, assigned to me, to be resolved.

hannes-ucsc commented 7 months ago

The explore.anvilproject.org subdomain is functional. The next checklist item is on CC. @bvizzier-ucsc to coordinate.

hannes-ucsc commented 7 months ago

There are also CC checklist items that point to closed issues. @bvizzier-ucsc to confirm completion and check those items.

bvizzier-ucsc commented 6 months ago

@NoopDog committed to providing an update to the checklist yesterday.

NoopDog commented 6 months ago

@bvizzier-ucsc @hannes-ucsc I updated the checklist above with a few items related to updating the CloudFront configuration to support the stand-alone domain and removing the /explore basePath on all URLs.

I also marked the header work as done.

image

These header updates are visible when using a feature flag. As a final update, we will remove the requirement for the feature flag to be set to view the explore links in the header menus.

My assumption in the ordering of the tasks above is that we will update the CloudFront/Deployment environment first and test on https://prod.anvil.gi.ucsc.edu/datasets before updating the domain name.
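
For anyone who needs to translate existing links, the URL change implied by removing the /explore basePath can be sketched as below. The function `strip_base_path` is hypothetical and is not part of the Data Browser or the CloudFront configuration; it only illustrates the mapping from old to new URLs.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def strip_base_path(url: str, base_path: str = "/explore",
                    new_host: Optional[str] = None) -> str:
    """Drop a leading base path from a URL and optionally swap the host."""
    parts = urlsplit(url)
    path = parts.path
    # Only strip when the base path is a whole leading path segment.
    if path == base_path or path.startswith(base_path + "/"):
        path = path[len(base_path):] or "/"
    netloc = new_host or parts.netloc
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


# Step 1: basePath removed, still on the current domain.
print(strip_base_path("https://prod.anvil.gi.ucsc.edu/explore/datasets"))
# Step 2: same path on the new subdomain.
print(strip_base_path("https://prod.anvil.gi.ucsc.edu/explore/datasets",
                      new_host="explore.anvilproject.org"))
```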

bvizzier-ucsc commented 6 months ago

Looking to confirm Dave's assumption above.

achave11-ucsc commented 6 months ago

@hannes-ucsc: "I confirm the assumption. The next open checklist item is for Clever Canary. Assigning to them."

NoopDog commented 6 months ago

Hi @hannes-ucsc, can you take a look at https://github.com/DataBiosphere/data-browser/issues/3805 and let me know if you have any questions on what it will take to set up AnVIL Explorer on its own subdomain? Thanks! D

hannes-ucsc commented 6 months ago

No questions.

bvizzier-ucsc commented 6 months ago

@hannes-ucsc Can the checklist be updated? It looks like https://github.com/DataBiosphere/data-browser/issues/3805 can be checked.

hannes-ucsc commented 6 months ago

Yes. I updated it just now and added the blockers of this issue as dedicated items.

hannes-ucsc commented 6 months ago

My team is ready to make the move. Where is CC on https://github.com/DataBiosphere/data-browser/issues/3849?

NoopDog commented 6 months ago

Hi, @hannes-ucsc, we just merged the update for https://github.com/DataBiosphere/data-browser/issues/3849. Let me know how it goes.

Cheers, D

hannes-ucsc commented 6 months ago

The domain name change will be obvious.