DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Host cellxgene in Fargate container #1601

Closed: melainalegaspi closed this issue 4 years ago

melainalegaspi commented 4 years ago

Create a new repo in DataBiosphere called cellxgene-fargate.

Deployment should be done with Terraform.

hannes-ucsc commented 4 years ago

SC deployment: http://cellxgene.singlecell.gi.ucsc.edu/
HCA dev deployment: http://cellxgene.dev.explore.data.humancellatlas.org/

The URLs point at a listing of instances. Each instance has its own subdomain.

The example2 subdomain is for

https://data.humancellatlas.org/release-files/releases/2020-mar/2020-Mar-Atlas-Adult-Retina-10x_annotated_v1.seurat.h5ad

which is currently the only .h5ad file in the March 2020 release, AFAIK. The example1 subdomain is for https://cellxgene-example-data.czi.technology/pbmc3k.h5ad.

NoopDog commented 4 years ago

I feel the URL needs to be human-readable so that someone looking at a URL can tell which study it belongs to. It doesn't need to be easy to type, but the easier it is for a human to read, the better.

The use case this supports is sharing. If I copy a URL from the address bar and send it to one or more people, it will be very helpful to be able to tell which URL goes with which study.

It would also be nice to do this in a predictable way, so that the URLs could be generated from the study name and the base URL.

Current URLs are:

SCP

https://singlecell.broadinstitute.org/single_cell/study/SCP793/2020-mar-tcell-adult-blood-10x#study-visualize

SCEA

https://www.ebi.ac.uk/gxa/sc/experiments/E-HCAD-8/results/tsne

Xena:

https://singlecell.xenabrowser.net/datapages/?cohort=HCA%20Fetal%20Maternal%20Interface

It would be great if ours were at least as readable as the SCP URLs.

Would it be possible to do something like

http://cellxgene.data.humancellatlas.org/study-name, for example

http://cellxgene.data.humancellatlas.org/2020-Mar-Tcell-Adult-Blood-10x

and then use CloudFront to map requests to their individual hosts based on the paths?

The individual containers could still be addressed directly, like this:

https://2020-Mar-Tcell-Adult-Blood-10x.cellxgene.data.humancellatlas.org/

Would the above be doable? How else might we do this?

Cheers, Dave
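Whatever would end up doing the routing (CloudFront, as proposed above, or something else), the path-to-host mapping itself is simple. The following is a minimal Python sketch, not how the deployment actually works, assuming the proposed base domain `cellxgene.data.humancellatlas.org` and that the first path segment is the study name. The function name `origin_host_for_url` is hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: the base domain from the proposal above; not a deployed value.
BASE_DOMAIN = "cellxgene.data.humancellatlas.org"


def origin_host_for_url(url: str, base_domain: str = BASE_DOMAIN) -> tuple[str, str]:
    """Hypothetical helper: map a path-style URL to (origin host, remaining path).

    http://<base>/<study>/<rest>  ->  ("<study>.<base>", "/<rest>")
    The query string is ignored for brevity.
    """
    parts = urlsplit(url)
    # urlsplit already lowercases .hostname
    if parts.hostname != base_domain:
        raise ValueError(f"unexpected host: {parts.hostname!r}")
    # The first path segment is the study; everything after it is forwarded as-is.
    study, _, rest = parts.path.lstrip("/").partition("/")
    if not study:
        raise ValueError("URL has no study segment")
    # Host names are case-insensitive, so lowercase for a canonical origin.
    return f"{study.lower()}.{base_domain}", "/" + rest
```

For example, `origin_host_for_url("http://cellxgene.data.humancellatlas.org/2020-Mar-Tcell-Adult-Blood-10x")` yields the subdomain host `2020-mar-tcell-adult-blood-10x.cellxgene.data.humancellatlas.org` with path `/`, which is the direct container address from the example above, lowercased.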

hannes-ucsc commented 4 years ago

and then use Cloudfront to map requests to their individual hosts based on the paths?

I don't know. Can CloudFront (CF) front dynamic websites? We should try. But that should not be part of the MVP. For now, let's stick with http and a user-friendly subdomain.

I can do friendly subdomains as long as the study name only contains letters, digits, underscores or dashes (see here on why). Note that this excludes dots. The only .h5ad file I have access to does contain dots in its name.
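The naming rule above can be checked mechanically. A minimal Python sketch, assuming the allowed characters are exactly those listed (letters, digits, underscores, dashes) and adding the 63-character maximum that DNS imposes on a single label; a dot would split the name into several labels, which is why it is rejected. The function name is hypothetical.

```python
import re

# Allowed characters per the comment above; no dots, so the name stays one DNS label.
# The {1,63} bound is the DNS per-label length limit (RFC 1035).
_LABEL_RE = re.compile(r"[A-Za-z0-9_-]{1,63}")


def is_valid_study_label(name: str) -> bool:
    """Hypothetical check: can `name` be used as a single subdomain label?"""
    return _LABEL_RE.fullmatch(name) is not None
```

With this check, the full file name `2020-Mar-Atlas-Adult-Retina-10x_annotated_v1.seurat.h5ad` is rejected because of its dots, while `2020-Mar-Atlas-Adult-Retina-10x` passes.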

So let me rephrase my question: how do I get from 2020-Mar-Atlas-Adult-Retina-10x_annotated_v1.seurat.h5ad to a name like 2020-Mar-Tcell-Adult-Blood-10x? Should I just drop everything after the first _? Will the result be unique across all expected .h5ad files? My apologies if this is obvious; I wasn't part of any of the discussions on file or study names.

hannes-ucsc commented 4 years ago

@NoopDog and I discussed this in person and agreed to move forward with http (not https) and the primary cellxgene domain name, with a friendly subdomain based on the study name. Apparently, the study name is the part of the .h5ad file name before the first _.

I've implemented these decisions and deployed them to SC and HCA.
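Deriving the study name and answering the earlier uniqueness question can be done together. A minimal Python sketch, assuming the "part before the first _" rule from this thread; stripping a trailing `.h5ad` first is an added assumption so that a file without an underscore (such as `pbmc3k.h5ad`) still yields a dot-free name. Names are lowercased before comparison because host names are case-insensitive. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

_SUFFIX = ".h5ad"


def study_name(file_name: str) -> str:
    """Hypothetical helper: study name = part of the file name before the first '_'."""
    # Assumption: drop the extension first so "pbmc3k.h5ad" -> "pbmc3k".
    if file_name.endswith(_SUFFIX):
        file_name = file_name[: -len(_SUFFIX)]
    return file_name.split("_", 1)[0]


def check_unique(file_names: Iterable[str]) -> dict[str, str]:
    """Map lowercased study name -> file name; raise if two files collide."""
    by_study: dict[str, list[str]] = defaultdict(list)
    for f in file_names:
        by_study[study_name(f).lower()].append(f)
    clashes = {s: fs for s, fs in by_study.items() if len(fs) > 1}
    if clashes:
        raise ValueError(f"study names are not unique: {clashes}")
    return {s: fs[0] for s, fs in by_study.items()}
```

Running `check_unique` over the full list of release files before deploying would surface any collision up front instead of silently pointing two files at one subdomain.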

SC deployment: http://cellxgene.singlecell.gi.ucsc.edu/
HCA dev deployment: http://cellxgene.dev.explore.data.humancellatlas.org/

NoopDog commented 4 years ago

Hi @hannes-ucsc, will the cellxgene links eventually be HTTPS, or are we thinking of leaving them as they are?

Cheers, Dave

NoopDog commented 4 years ago

Also, @hannes-ucsc, can we configure the prod instances?

I presume http://2020-mar-atlas-adult-retina-10x.cellxgene.explore.data.humancellatlas.org/ will be the prod url?

Can we link out to URLs like that (with no .dev.)?

Cheers and thanks, Dave
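The dev and prod URLs differ only in the `dev.` label, so links can be generated from the study name and the deployment stage. A minimal Python sketch, assuming the prod domain is the one presumed in the comment above and the dev domain is the existing HCA dev deployment; the `HCA_DOMAINS` table and `instance_url` function are hypothetical, and plain http reflects the MVP decision in this thread.

```python
# Assumption: dev domain from the existing deployment, prod domain as presumed above.
HCA_DOMAINS = {
    "dev": "cellxgene.dev.explore.data.humancellatlas.org",
    "prod": "cellxgene.explore.data.humancellatlas.org",
}


def instance_url(study: str, stage: str = "prod") -> str:
    """Hypothetical helper: per-study instance URL (http only, per the MVP)."""
    # Host names are case-insensitive; lowercase to match the URLs in this thread.
    return f"http://{study.lower()}.{HCA_DOMAINS[stage]}/"
```

`instance_url("2020-Mar-Atlas-Adult-Retina-10x")` reproduces the prod URL presumed above; passing `stage="dev"` gives the corresponding link under the dev deployment.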

hannes-ucsc commented 4 years ago

will the cell x gene links eventually be HTTPS or are we thinking of leaving them as is

Eventually HTTPS. We'll have to decide whether to terminate TLS in CF or on the load balancer. Since we want to switch from subdomains to paths, which cannot be done in the load balancer, TLS termination would come for free in CF.

I presume http://2020-mar-atlas-adult-retina-10x.cellxgene.explore.data.humancellatlas.org/ will be the prod url?

Yes.

Also @hannes-ucsc can we configure the prod instances?

@theathorn wanted me to do a separate dev deployment first, since touching prod requires the embargo to be lifted.

Can we link out to urls like that? (with no .dev.)

Not until I create the prod instances.

hannes-ucsc commented 4 years ago

I've created containers for all 23 files in dev.

http://cellxgene.dev.explore.data.humancellatlas.org/

I haven't completely solved https://github.com/DataBiosphere/cellxgene-fargate/issues/7 yet and will pass it on to @noah-aviel-dove while I focus on the prod deployment. I'll also tear down the SC deployment to save money.