biosimulations / deployment

Kubernetes Configuration for BioSimulations

Review use of `https` with HSDS by dispatch-service #22

Closed jonrkarr closed 3 years ago

jonrkarr commented 3 years ago

I've needed to change how the configuration of HSDS is managed: dev and prod were using the same server, passwords weren't set securely, the configuration used by the dispatch-service was hard-coded into a file in the home directory of the HPC account, and there was no clear way to distinguish the internal base path for HSDS needed by the dispatch API (http://hsds:80) from the external base path needed by the dispatch-service (https://data.biosimulations.org). I'm not entirely sure where http vs https needs to be used.

@bilalshaikh42 please review the use of https vs http below.

dev

config/dev/shared.env (used by dispatch-service)

HSDS_EXTERNAL_BASEPATH=https://data.biosimulations.dev

config/dev/hsds/config.yml:

hsds_endpoint: https://data.biosimulations.dev # used for HATEOAS links in responses

prod

config/prod/shared.env (used by dispatch-service)

HSDS_EXTERNAL_BASEPATH=https://data.biosimulations.org

config/prod/hsds/config.yml:

hsds_endpoint: https://data.biosimulations.org # used for HATEOAS links in responses
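
For reference, the split I'm aiming for would look roughly like this in shared.env (the internal variable name, HSDS_BASEPATH, is just an assumption for illustration; only the external variable appears in the files above):

# internal base path used by the dispatch API inside the cluster (assumed name)
HSDS_BASEPATH=http://hsds:80
# external base path used by the dispatch-service; served via the public ingress
HSDS_EXTERNAL_BASEPATH=https://data.biosimulations.org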
bilalshaikh42 commented 3 years ago

Everything that points to the external URL for HSDS should be https (https://data.biosimulations.org).

Anything like http://hsds:80 happens internally within the Kubernetes cluster, so that should be http, since we don't have internal TLS set up.
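
So for the config.yml excerpts above, hsds_endpoint should keep the external https URL, since that value ends up in the HATEOAS links returned to clients:

hsds_endpoint: https://data.biosimulations.org # public https URL; used for HATEOAS links in responses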

jonrkarr commented 3 years ago

Thanks. Previously there was a mixture, so I wasn't sure if http needed to be used somewhere due to lack of a certificate.

bilalshaikh42 commented 3 years ago

I am confused by some of the changes.

The S3 URL ("http://s3low.scality....") can only be used behind the firewall, so setting the external storage URL to it could cause problems. I'm not sure where this variable is being used.

jonrkarr commented 3 years ago

The COMBINE API uses the S3 bucket for temporary storage. It has been that way for quite a while.

bilalshaikh42 commented 3 years ago

My confusion is specifically with this commit: fb4ed9f1d6e846f8da18c986c3b5bad6d9d23ca,
in the file config/dev/shared.env.

I see that an additional "external" S3 url has been added, but it is the same as the regular one.

I interpret "external" to mean URLs that are accessed from outside the firewall / presented to users.

Where does the added external S3 variable need to be used?

jonrkarr commented 3 years ago

Previously, you had hard-coded this into the sbatch service. It was used to override the storage endpoint when STORAGE_ENDPOINT=https://localhost:4443.
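
For illustration, the pair in config/dev/shared.env would look something like this (the variable name STORAGE_EXTERNAL_ENDPOINT and the placeholder value are assumptions, not necessarily the actual names used):

# endpoint used by most services; in local development this is a forwarded port
STORAGE_ENDPOINT=https://localhost:4443
# endpoint the sbatch service substitutes when STORAGE_ENDPOINT points at localhost (assumed name, placeholder value)
STORAGE_EXTERNAL_ENDPOINT=<S3 endpoint reachable behind the firewall>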

bilalshaikh42 commented 3 years ago

I see. That situation never arises except for local development, so I didn't see a need to add a config for it. I suppose there's no reason not to, though.