IDR / deployment

Deployment infrastructure for the Image Data Resource
https://idr.openmicroscopy.org/about/deployment.html
BSD 2-Clause "Simplified" License

prod107: search engine deployment #370

Closed by sbesson 2 years ago

sbesson commented 2 years ago

The deployment of a first version of the search engine stack is a target of the upcoming prod107 release. #359 introduces the playbook that deploys the stack, while #367 contains the logic to define a new group of servers where the service should be deployed.

While #367 focuses on deploying the service in the simple context of a pilot VM where all the services are colocated on a single node, for the scope of prod107 we will need to decide how to deploy it in the multi-node architecture used for production deployments.

The current set of instances (and their relationship) created for each deployment can be loosely summarized by:

idr-database -> idr-omeroreadonly-1, idr-omeroreadonly-2, idr-omeroreadonly-3, idr-omeroreadonly-4, idr-omeroreadwrite -> idr-proxy

idr-database, idr-omeroreadonly-1, idr-omeroreadonly-2, idr-omeroreadonly-3, idr-omeroreadonly-4, idr-omeroreadwrite -> idr-management
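To make these relationships easier to reason about, here is a minimal Python sketch that encodes them as a dependency graph and derives one valid bring-up order. It reads each arrow as "is consumed by", which is an interpretation of the summary and not something taken from the playbooks. The `idr-searchengine` name and its wiring are hypothetical and only illustrate what option 3 below could look like.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

OMERO_READONLY = [f"idr-omeroreadonly-{i}" for i in range(1, 5)]
OMERO_NODES = OMERO_READONLY + ["idr-omeroreadwrite"]

# Maps each instance to the set of instances it relies on.
# Assumption: "A -> B" in the summary means "B consumes A".
DEPENDS_ON = {
    "idr-database": set(),
    **{node: {"idr-database"} for node in OMERO_NODES},
    "idr-proxy": set(OMERO_NODES),
    "idr-management": {"idr-database", *OMERO_NODES},
}


def with_searchengine(graph, name="idr-searchengine"):
    """Return a copy of the graph with a standalone search engine instance.

    Hypothetical wiring for option 3: every OMERO node reaches the new
    instance directly, the same way it reaches the database.
    """
    extended = {node: set(deps) for node, deps in graph.items()}
    extended[name] = set()
    for node in OMERO_NODES:
        extended[node].add(name)
    return extended


def deploy_order(graph):
    """Return one valid order in which dependencies come before consumers."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for instance in deploy_order(with_searchengine(DEPENDS_ON)):
        print(instance)
```

Running the script prints one instance per line, with the instances that have no dependencies first and the proxy and management instances after the OMERO nodes.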

Listing the various architectures available

Option 1: deploy the app in the management instance

Pros: it benefits from the Docker prerequisites already installed on the management instance (currently used for monitoring). It is the strategy originally used for #359 and is currently deployed on test104.

Cons: the compute capacity of the management VM is limited, especially for a full indexing. Consuming the search endpoint from the omero nodes would require going through the proxy.

Option 2: deploy the app in the omeroreadwrite instance

Pros: this is a scaled-up version of option 1 that takes advantage of the larger compute capacity of the omeroreadwrite server. Additionally, it makes use of omeroreadwrite capacity that is currently unused once the deployment moves to production (except for minor DB updates like adding DOIs/publications).

Cons: same as above: without additional nginx configuration, consuming the search endpoints from the omeroreadonly nodes currently requires going through the proxy.

Option 3: deploy the app in a new searchengine instance

Pros: allows tailoring the compute/storage capacity of the instance to the exact needs of the app. It also allows the various omero instances to access the searchengine service in the same way as the database is accessed.

Cons: requires one more instance to be created per production deployment, which probably needs to be reviewed against the global tenancy capacity.

Option 4: deploy the app across all omero instances

Pros: for indexing, this would keep the benefit of option 2 and use the compute capacity of omeroreadwrite. If we are thinking of integrating with omero-web or idr-gallery, it colocates the service and simplifies the integration. It also starts scaling the service in the same way as the OMERO.web servers.

Cons: probably requires additional thought on how to distribute the data, especially the Elasticsearch database, possibly moving towards an Elasticsearch cluster (option 4a) or moving Elasticsearch to yet another instance (option 4b).

khaledk2 commented 2 years ago

I think it may be a good idea to start with option 3 and deploy it in a new instance. This will give us the opportunity to deploy, configure, and maybe reconfigure the instance and the apps without affecting anything else.

sbesson commented 2 years ago

@khaledk2 coming back to this, a few outstanding questions:

khaledk2 commented 2 years ago

@khaledk2 coming back to this, a few outstanding questions:

  • based on your latest investigation of indexing, what would you recommend for the compute capacity of a standalone searchengine VM? 16 vCPUs/64 GB RAM like omeroreadwrite or 8 vCPUs/32 GB RAM like omeroreadonly?

It would be good to have a VM like pilot-idr0000-omeroreadwrite (16 vCPUs/64 GB RAM).

  • what should be the typical size of the underlying data volume? And should this volume follow the same snapshotting/cloning lifecycle as the DB/binary repository/nginx cache?

A data volume of 50 to 100 GB should be fine (preferably SSD). Yes, I think following the same snapshotting/cloning lifecycle should be fine. I will test getting the Elasticsearch indices from a disk copy.

  • are we happy recreating an idr-testing deployment from scratch with the initial set of choices? @will-moore

Yes, I think it should be fine.
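As a rough sizing check against the tenancy capacity concern raised for option 3, the Python sketch below sums the compute figures quoted in this thread for the OMERO nodes and for a searchengine VM sized like omeroreadwrite. The database, proxy and management instances are left out because their sizes are not given here, and the `Flavor` type and `total` helper are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Compute size of one instance (hypothetical helper type)."""
    vcpus: int
    ram_gb: int


# Figures quoted in this thread.
OMEROREADWRITE = Flavor(vcpus=16, ram_gb=64)
OMEROREADONLY = Flavor(vcpus=8, ram_gb=32)
# Assumption: the searchengine VM is sized like omeroreadwrite, as suggested above.
SEARCHENGINE = OMEROREADWRITE


def total(flavors):
    """Sum vCPUs and RAM over a list of instance flavors."""
    return Flavor(
        vcpus=sum(f.vcpus for f in flavors),
        ram_gb=sum(f.ram_gb for f in flavors),
    )


# Four read-only nodes and one read-write node per deployment.
omero_nodes = [OMEROREADONLY] * 4 + [OMEROREADWRITE]
before = total(omero_nodes)
after = total(omero_nodes + [SEARCHENGINE])

if __name__ == "__main__":
    print(f"OMERO nodes only:  {before.vcpus} vCPUs, {before.ram_gb} GB RAM")
    print(f"with searchengine: {after.vcpus} vCPUs, {after.ram_gb} GB RAM")
```

Under these assumptions the OMERO nodes account for 48 vCPUs and 192 GB RAM per deployment, and the extra instance brings that to 64 vCPUs and 256 GB RAM, plus the 50 to 100 GB data volume.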