OHDSI / WebAPI

OHDSI WebAPI contains all OHDSI services that can be called from OHDSI applications
Apache License 2.0

Memory Usage WebAPI #2358

Open · qcaas-nhs-sjt opened this issue 3 months ago

qcaas-nhs-sjt commented 3 months ago

When WebAPI is started on our Kubernetes cluster it uses 1168Mi of memory while idle, whereas Hades uses 75Mi and Atlas uses 4Mi. This makes WebAPI relatively expensive:

NAMESPACE             NAME                                               CPU(cores)   MEMORY(bytes)   
cert-manager          cert-manager-6c69f9f796-hx4xb                      1m           22Mi            
cert-manager          cert-manager-cainjector-584f44558c-29cwl           1m           33Mi            
cert-manager          cert-manager-webhook-78466c75fc-frwrv              1m           24Mi            
cert-manager          trust-manager-775bfcf747-nwzxq                     2m           28Mi            
flux-system           helm-controller-5d8d5fc6fd-6pn5r                   2m           93Mi            
flux-system           kustomize-controller-7b7b47f459-9qlvs              24m          43Mi            
flux-system           notification-controller-5bb6647999-4s28v           1m           16Mi            
flux-system           source-controller-7667765cd7-gpz2h                 2m           40Mi            
jupyterhub            continuous-image-puller-lfptt                      0m           0Mi             
jupyterhub            hub-8676597487-b2kkf                               2m           141Mi           
jupyterhub            proxy-7b7b858f6b-xpzsk                             1m           19Mi            
jupyterhub            user-scheduler-648565cdf7-2vb7f                    2m           23Mi            
jupyterhub            user-scheduler-648565cdf7-6qb6j                    3m           29Mi            
keda                  keda-admission-webhooks-6987d68f4c-sd4gh           1m           9Mi             
keda                  keda-operator-5c77476f9b-8vldr                     1m           32Mi            
keda                  keda-operator-metrics-apiserver-7c4675769c-rf5jg   3m           47Mi            
kube-system           calico-kube-controllers-77bd7c5b-gvr2r             1m           25Mi            
kube-system           calico-node-dxcht                                  19m          145Mi           
kube-system           coredns-6d896984d7-svbz4                           2m           23Mi            
kube-system           csi-secrets-store-secrets-store-csi-driver-s5rdz   1m           39Mi            
kube-system           hostpath-provisioner-7df77bc496-g2wl8              1m           17Mi            
metallb-system        controller-5f7bb57799-lmk68                        1m           36Mi            
metallb-system        speaker-m4wwb                                      2m           28Mi            
metrics-server        metrics-server-5dc9dbbd5b-b276n                    2m           32Mi            
nginx                 ingress-nginx-controller-59cf569798-6pmtg          4m           114Mi           
nginx                 ingress-nginx-controller-59cf569798-wv8nz          1m           99Mi            
ohdsi                 ohdsi-atlas-7cd45d4565-5w9rn                       0m           4Mi             
ohdsi                 ohdsi-hades-6f798d67bb-scjlc                       1m           75Mi            
ohdsi                 ohdsi-webapi-76f45574d8-8b2k8                      1m           1168Mi          
secrets-distributor   secrets-distributor-669864d94d-7d62t               0m           64Mi            

I've seen a recommendation on other threads for at least 2GB, and other recommendations suggesting up to 16GB be allocated to the environment. That is fine in itself, but if the service is using 1GB when idle I worry how much it could expand when in use: with a single user logged in and using the base source it got up to 1937Mi.

Is there any way in which we could reduce the memory requirements of the service (especially when idle), or offload this requirement onto other services? Are there any magic settings which might help?

And what would the recommendation be for right-sizing this service for use at scale in a production environment with multiple data sources?
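For what it's worth, a large resident size at idle is common for JVM-based services: unless the heap is capped (for example with -Xmx or -XX:MaxRAMPercentage in whatever JAVA_OPTS/CATALINA_OPTS the container image uses, depending on how it launches the JVM), the JVM will size and hold on to heap based on the memory it sees available. A minimal, hypothetical check like the sketch below, run inside the WebAPI container, shows the ceiling the JVM has actually picked up, which helps separate "heap the JVM reserved by default" from "memory WebAPI really needs".

```java
// Hypothetical diagnostic, not part of WebAPI: prints the JVM's view of its
// own heap so the ~1168Mi resident size can be compared against the
// configured ceiling (-Xmx / -XX:MaxRAMPercentage).
public class HeapReport {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mib = 1024L * 1024L;
        System.out.println("max heap  (MiB): " + rt.maxMemory() / mib);    // ceiling the JVM will grow to
        System.out.println("committed (MiB): " + rt.totalMemory() / mib);  // heap currently reserved from the OS
        System.out.println("used      (MiB): " + (rt.totalMemory() - rt.freeMemory()) / mib);
    }
}
```

If the reported max heap is close to what the pod holds at idle, capping the heap and setting matching Kubernetes requests/limits is usually the first lever to try; that is a general JVM observation rather than a WebAPI-specific recommendation.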

chrisknoll commented 3 months ago

WebAPI uses a lot of caching, but that's not user-specific. So you might want to see what it looks like when 10 users log in (for example); memory shouldn't expand linearly with the number of users. Of course, as users fetch results there is intermediate memory usage in the service tier, so 10 users doing things exactly simultaneously will cause spikes in memory usage, but Atlas isn't a high-transaction sort of application.

As far as identifying memory consumption goes, it would take some profiling to see where memory is consumed. The service itself is stateless, though, so we shouldn't be running into memory leaks or the like (but of course, all software has bugs).

WebAPI isn't a microservice architecture, so there's no 'offloading to other services' that you can do.

qcaas-nhs-sjt commented 3 months ago

> WebAPI uses a lot of caching, but that's not user-specific. So you might want to see what it looks like when 10 users log in (for example); memory shouldn't expand linearly with the number of users. Of course, as users fetch results there is intermediate memory usage in the service tier, so 10 users doing things exactly simultaneously will cause spikes in memory usage, but Atlas isn't a high-transaction sort of application.
>
> As far as identifying memory consumption goes, it would take some profiling to see where memory is consumed. The service itself is stateless, though, so we shouldn't be running into memory leaks or the like (but of course, all software has bugs).
>
> WebAPI isn't a microservice architecture, so there's no 'offloading to other services' that you can do.

Thanks Chris. I guess my concern is partly the number of users, but also the number and scale of the different data sources we might have. In our use case I'm not exactly sure how many datasets we're going to have at any given time, but we are likely to have multiple data sources for various NHS organisations, and these tend to add up quickly in my experience. So even though we are not using many sources yet, my concern would be about building a solution that is overly hungry on any particular resource, whether memory, CPU, disk or otherwise, and not really scalable, and then trying to fix that once we are already at the limit rather than addressing it up front.

Thinking out loud here, as I've not reviewed the codebase yet to see how caching works: in theory could we not abstract away the caching service? Those who want to could keep the existing caching methodology inside the WebAPI service, but those of us with more of an enterprise setup could use a caching server such as Redis (or similar) and offload memory that way. While this may seem like moving the problem, it potentially opens up a variety of caching strategies that could let us better control usage and future-proof a scalable solution.

I'm wondering whether this could be worth investigating as a potential idea for the next major version of WebAPI?
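To make the caching idea above concrete, here is a minimal, illustrative sketch of Spring's cache abstraction with a pluggable backend. It is not WebAPI's actual configuration; it assumes a Spring Boot style setup with spring-boot-starter-cache on the classpath (plus spring-boot-starter-data-redis for the Redis path), and the cache.provider property name is invented for the example.

```java
// Illustrative only: shows how Spring's CacheManager abstraction lets the
// cache live either in the service heap or in an external Redis instance,
// selected by configuration, without changing @Cacheable call sites.
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.concurrent.ConcurrentMapCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.cache.RedisCacheManager;
import org.springframework.data.redis.connection.RedisConnectionFactory;

@Configuration
@EnableCaching
public class CacheConfig {

    // Default: in-process cache, i.e. cached entries stay in the WebAPI heap.
    @Bean
    @ConditionalOnProperty(name = "cache.provider", havingValue = "memory", matchIfMissing = true)
    public CacheManager inMemoryCacheManager() {
        return new ConcurrentMapCacheManager();
    }

    // Opt-in: move cached entries out of the service heap into Redis, so the
    // WebAPI pod can stay small and the cache can be sized and scaled separately.
    @Bean
    @ConditionalOnProperty(name = "cache.provider", havingValue = "redis")
    public CacheManager redisCacheManager(RedisConnectionFactory connectionFactory) {
        return RedisCacheManager.create(connectionFactory);
    }
}
```

Because the rest of the code would only talk to the CacheManager/@Cacheable abstraction, swapping the backing store becomes a deployment decision rather than a code change, which is roughly the "abstract away the caching service" idea described above.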

chrisknoll commented 3 months ago

Caching is definitely something we would want to revisit as part of the WebAPI 3.x update.