Proposed changes in this pull request
Why I made this change
The production server had run dangerously low on memory:
Our uwsgi processes were the biggest offenders:
I don't know whether this caused any user impact; I only noticed it when I was unable to start a Python shell process on the server.
I believe we were suffering from the situation described in this blog post: our uwsgi processes' memory footprints ballooned, and we were purging those processes far too infrequently.
After applying this change and restarting the uwsgi service, memory usage dropped (as you would expect):
Description of the change
By dropping the `max-requests` setting to `100`, each uwsgi process will reload itself after serving 100 web requests (rather than 5000 requests as previously configured). Because the Cadasta Platform is served on machines with a relatively small amount of RAM (2GB), it's important that we're proactive about keeping memory footprints small.
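To put the two limits in perspective, here is a rough back-of-the-envelope sketch of how long a single worker lives before it is recycled. It uses the figures quoted in this description (10 uwsgi processes, at most about 1 request/sec) and assumes that requests are spread evenly across workers, which is a simplification and not something uwsgi guarantees. The function name is hypothetical.

```python
def seconds_between_recycles(max_requests: int, workers: int, requests_per_sec: float) -> float:
    """Approximate lifetime of one worker before it reaches max_requests.

    Assumption: requests are distributed evenly across all workers.
    """
    if workers <= 0 or requests_per_sec <= 0:
        raise ValueError("workers and requests_per_sec must be positive")
    # Each worker sees requests_per_sec / workers requests per second,
    # so it needs max_requests * workers / requests_per_sec seconds to hit the limit.
    return max_requests * workers / requests_per_sec


if __name__ == "__main__":
    # 1 request/sec is the upper bound mentioned in this PR; real traffic is lower,
    # so actual worker lifetimes would be longer than these estimates.
    for limit in (5000, 100):
        secs = seconds_between_recycles(limit, workers=10, requests_per_sec=1.0)
        print(f"max-requests={limit}: roughly {secs / 60:.0f} minutes per worker")
```

Under these assumptions a worker would be recycled at most every quarter of an hour or so with the new setting, compared with many hours under the old one, which gives leaked memory far less time to accumulate.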
How someone else can test the change
This is hard to test because it depends on how much data is loaded while handling web requests. I'm not sure which of our views is the biggest offender for loading data into memory; possibly the `/async/locations` endpoint. I recommend that we watch the production server for a week or so to ensure that memory usage stays reasonably low. After restarting uwsgi, the server was using only 730MB:
When should this PR be merged
Once we're confident that the memory shortage does not return, or before the next release is deployed (deploying that release will undo the changes I applied manually).
Risks
The downside of a low `max-requests` is that our workers may reload themselves too often, interrupting service for end users. Because we serve few requests (less than 1/sec) and have a pool of 10 uwsgi processes, I don't think this will be an issue.
Follow-up actions
Watch memory usage on the server.
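One possible way to do the watching is a small script that sums the resident memory of the uwsgi processes. This is only a sketch: it assumes a Linux host where `ps` is available and where the workers show up under the command name `uwsgi` (both assumptions about our deployment). The function names are hypothetical.

```python
import subprocess


def uwsgi_rss_mb(ps_output: str, name: str = "uwsgi") -> float:
    """Sum the resident set size (in MB) of processes whose command name matches.

    Expects lines of the form "<rss in KB> <command name>".
    """
    total_kb = 0
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        rss_kb, command = parts
        if command.strip() == name:
            total_kb += int(rss_kb)
    return total_kb / 1024


def read_ps() -> str:
    """Return RSS (KB) and command name for every process, without header lines."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `uwsgi_rss_mb(read_ps())` periodically (for example from cron) and logging the result would show whether the footprint stays low. Note that summing RSS counts pages shared between workers more than once, so the total is an upper bound on what uwsgi actually uses.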
Checklist (for reviewing)
General
Is this PR explained thoroughly? All code changes must be accounted for in the PR description.
Is the PR labeled correctly? It should have the `migration` label if a new migration is added.
Is the risk level assessment sufficient? The risks section should contain all risks that might be introduced with the PR and which actions we need to take to mitigate these risks. Possible risks are database migrations, new libraries that need to be installed or changes to deployment scripts.
Functionality
Are all requirements met? Compare implemented functionality with the requirements specification.
Does the UI work as expected? There should be no JavaScript errors in the console; all resources should load. There should be no unexpected errors. Deliberately try to break the feature to find out if there are corner cases that are not handled.
Code
Do you fully understand the introduced changes to the code? If not, ask for clarification; it might uncover ways to solve a problem in a more elegant and efficient way.
Does the PR introduce any inefficient database requests? Use the debug server to check for duplicate requests.
Are all necessary strings marked for translation? All strings that are exposed to users via the UI must be marked for translation.
Is the code documented sufficiently? Large and complex classes, functions or methods must be annotated with comments following our code-style guidelines.
Has the scalability of this change been evaluated?
Is there a maintenance plan in place?
Tests
Are there sufficient test cases? Ensure that all components are tested individually; models, forms, and serializers should be tested in isolation even if a test for a view covers these components.
If this is a bug fix, are tests for the issue in place? There must be a test case for the bug to ensure the issue won’t regress. Make sure that the tests break without the new code to fix the issue.
If this is a new feature or a significant change to an existing feature, has the manual testing spreadsheet been updated with instructions for manual testing?
Security
Confirm this PR doesn't commit any keys, passwords, tokens, usernames, or other secrets.
Are all UI and API inputs run through forms or serializers?
Are all external inputs validated and sanitized appropriately?
Does all branching logic have a default case?
Does this solution handle outliers and edge cases gracefully?
Are all external communications secured and restricted to SSL?
Documentation
Are changes to the UI documented in the platform docs? If this PR introduces new platform site functionality or changes existing ones, the changes must be documented in the Cadasta Platform Documentation.
Are changes to the API documented in the API docs? If this PR introduces new API functionality or changes existing ones, the changes must be documented in the API docs.
Are reusable components documented? If this PR introduces components that are relevant to other developers (for instance a mixin for a view or a generic form) they should be documented in the Wiki.