An ELK Stack for OpenMRS community data.
GitHub Archive data in Google BigQuery are manually extracted using a query like this:
SELECT *
FROM `githubarchive.year.*`
-- One year per export (here 2020); change both bounds for other years
WHERE _TABLE_SUFFIX BETWEEN '2020' AND '2020'
  AND (
    repo.name IN (
      SELECT name FROM `openmrs-github-stats.openmrs_events.openmrs-repos`
    )
    OR LOWER(repo.name) LIKE '%openmrs%'
    OR org.login = 'openmrs'
  )
The GitHub events for each year are exported as newline-delimited JSON files, gzipped, and stored in Google Drive.
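To sanity-check one of these exported files before loading it, the sketch below counts events by type directly from the gzipped file. It is a minimal example, not part of this repository: the function name count_event_types is hypothetical, and it assumes each line carries a top-level "type" field (e.g. PushEvent), as the githubarchive tables do.

```shell
# Hypothetical helper (not part of this repository): tally GitHub event types
# in one gzipped, newline-delimited JSON export.
# Assumption: each line is a JSON object with a top-level "type" field such as
# "PushEvent"; nested JSON stored as an escaped string (e.g. the payload column)
# is not matched because its quotes are backslash-escaped.
count_event_types() {
  gzip -dc "$1" |
    grep -o '"type": *"[A-Za-z]*Event"' |   # pull out the event type field
    sed 's/.*"\([A-Za-z]*Event\)"/\1/' |   # keep just the name, e.g. PushEvent
    sort | uniq -c |                       # count identical names
    sort -rn                               # most frequent first
}
```

For example, count_event_types 2020.json.gz (a hypothetical file name) prints one line per event type with its count, most frequent first.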
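This repository includes an ELK Stack (Elasticsearch, Logstash, and Kibana) for exploring these data.
Only events from repositories in the OpenMRS organization, repositories with "openmrs" in their name, or repositories listed in the openmrs-repos table within our BigQuery project are included. OpenMRS-related work in GitHub outside of the OpenMRS organization, in repositories that do not have "openmrs" in the name and have not been manually added to the openmrs-repos table in BigQuery, is not included in these stats.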
This means, for example, that work in Micro Frontend repositories (which chose a naming convention that does not include "openmrs") outside of the OpenMRS organization (e.g., by other organizations or in personal forks) is not included unless we manually add those repositories to the openmrs-repos table, for example by grepping them from a GitHub web search (see details here).
Before running the stack, increase virtual memory for Elasticsearch:
sysctl -w vm.max_map_count=262144
NOTE: If the Elasticsearch containers exit with error code 137 when you run this stack, it is most likely because they are running out of memory. You can monitor Docker memory usage from a terminal with:
docker ps -q | xargs docker stats --no-stream
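To confirm that an exit really was memory-related, it helps to know how exit code 137 arises. The sketch below decodes an exit code and lists containers that exited with 137; the function names are hypothetical, and it assumes a POSIX-style shell with the Docker CLI installed.

```shell
# Hypothetical helper: explain a container's exit code. Codes above 128 mean
# the process was killed by signal (code - 128); 137 = 128 + 9 is SIGKILL,
# which is what the kernel's OOM killer sends (for example when a container
# hits its memory limit).
explain_exit_code() {
  local code=$1
  if [ "$code" -gt 128 ]; then
    local sig=$((code - 128))
    if [ "$sig" -eq 9 ]; then
      echo "killed by signal 9 (SIGKILL), most likely out of memory"
    else
      echo "killed by signal $sig"
    fi
  else
    echo "exited with status $code"
  fi
}

# Hypothetical helper: list containers, including stopped ones,
# whose last exit code was 137.
list_exited_137() {
  docker ps -a --filter 'exited=137' --format '{{.Names}}: {{.Status}}'
}
```

Note that 137 only says the process received SIGKILL; memory pressure is the usual cause here, but a manual docker kill produces the same code.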
Clone this repository:
git clone https://github.com/bmamlin/openmrs-contrib-metrics
From within the github-data folder, run ./download-data.sh to download the data files.
Then start the stack:
docker-compose up -d
Unfortunately, Kibana doesn't provide a way to export or save filters, so you will need to add the filter that excludes bots manually. The easiest way to do this is to navigate to the dashboard, remove any parameters from the URL (i.e., the question mark and everything after it), and then paste the following parameters onto the end of the URL in the browser's address bar:
?_g=(filters:!(('$state':(store:globalState),meta:(alias:bot,disabled:!f,index:'c81b8f30-9848-11ed-883c-8984dc663080',key:actor.keyword,negate:!t,params:!(openmrs-bot,codecov%5Bbot%5D,dependabot%5Bbot%5D,dependabot-preview%5Bbot%5D,github-actions%5Bbot%5D,codacy-bot,pihinformatics,pull%5Bbot%5D,renovate%5Bbot%5D,transifex-integration%5Bbot%5D,whitesource-bolt-for-github%5Bbot%5D),type:phrases),query:(bool:(minimum_should_match:1,should:!((match_phrase:(actor.keyword:openmrs-bot)),(match_phrase:(actor.keyword:codecov%5Bbot%5D)),(match_phrase:(actor.keyword:dependabot%5Bbot%5D)),(match_phrase:(actor.keyword:dependabot-preview%5Bbot%5D)),(match_phrase:(actor.keyword:github-actions%5Bbot%5D)),(match_phrase:(actor.keyword:codacy-bot)),(match_phrase:(actor.keyword:pihinformatics)),(match_phrase:(actor.keyword:pull%5Bbot%5D)),(match_phrase:(actor.keyword:renovate%5Bbot%5D)),(match_phrase:(actor.keyword:transifex-integration%5Bbot%5D)),(match_phrase:(actor.keyword:whitesource-bolt-for-github%5Bbot%5D))))))),refreshInterval:(pause:!t,value:0),time:(from:now-8y,to:now))
Adding the above gibberish as the parameter portion of the URL should add a "NOT bot" filter to the view. Click on this filter and pin it if it is not already pinned. Pinning applies the bot filter by default to the dashboard and to any visualizations or queries you view (since we want to ignore bot activity for most views).
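The parameter above repeats every bot name twice, once in params:!(...) and once in the should:!(...) query. If you need to add a bot, a small generator like the one below can keep the two lists in sync. This is a sketch that assumes the rest of the parameter stays exactly as shown; the function names are hypothetical, and you would still paste the generated fragments into the full _g=(...) string yourself.

```shell
# Hypothetical helpers: regenerate the two bot lists inside the "NOT bot"
# parameter (the "params:!(...)" list and the "should:!(...)" list) from one
# set of account names, so that a new bot only has to be added in one place.
# Square brackets are percent-encoded (%5B, %5D) as in the parameter above.
# The output still has to be spliced into the full _g=(...) string by hand.
encode_name() {
  printf '%s' "$1" | sed -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

bot_params() {
  local out="" name
  for name in "$@"; do
    out="${out:+$out,}$(encode_name "$name")"
  done
  printf 'params:!(%s)\n' "$out"
}

bot_should() {
  local out="" name
  for name in "$@"; do
    out="${out:+$out,}(match_phrase:(actor.keyword:$(encode_name "$name")))"
  done
  printf 'should:!(%s)\n' "$out"
}
```

For example, bot_params openmrs-bot 'codecov[bot]' prints params:!(openmrs-bot,codecov%5Bbot%5D), matching the start of the list above. Quote names containing brackets so the shell does not treat them as glob patterns.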
If you are new to git & Docker and are manually installing everything from scratch instead of using a machine pre-configured with git & Docker (like DigitalOcean provides for droplets in its marketplace), reviewing this issue might save you some time.
Pause the stack:
docker-compose pause
(start it back up with docker-compose unpause)
Start the stack:
docker-compose up -d
Shut down the Docker containers and clear the Elasticsearch volumes:
docker-compose down -v
Purge the postgres data:
rm -rf pgdata
If you download new data or make changes to the GitHub data files, you must clear the pgdata subfolder (rm -rf pgdata) for the data to be reloaded. If the pgdata folder contains any data, postgres will not try to load any new data.
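A minimal sketch of that check and of the reset sequence, assuming the commands are run from the folder that contains pgdata and the docker-compose file (the function names are hypothetical):

```shell
# Hypothetical helper. Postgres only loads the data files when its data
# directory is empty, so this tells you whether a reload will happen.
# A missing directory counts as empty.
pgdata_is_empty() {
  local dir=${1:-pgdata}
  [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# The reset steps described above, in order (defined here, not run):
# stop the containers and drop the Elasticsearch volumes, purge pgdata,
# then start the stack again so the data get reloaded.
reload_data() {
  docker-compose down -v &&
    rm -rf pgdata &&
    docker-compose up -d
}
```

The && chaining stops at the first failing step, so the stack is not restarted on top of a half-purged pgdata folder.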
If the stack fails to run, check the logs with:
docker-compose logs
If you see error messages or messages about services being unreachable, scroll up to where the errors begin. If you see a message like "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]", increase virtual memory with:
sysctl -w vm.max_map_count=262144
To make the change permanent, edit /etc/sysctl.conf and set vm.max_map_count to 262144 (from Stack Overflow).
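A small preflight check can catch this before Elasticsearch fails. The sketch below is hypothetical (function and variable names are illustrative); it compares a value against 262144 and prints, rather than runs, the root-only commands that apply and persist the setting.

```shell
# Hypothetical preflight check: Elasticsearch reports the
# "max virtual memory areas" error when vm.max_map_count is below 262144.
REQUIRED_MAP_COUNT=262144

map_count_ok() {
  # $1: current value, e.g. read from /proc/sys/vm/max_map_count
  [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

# Print the fix without running it (these commands need root): the first
# line applies the setting now, the other two persist it across reboots.
map_count_fix_hint() {
  echo "sudo sysctl -w vm.max_map_count=$REQUIRED_MAP_COUNT"
  echo "echo 'vm.max_map_count=$REQUIRED_MAP_COUNT' | sudo tee -a /etc/sysctl.conf"
  echo "sudo sysctl -p"
}
```

On a Linux host you could run map_count_ok "$(cat /proc/sys/vm/max_map_count)" || map_count_fix_hint before docker-compose up -d.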
If the "NOT bot" filter is not rendering as expected, the index pattern's ID may have changed. You may notice a UUID (in the form c81b8f30-9848-11ed-883c-8984dc663080) in the first part of the "NOT bot" filter definition. If you navigate to Stack Management > Index Patterns in Kibana and select the default github* pattern, the UUID in the address bar for this index pattern should match the one referenced by the "NOT bot" filter. If they don't match, copy the UUID of the index pattern and use it to replace the (likely outdated) UUID in the "NOT bot" filter definition when adding the "NOT bot" filter to the address bar as described above.
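The UUID swap can also be done with sed instead of by hand. In this sketch, replace_index_uuid is a hypothetical helper that reads the filter parameter on stdin and writes it back with the index pattern ID replaced; it assumes the old ID is c81b8f30-9848-11ed-883c-8984dc663080 and that the new ID is a lowercase UUID copied from the address bar.

```shell
# Hypothetical helper: swap the index pattern UUID inside the "NOT bot"
# filter parameter. Reads the parameter on stdin, writes the result on stdout.
OLD_UUID=c81b8f30-9848-11ed-883c-8984dc663080

replace_index_uuid() {
  local new_uuid=$1
  # Reject anything that is not lowercase hex digits and dashes, so sed
  # never receives unexpected characters in the replacement.
  case $new_uuid in
    *[!0-9a-f-]*|'') echo "not a UUID: $new_uuid" >&2; return 1 ;;
  esac
  sed "s/index:'$OLD_UUID'/index:'$new_uuid'/g"
}
```

For example, replace_index_uuid <new-uuid> < not-bot-filter.txt, where not-bot-filter.txt is a hypothetical file holding the parameter string from above.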