Context
We need accurate counts for option question answers. The Akvo Flow dashboard uses the SurveyQuestionSummary kind to generate the charts in the Data > Charts tab.
The code that handles those counts has not been touched in 6 years; the last commit was in 2017: 75424d1
We're not confident that this code handles form instance deletion properly, as we have found several entities in the Datastore with no matching form instance.
Alternatives
Unilog - Option 1
We know that reading data from the Datastore is expensive, so we could try to leverage the synced data in the unilog.
Pros: The Flow API already has a connection to the proper postgres database for the /sync API
Cons: We need to implement and store a materialized version of those counts.
Notes:
There is already some code doing that job (See: event_log.clj)
That code needs to be copied and cleaned
We have another copy of the Datastore values in postgres
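A minimal sketch of what materializing the counts could look like, assuming the consumer sees one event per answer and one per form instance deletion. The event type names (`answerCreated`, `formInstanceDeleted`), the payload keys and the `OptionCounts` class are hypothetical and are not taken from event_log.clj. It is written in Python for brevity and keeps state in memory, where a real version would persist it in postgres.

```python
from collections import Counter, defaultdict


class OptionCounts:
    """In-memory stand-in for a materialized count table (hypothetical design)."""

    def __init__(self):
        # counts[question_id][option] -> number of answers still alive
        self.counts = defaultdict(Counter)
        # answers_by_instance[form_instance_id] -> [(question_id, option), ...]
        # Needed so that a deletion can undo exactly what the instance added.
        self.answers_by_instance = defaultdict(list)

    def apply(self, event):
        kind = event["type"]  # assumed payload key
        if kind == "answerCreated":
            for option in event["options"]:
                self.counts[event["questionId"]][option] += 1
                self.answers_by_instance[event["formInstanceId"]].append(
                    (event["questionId"], option)
                )
        elif kind == "formInstanceDeleted":
            # Deleting an unknown instance is a no-op.
            removed = self.answers_by_instance.pop(event["formInstanceId"], [])
            for question_id, option in removed:
                self.counts[question_id][option] -= 1

    def summary(self, question_id):
        # Hide options whose count dropped back to zero.
        counts = self.counts.get(question_id, Counter())
        return {option: n for option, n in counts.items() if n > 0}
```

Tracking the answers per form instance is what lets a deletion event decrement the right counters, which is the case we're not confident the current code handles.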
Unilog - Option 2
Instead of materializing the counts, we can try to compute them via SQL queries.
Pros: We have a connection to the database
Cons: Because we don't have the proper indices, there will be a lot of sequential scanning of the event_log table
Notes:
The code will be a pseudo-consumer, since we need to take into account form instance deletion events to discard orphan answers (We're not confident that deleting form instances leads to deleting all related data)
We need to present the same data as the Raw Data Report
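A rough sketch of the pseudo-consumer idea as a single query: count answers per option, skipping any answer whose form instance has a deletion event. The flattened columns (`event_type`, `form_instance_id`, `question_id`, `value`) and the event type names are assumptions made for illustration; the real event_log layout will differ. The demo uses Python's built-in sqlite3 so it is runnable, but the query itself is plain SQL.

```python
import sqlite3

# Hypothetical schema: one row per event, with the relevant fields as columns.
# The "?" placeholder is sqlite3 syntax; a postgres driver uses its own style.
QUERY = """
SELECT a.value, COUNT(*) AS n
FROM event_log a
WHERE a.event_type = 'answerCreated'
  AND a.question_id = ?
  AND NOT EXISTS (
    SELECT 1 FROM event_log d
    WHERE d.event_type = 'formInstanceDeleted'
      AND d.form_instance_id = a.form_instance_id
  )
GROUP BY a.value
ORDER BY a.value
"""


def option_counts(conn, question_id):
    """Return {option: count}, ignoring answers of deleted form instances."""
    return {value: n for value, n in conn.execute(QUERY, (question_id,))}


def demo_connection():
    """Build a tiny in-memory event_log with made-up events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log (id INTEGER PRIMARY KEY, event_type TEXT,"
        " form_instance_id TEXT, question_id TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO event_log (event_type, form_instance_id, question_id, value)"
        " VALUES (?, ?, ?, ?)",
        [
            ("answerCreated", "fi1", "q1", "Yes"),
            ("answerCreated", "fi2", "q1", "No"),
            ("answerCreated", "fi3", "q1", "Yes"),
            ("answerCreated", "fi4", "q1", "No"),
            ("formInstanceDeleted", "fi2", None, None),  # orphans fi2's answer
        ],
    )
    return conn
```

With the demo data, `option_counts(demo_connection(), "q1")` counts the answers of fi1, fi3 and fi4 only. The sketch ignores answer updates, and without an index on the form instance id both the outer filter and the `NOT EXISTS` lookup fall back to scanning the table, which is the cost noted under Cons.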
Remote API
Read the data (QuestionAnswerStore) directly via Remote API and compute the counts on the server
Pros: We already do that for the rest of the data and the survey definitions
Cons: We'll be adding a new endpoint that uses an expensive operation, instead of using a database outside GAE
Notes:
The queries can be made as efficient as possible (e.g. use keys-only queries when possible)
The endpoint supports getting counts for OPTION questions, and basic statistics for NUMBER questions (mean, max, min, standard deviation, count)
Data authorization remains the same. A user must have access to the Survey to be able to get question stats
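To make the endpoint's output concrete, here is a small sketch of the two aggregations once the answers have been read. The function names, the input shapes and the result keys are assumptions, not the actual endpoint contract, and the sketch is in Python rather than the API's own language. It also assumes the population standard deviation; the sample version (`statistics.stdev`) would be an equally valid choice.

```python
import statistics
from collections import Counter


def count_options(answers):
    """answers: iterable of lists of selected option texts, one list per answer."""
    counts = Counter()
    for selected in answers:
        counts.update(selected)  # multi-select answers add one per option
    return dict(counts)


def number_stats(values):
    """Basic statistics for the answers of a NUMBER question."""
    values = list(values)
    if not values:
        # No answers: report the count and leave the rest undefined.
        return {"count": 0, "mean": None, "min": None, "max": None, "sd": None}
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.pstdev(values),  # population standard deviation
    }
```

For example, `count_options([["Yes"], ["No"], ["Yes", "Maybe"]])` tallies each selected option once per answer, so a multi-select answer contributes to several options.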