akvo / akvo-flow

A data collection and monitoring tool that works anywhere.
http://akvo.org/products/akvoflow/
GNU Affero General Public License v3.0
65 stars 31 forks source link

Dynamically build datapoint names #1576

Closed ichinaski closed 8 years ago

ichinaski commented 8 years ago

Overview

Since form data can be edited in multiple locations, the explicit datapoint name we keep in the datastore is prone to be out of sync, which causes continuous support requests around this.

The solution we have been discussing lately is to dynamically build the name based on flagged questions, ensuring we always have this property up to date.

ichinaski commented 8 years ago

Okay, I'm not sold on this idea yet. I'm highly concerned about losing all the performance boost a denormalized datapoint name in the SurveyedLocale kind offers. Let's put all the ingredients together, so we can make an informed decision on this:

There are 3 locations (as of now) where the name needs to be retrieved:

App API

This is particularly inefficient, for it issues 600+ queries to the datastore per request: For each Datapoint (300 total), we fetch the corresponding SurveyalValue and SurveyInstance entities. Once all the data has been fetched, this endpoint has a negligible performance penalty when building the name dynamically, since it already has access to all the responses and flagged questions.

Dashboard API

Currently issues a single request (based on multiple filters) to the datastore, and performs reasonably well. Dismissing the denormalized identifier from the returned entities will have a dramatic impact here, since it will force us to sequentially query either the QuestionAnswerStore or SurveyalValue entities for each datapoint. On top of this, we need to keep supporting datapoint name searches, that is, filtering results based on a given name. This is currently possible, and fairly performant, with the existing denormalized value. Switching to dynamically built names will be challenging and, most likely, more inefficient that our current approach.

After a chat with @muloem, we listed the steps and queries for this computation: 1) Get the list of SurvyedLocales 2) Get the list of SurveyInstances for these SurveyedLocales 3) Get the list of datapoint-name-flagged questions in the registration form 3) Based on the registration survey ID, the survey instance id, and the flagged questions IDs, fetch the QuestionAnswerStore data 4) Compute the name based on those responses

Reports

Datapoint name is currently fetched from the SurveyInstanceDto, which also holds this information. Since the raw data report servlet already contains SurveyInstances and all the responses, the name could be build from this data. However all 3 approaches discussed here would use different ways for building this name.

ichinaski commented 8 years ago

Test plan

For all tests, you'll need to set up a registration form with several name-flagged questions. Among these, I suggest to have one Cascade and one Option question.

Inspect Data tab changes

The goal is to ensure the datapoint name is rebuilt after changing name-flagged responses in the Inspect Data tab.

Re-publish registration form

The goal is to ensure the datapoint name is reassembled after republishing the registration form. This way we ensure the name-flagged responses are up to date

Raw data reports

The goal is to ensure any change to a raw data report in a registration form ends up reassembling the datapoint name

rumca commented 8 years ago

As per skype discussions earlier, I experienced an intermittent issue whereby I would update responses but the data point name would not be updated as expected. Changes by @ichinaski to add a small wait before re-assembling the data point name seem to have addressed the problem.

All other test plans pass as described :+1:

rumca commented 8 years ago

everything looks good I think :+1: