Closed ichinaski closed 8 years ago
Okay, I'm not sold on this idea yet. I'm highly concerned about losing all the performance boost a denormalized datapoint name in the SurveyedLocale
kind offers. Let's put all the ingredients together, so we can make an informed decision on this:
There are 3 locations (as of now) where the name needs to be retrieved:
This is particularly inefficient, for it issues 600+ queries to the datastore per request: For each Datapoint (300 total), we fetch the corresponding SurveyalValue
and SurveyInstance
entities. Once all the data has been fetched, this endpoint has a negligible performance penalty when building the name dynamically, since it already has access to all the responses and flagged questions.
Currently issues a single request (based on multiple filters) to the datastore, and performs reasonably well. Dismissing the denormalized identifier from the returned entities will have a dramatic impact here, since it will force us to sequentially query either the QuestionAnswerStore
or SurveyalValue
entities for each datapoint. On top of this, we need to keep supporting datapoint name searches, that is, filtering results based on a given name. This is currently possible, and fairly performant, with the existing denormalized value. Switching to dynamically built names will be challenging and, most likely, more inefficient that our current approach.
After a chat with @muloem, we listed the steps and queries for this computation: 1) Get the list of SurvyedLocales 2) Get the list of SurveyInstances for these SurveyedLocales 3) Get the list of datapoint-name-flagged questions in the registration form 3) Based on the registration survey ID, the survey instance id, and the flagged questions IDs, fetch the QuestionAnswerStore data 4) Compute the name based on those responses
Datapoint name is currently fetched from the SurveyInstanceDto
, which also holds this information. Since the raw data report servlet already contains SurveyInstances and all the responses, the name could be build from this data. However all 3 approaches discussed here would use different ways for building this name.
For all tests, you'll need to set up a registration form with several name-flagged questions. Among these, I suggest to have one Cascade and one Option question.
Inspect Data tab changes
The goal is to ensure the datapoint name is rebuilt after changing name-flagged responses in the Inspect Data tab.
Re-publish registration form
The goal is to ensure the datapoint name is reassembled after republishing the registration form. This way we ensure the name-flagged responses are up to date
Raw data reports
The goal is to ensure any change to a raw data report in a registration form ends up reassembling the datapoint name
As per skype discussions earlier, I experienced an intermittent issue whereby I would update responses but the data point name would not be updated as expected. Changes by @ichinaski to add a small wait before re-assembling the data point name seem to have addressed the problem.
All other test plans pass as described :+1:
everything looks good I think :+1:
Overview
Since form data can be edited in multiple locations, the explicit datapoint name we keep in the datastore is prone to be out of sync, which causes continuous support requests around this.
The solution we have been discussing lately is to dynamically build the name based on flagged questions, ensuring we always have this property up to date.