Updates to the data model to store benchmark data (#69)
Closed: eliasmalik closed this issue 6 years ago
It's been clarified that this data will be per region (of which there are 9), so actually the schema will look more like:
CREATE TABLE cls_benchmark_data (
  cls_benchmark_data_id SERIAL NOT NULL UNIQUE,
  frontline_survey_question_id INT NOT NULL,
  community_business_region_id INT NOT NULL,
  mean_score_question INT,
  mean_score_answer_1 INT,
  mean_score_answer_2 INT,
  mean_score_answer_3 INT,
  mean_score_answer_4 INT,
  mean_score_answer_5 INT,
  mean_score_answer_6 INT,
  CONSTRAINT cls_benchmark_data_pk PRIMARY KEY (cls_benchmark_data_id),
  CONSTRAINT cls_benchmark_data_to_frontline_survey_question_fk FOREIGN KEY (frontline_survey_question_id) REFERENCES frontline_survey_question ON DELETE CASCADE,
  CONSTRAINT cls_benchmark_data_to_community_business_region_fk FOREIGN KEY (community_business_region_id) REFERENCES community_business_region ON DELETE CASCADE,
  CONSTRAINT cls_benchmark_data_unique_per_region_per_question UNIQUE (frontline_survey_question_id, community_business_region_id)
);
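For illustration, a row for a single (question, region) pair might be inserted like this; the values below are invented, and the unique constraint guarantees at most one row per question per region:

-- Hypothetical sample row: question 1 in region 3 (values are made up).
INSERT INTO cls_benchmark_data (
  frontline_survey_question_id,
  community_business_region_id,
  mean_score_question,
  mean_score_answer_1
) VALUES (1, 3, 4, 2);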
@eliascodes looking at the sample data in the emails, isn't mean_score_answer_1 actually a ratio of the number of those answers compared to the total?
mean_score_question looks like it's the expected value (weighted average). It's not obvious to me what mean_score_answer_n is, though. If it were just a ratio it would need to be < 1; if it were a weighted average, then I don't understand how that average is calculated. If it's supposed to be the expected number of people who choose that answer, then it needs to be a function of how many people actually answered the question. It's not clear from the example in the email, IMO. Maybe I've missed something obvious though...
@SonjaTwine can you clarify some information regarding the statistics to be used for CLS benchmark data? From my understanding of your email:
If these assumptions are correct, this data can be represented as:
@astroash I'm not entirely sure I've understood each of your three points correctly. But here are the actual benchmarks.
I do like your graphs; we should definitely get the one on the right. For the one on the left to work, though, we'd need two different scales on the y-axis, since it plots a frequency (up to the 100s) against an average (somewhere between 1 and 6). That might limit how useful it can be?
I would suggest that for now, we just save the averages. (The medians aren't very useful because, in all likelihood, a) CB users don't know what a median is, and b) the medians capture so little variation that a lot of our CB surveys will simply have the same result.) Then we can calculate the averages for the surveys run by Twine users and just send those two numbers to Frontline. Your concept of average is probably the same, but just to clarify, it's calculated like this: 100 people respond 1, 100 respond 2, 100 respond 3, 100 respond 4, 100 respond 5 -> (100*1 + 100*2 + 100*3 + 100*4 + 100*5) / 500 = 3 as the average score.
Does that help at all or have I just demonstrated how little I understand basic maths?
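For reference, the same calculation as a quick SQL sanity check; the answer_counts relation below is hypothetical and just mirrors the worked example above:

-- Hypothetical data: 100 respondents for each score 1..5.
WITH answer_counts (score, respondent_count) AS (
  VALUES (1, 100), (2, 100), (3, 100), (4, 100), (5, 100)
)
-- Weighted average: sum(score * count) / sum(count) = 3
SELECT SUM(score * respondent_count)::FLOAT / SUM(respondent_count) AS mean_score
FROM answer_counts;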
@SonjaTwine thanks for the benchmark spreadsheet, it helps a lot! I noticed the stats for the NPS survey are different to the rest. Is this the format the NPS data is most useful in?
Yes, that's true: we could drop the three more specific numbers and just keep the average, if that makes your life a lot easier? That would be enough, really.
It won't make much of a difference for us. I just want to check that it will be useful to display this data to users in a different format to the rest.
Yes, it should :) thanks for checking!
For the record: since the data for NPS is fundamentally different from the other CLS surveys, we decided to split the model into two tables.
Additionally, since the data is updated annually, the relevant year is stored explicitly with the data, as opposed to relying on the timestamp columns, which in this case are mostly for auditing.
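A minimal sketch of what the NPS side of the split might look like, assuming the same conventions as the schema above (the table and column names here are guesses, not the final migration):

CREATE TABLE nps_benchmark_data (
  nps_benchmark_data_id SERIAL NOT NULL UNIQUE,
  community_business_region_id INT NOT NULL,
  benchmark_year INT NOT NULL, -- year stored explicitly, per the note above
  mean_score_question INT,     -- keeping just the average, per the discussion above
  CONSTRAINT nps_benchmark_data_pk PRIMARY KEY (nps_benchmark_data_id),
  CONSTRAINT nps_benchmark_data_to_community_business_region_fk FOREIGN KEY (community_business_region_id) REFERENCES community_business_region ON DELETE CASCADE,
  CONSTRAINT nps_benchmark_data_unique_per_region_per_year UNIQUE (community_business_region_id, benchmark_year)
);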