akvo / akvo-lumen

Make sense of your data
https://akvo.org/akvo-lumen
GNU Affero General Public License v3.0

Visualization of frequencies for multiple option questions #2620

Closed muloem closed 4 years ago

muloem commented 4 years ago

Context

We aim to make working across Flow and Lumen as smooth as possible, which means Lumen should be able to handle Flow responses, i.e. correctly display, transform and visualize them in a manner that is as simple and straightforward as possible for our users. In this issue, we look at one particular type of response: those from multiple option questions in Flow.

This is a building block for the implementation of multiple option question support in Lumen. For wider context see the original issue https://github.com/akvo/akvo-lumen/issues/2225

Problem or idea

Currently, in Lumen, multiple option questions are imported into a single column with the selected options separated by a pipe, for example orange|red|yellow. If Salim attempts to create a visualization for this column, say a bar chart, Lumen treats the entire value of each cell as a single response, so every distinct combination of options becomes its own category and the resulting categories are not meaningful. See the example chart below.

Barchart with meaningless categories

Solution or next step

As a first step, we would like to introduce support for visualizing frequencies of selected responses to multiple option questions (MOQs). To keep things simple, this visualization should be based on the single column holding the MOQ responses, without the need to perform any special splitting or transformation of the data. The categories on the X-axis should correspond to the options available for the multiple option question, e.g. see the chart below, based on the same data as the previous chart.
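As a rough illustration of the counting described above, here is a minimal Python sketch. It is not Lumen's implementation; the function name `option_frequencies` and the sample responses are made up for this example, and it assumes each cell holds the selected options separated by a pipe.

```python
from collections import Counter


def option_frequencies(cells, separator="|"):
    """Count how many responses selected each option (hypothetical helper)."""
    counts = Counter()
    for cell in cells:
        if not cell:  # skip empty or missing responses
            continue
        # Use a set so an option is counted at most once per response.
        options = {part.strip() for part in cell.split(separator) if part.strip()}
        counts.update(options)
    return counts


# Illustrative responses only, not data from the issue.
responses = ["orange|red|yellow", "red", "orange|red", None, "yellow|red"]

# Current behaviour: every distinct combination becomes its own category.
whole_cell = Counter(r for r in responses if r)

# Desired behaviour: one category per option.
per_option = option_frequencies(responses)
```

With these sample responses, `whole_cell` has four categories with one response each, while `per_option` has three categories (red, orange, yellow) that can go straight onto the X-axis.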

Barchart with correct frequencies

Kiarii commented 4 years ago

This means that for the first iteration, we will support the analysis of MOQ data. On the grid, the current representation of MOQ data will be maintained: A row’s value contains only selected option names/labels.

On the analysis side, Lumen should be able to calculate the metric for each option, and the resulting bar chart (we will start with this visualisation type in the first iteration) is auto-populated with the options and their respective metrics.

The Google Sheets example below shows what we are aiming for in the first iteration.

Screenshot 2020-03-30 at 9 46 55

Other considerations:

In later iterations, based on user feedback and usage stats, we might consider

tangrammer commented 4 years ago

current research:

Currently, Lumen receives an OPTION question and stores it as TEXT type. To offer this functionality (once the changes are merged), we will need to import a new dataset and/or update an existing one.

janagombitova commented 4 years ago

Making bar charts work like a charm! Well done!

I did notice one thing we need to figure out how to handle. Currently, you can only use bar charts to create visualisations from the option data type. That is OK for data where there is more than one option in the cell, but now all option questions are marked as the option data type, including a question like 'Gender' where the answer is either 'male' or 'female'. That column is now also set as option, and I cannot create a pie chart because the column type is not supported. Either we need to:
1) Enable support for the option type in other visualisations where text is also allowed
2) Or enable switching the column type from option to text and the other way around.

I would vote for the 1st option.

tangrammer commented 4 years ago

yep! we need to update the visualizations' conditions menu configs to adapt to the new OPTION type
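To make the idea concrete, here is a small Python sketch of such a config change. It is only an assumption about what a config like this might look like: the mapping `ACCEPTED_COLUMN_TYPES`, its entries and the helper `with_option_support` are hypothetical and not taken from the Lumen codebase.

```python
# Hypothetical mapping: visualisation type -> column types it accepts.
ACCEPTED_COLUMN_TYPES = {
    "bar": {"text", "option"},
    "pie": {"text"},
    "donut": {"text"},
    "scatter": {"number"},
}


def with_option_support(accepted):
    """Return a copy where every visualisation that accepts 'text' also accepts 'option'."""
    return {
        vis: (types | {"option"}) if "text" in types else set(types)
        for vis, types in accepted.items()
    }


updated = with_option_support(ACCEPTED_COLUMN_TYPES)
```

In this sketch, pie and donut gain the option type, while scatter, which does not accept text, is left unchanged.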

janagombitova commented 4 years ago

@tangrammer GREAT!

janagombitova commented 4 years ago

@tangrammer to add to the comment above, we should also allow transformations to work with the Option data type wherever they work with text data

janagombitova commented 4 years ago

Transforming OPTION columns works well.

I can also make visualisations with this column type, but if my OPTION is a multiple-answer-option-question I expect all the visualisations to handle the categories as the bar chart does. So I expect to have the categories grouped to show the frequency of each different value.

Screen Shot 2020-08-18 at 10 42 12

But pie, polar, donut, bubble, map and scatter visualisations currently still treat each answer combination as a unique value, as before.

Screen Shot 2020-08-18 at 10 35 51 Screen Shot 2020-08-18 at 10 38 05

Here is an example of a map: https://lumen.akvotest.org/visualisation/5f3b92d5-a36f-4447-9048-1942cb794dce

Here is an example of a bar chart with the same dataset and the same column as in the map, but with the values grouped the way I expect: https://lumen.akvotest.org/visualisation/5f3b9433-551b-4010-b122-1386a5c1d91d

janagombitova commented 4 years ago

@tangrammer, @marvinkome and @kardan I found one more bug:

When defining any visualisation you can filter the data. Filtering on a TEXT or NUM column works perfectly. However, when I select an OPTION column to filter the data I get a blank page and the error below. I would expect that the OPTION data type works the same as TEXT in this case too.

Screen Shot 2020-08-18 at 14 50 51

janagombitova commented 4 years ago

I tested the OPTION functionality and the core of it works really well (besides maps as those are still WIP):

To conclude, besides the issue mentioned below and maps we are good to go 👍

~## issue found that needs attention~

~Stacking bars with Option data works well, but you cannot switch from split to stack/stack percentage Screen Shot 2020-08-19 at 11 21 08~

tangrammer commented 4 years ago

@janagombitova "Stacking bars with Option data works well, but you cannot switch from split to stack/stack percentage" seems like a different issue, doesn't it?

janagombitova commented 4 years ago

@tangrammer I wonder if it is related to these changes: if I have a bar chart that does not use an OPTION column, stacking and switching between stacked, percentage stack and split work fine. But we can handle it separately if that suits you better.

janagombitova commented 4 years ago

Support documentation all set: https://lumensupport.akvo.org/article/show/114426-coming-up-next-how-can-i-analyse-data-from-multiple-answer-questions

Measuring success

Our goal is to make working across Flow and Lumen as smooth as possible. With this issue, we assume that use of the derive column JS transformation will decrease, as users will no longer need to handle multiple-answer option questions (MAOQs) with JS code but will use the OPTION data type instead. To measure the success of this change, we can track the use of the derive column JS transformation over time and expect to see a decrease in its use.

Baseline:

Screen Shot 2020-08-19 at 16 33 48

We assume the frequency of use of derive column will drop from the current 10% of unique transformation events to 5% by the end of 2020.
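For tracking, the share can be computed from per-transformation event counts. The Python sketch below is an assumption about how that could be done: the event names and counts are placeholders picked only to reproduce the 10% baseline stated above, not real usage data, and `derive_share` is a hypothetical helper.

```python
def derive_share(event_counts, derive_key="derive"):
    """Fraction of transformation events that are derive column events."""
    total = sum(event_counts.values())
    if total == 0:  # no events recorded yet
        return 0.0
    return event_counts.get(derive_key, 0) / total


# Placeholder counts (illustrative only), chosen to match the 10% baseline.
baseline_events = {"derive": 10, "change-datatype": 40, "rename-column": 50}

TARGET_SHARE = 0.05  # the 5% goal for the end of 2020

share = derive_share(baseline_events)
target_met = share <= TARGET_SHARE
```

Recomputing `share` on the same kind of counts at the end of 2020 and comparing it with `TARGET_SHARE` would show whether the goal was met.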

janagombitova commented 4 years ago

After giving all the issues one more go, I decided that we will handle the stacking problem as a separate issue; it does not block the release.

So this is good to go! YaaaY!

janagombitova commented 4 years ago

Testing on dark-demo I did not find any other issues. We are good to go and will tackle any bugs users find once reported.

janagombitova commented 4 years ago

Actually, the maps don't group the Options on dark-demo as they did before. Example here: https://dark-demo.akvolumen.org/visualisation/5f3e4384-9579-469b-ad5c-5bdd5d0cdf2b

Screen Shot 2020-08-20 at 11 35 01

janagombitova commented 4 years ago

I have been testing all different combinations and besides the few separate issues we handled today, all works fine. So I am going to close this issue.

On the other hand, we just spoke with Juan and agreed that the maps, when working with a multiple answer option question, work well but are misleading. We will create a separate issue to look into alternative solutions.