Data validity: counting and expiration

otmarlendl commented 1 year ago

In a test survey, I have answers from 9 access tokens. A graph like this:

[Screenshot 2023-02-27 133642]

doesn't make sense, as the ring diagram needs to display the state of the world at the end of the time-slider.

Before any data is displayed, a pass must be made over all survey results to build a timeline of the state of every answer.
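A minimal sketch of that preliminary pass, assuming the survey results have already been flattened into one record per responder, question and submission, with timestamps as epoch milliseconds. `FlatResponse` and `buildTimelines` are hypothetical names, not part of koord2ool or the LimeSurvey API.

```typescript
// Hypothetical flattened record: one answer by one responder to one question.
interface FlatResponse {
  responder: string; // access token identifying the responder
  question: string;  // question code
  submitted: number; // submission time, epoch milliseconds
  value: string;     // answer value as given
}

// question -> responder -> that responder's answers, oldest first
type Timelines = Map<string, Map<string, FlatResponse[]>>;

function buildTimelines(responses: FlatResponse[]): Timelines {
  const timelines: Timelines = new Map();
  for (const r of responses) {
    let byResponder = timelines.get(r.question);
    if (!byResponder) {
      byResponder = new Map();
      timelines.set(r.question, byResponder);
    }
    const list = byResponder.get(r.responder) ?? [];
    list.push(r);
    byResponder.set(r.responder, list);
  }
  // Sort each responder's answers chronologically so later passes can walk them in order.
  timelines.forEach((byResponder) =>
    byResponder.forEach((list) => list.sort((a, b) => a.submitted - b.submitted)),
  );
  return timelines;
}
```

The per-responder lists are the input for working out each answer's state at any given time.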

What's needed is the following:

- We need to establish a timeline of how many responders there are over time. Every new access code we see increases that number. There should be a config option for how to handle responders who haven't answered in X hours/days. Basically: if we don't get feedback from an entity for a long time, it should move back to "unknown" status.
- For each question and each responder, there will be a timeline of the state of that question for that responder. The state is either "unknown" (not answered, question was not asked, answer expired) or the value given.
- Time values of interest: the internal results table that drives the graphs must have an entry for every timestamp that is
  - the start or end of the time-slider
  - a time at which a response was submitted
  - a time at which a response expired before a new response was submitted
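The rules above can be sketched as two helpers, assuming each responder's answers to one question are available as a chronologically sorted list and that the configurable "X hours/days" option is a single expiry duration in milliseconds. `Answer`, `UNKNOWN`, `stateAt` and `timestampsOfInterest` are illustrative names only.

```typescript
interface Answer {
  submitted: number; // epoch milliseconds
  value: string;
}

const UNKNOWN = "unknown";

// State of one question for one responder at time t: the latest answer
// submitted at or before t, unless it is at least expiryMs old (hypothetical config value).
function stateAt(answers: Answer[], t: number, expiryMs: number): string {
  let latest: Answer | undefined;
  for (const a of answers) {
    if (a.submitted <= t) latest = a; // answers are sorted, so the last match wins
    else break;
  }
  if (!latest || t - latest.submitted >= expiryMs) return UNKNOWN;
  return latest.value;
}

// Every timestamp the results table needs a row for, sorted and de-duplicated:
// slider start/end, every submission inside the slider, and every expiry inside
// the slider that happens before the same responder answers again.
function timestampsOfInterest(
  timelines: Answer[][], // one sorted answer list per responder
  sliderStart: number,
  sliderEnd: number,
  expiryMs: number,
): number[] {
  const inSlider = (t: number) => t >= sliderStart && t <= sliderEnd;
  const ts: number[] = [sliderStart, sliderEnd];
  for (const answers of timelines) {
    answers.forEach((a, i) => {
      if (inSlider(a.submitted)) ts.push(a.submitted);
      const expiresAt = a.submitted + expiryMs;
      const next = answers[i + 1];
      if (inSlider(expiresAt) && (!next || next.submitted > expiresAt)) ts.push(expiresAt);
    });
  }
  return ts.sort((a, b) => a - b).filter((t, i, arr) => i === 0 || t !== arr[i - 1]);
}
```

Using `>=` in `stateAt` makes an answer count as expired exactly `expiryMs` after submission, which matches the expiry timestamps that `timestampsOfInterest` emits.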

In SQL terms (illustrative only; this should all be held in memory client-side), this could result in:

    -- for which time values do we need to know the state of the world?
    CREATE TABLE timestamps (
      ts                  TIMESTAMP,
      num_valid_responses INT
    );

    -- a row for every single ts / responder combination
    CREATE TABLE answers_question_N (
      "when"    TIMESTAMP,  -- one of timestamps.ts
      responder TEXT,
      answer    TEXT        -- actual type depends on the question type
    );

    CREATE TABLE state_question_N (
      "when" TIMESTAMP
      -- plus aggregation columns derived from answers_question_N, depending on the question type
    );
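Kept in memory, the three illustrative tables could map onto plain row arrays. The sketch below is a hypothetical example for a single-choice question: it takes a per-responder state lookup (`state(responder, ts)`, returning `"unknown"` when there is no valid answer), builds one `answers` row per timestamp/responder pair, and aggregates each timestamp into counts per answer value. All type and function names are assumptions, and `numValidResponses` is counted for this one question only.

```typescript
// Hypothetical lookup: state of one responder at one timestamp ("unknown" if none/expired).
type StateLookup = (responder: string, ts: number) => string;

interface AnswerRow { when: number; responder: string; answer: string } // answers_question_N
interface StateRow { when: number; counts: Record<string, number> }      // state_question_N
interface TimestampRow { ts: number; numValidResponses: number }         // timestamps

function buildTables(timestamps: number[], responders: string[], state: StateLookup) {
  const answers: AnswerRow[] = [];
  const states: StateRow[] = [];
  const stamps: TimestampRow[] = [];
  for (const ts of timestamps) {
    const counts: Record<string, number> = {};
    let valid = 0;
    for (const responder of responders) {
      const answer = state(responder, ts);
      answers.push({ when: ts, responder, answer });
      counts[answer] = (counts[answer] || 0) + 1; // "unknown" is counted as its own slice
      if (answer !== "unknown") valid++;
    }
    states.push({ when: ts, counts });
    stamps.push({ ts, numValidResponses: valid });
  }
  return { answers, states, stamps };
}
```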

The last row in state_question_N that is still inside the time-slider is used to generate the graph on the left for this question.

The full table state_question_N is what directly drives the graph on the right.
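A small sketch of how the two graphs could read the aggregated rows, assuming each row holds counts per answer value; `StateRow`, `ringChartRow` and `lineChartSeries` are hypothetical names, not existing koord2ool functions.

```typescript
interface StateRow { when: number; counts: Record<string, number> }

// Ring chart (left): the last aggregated row that still lies inside the time-slider.
function ringChartRow(rows: StateRow[], sliderStart: number, sliderEnd: number): StateRow | undefined {
  let last: StateRow | undefined;
  for (const row of rows) {
    if (row.when >= sliderStart && row.when <= sliderEnd && (!last || row.when >= last.when)) {
      last = row;
    }
  }
  return last;
}

// Line chart (right): the full table, one series per answer value with [time, count] points.
function lineChartSeries(rows: StateRow[]): Record<string, Array<[number, number]>> {
  const series: Record<string, Array<[number, number]>> = {};
  for (const row of rows) {
    Object.keys(row.counts).forEach((value) => {
      (series[value] = series[value] || []).push([row.when, row.counts[value]]);
    });
  }
  return series;
}
```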

More on data aggregation in another Issue.

otmarlendl commented 1 year ago

This of course only works if users are identified (closed survey). Thus it's necessary to set the right survey options in LimeSurvey.

b3n4kh commented 1 year ago

To solve this, the data for the line chart has to be changed as follows:

For each response to each question, there have to be "artificial" points for every other possible user.

Example:

The survey has 10 questions, and over the span of one month, 10 different users answered these questions 10 times in total.

- Create 100 responses from the RPC endpoints, filtered by the time slider.
- Enrich with up to 100 additional data points, one at the "expiration point" of each response, unless the same user has already answered again within that range. These enriched points always have the value "N/A".
- Enrich again: for each of these (now 200) data points, add the current state of every other user, giving 2000 data points in this case. That state is the "last" value of the user, regardless of whether it comes from an enriched or a normal response.
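The three enrichment steps could be sketched as follows. This is a hypothetical illustration, not koord2ool code: `DataPoint`, `enrich` and the `kind` tag are invented names, responses are assumed to be already fetched from the RPC endpoints, times are epoch milliseconds, and a user with no earlier point for a question is assumed to have the state "N/A".

```typescript
interface DataPoint {
  responder: string;
  question: string;
  time: number;                           // epoch milliseconds
  value: string;
  kind: "response" | "expired" | "state"; // hypothetical tag, handy for tests
}

const NA = "N/A";

function enrich(
  responses: DataPoint[], // raw responses, kind "response"
  users: string[],
  sliderStart: number,
  sliderEnd: number,
  expiryMs: number,       // hypothetical expiry setting
): DataPoint[] {
  // Step 1: keep only responses inside the time slider.
  const base = responses.filter((r) => r.time >= sliderStart && r.time <= sliderEnd);

  // Step 2: add an "N/A" point at each expiration point, unless the same user
  // answered the same question again before it expired.
  const expired: DataPoint[] = [];
  for (const r of base) {
    const expiresAt = r.time + expiryMs;
    const answeredAgain = base.some(
      (o) => o.responder === r.responder && o.question === r.question &&
        o.time > r.time && o.time <= expiresAt,
    );
    if (!answeredAgain) expired.push({ ...r, time: expiresAt, value: NA, kind: "expired" });
  }
  const withExpiry = base.concat(expired);

  // Step 3: for every point, add the current state of every other user for that
  // question: that user's latest point (normal or enriched) at or before that time.
  const states: DataPoint[] = [];
  for (const p of withExpiry) {
    for (const user of users) {
      if (user === p.responder) continue;
      let last: DataPoint | undefined;
      for (const o of withExpiry) {
        if (o.responder === user && o.question === p.question && o.time <= p.time &&
          (!last || o.time >= last.time)) {
          last = o;
        }
      }
      states.push({ responder: user, question: p.question, time: p.time,
        value: last ? last.value : NA, kind: "state" });
    }
  }
  return withExpiry.concat(states);
}
```

The nested scans make this quadratic in the number of points, which is acceptable for a test fixture; a production version would more likely walk per-user timelines in sorted order.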

A test case validating at least these numbers should be written, since this feature will be very complex and error-prone. Keep in mind that the final 10-fold increase is static, whereas the increase from expiration points is dynamic.
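The expected counts for such a test follow directly from the steps above: each response adds at most one expiry point, and every resulting point is then multiplied by the number of users. A hypothetical helper, assuming exactly that counting:

```typescript
// Hypothetical helper for the test case: how many data points the enrichment should yield.
// The user factor is static (number of users); expiryPoints depends on the data and expiry setting.
function expectedDataPoints(responses: number, expiryPoints: number, users: number): number {
  if (expiryPoints < 0 || expiryPoints > responses) {
    throw new RangeError("expected between 0 and one expiry point per response");
  }
  const withExpiry = responses + expiryPoints;  // after step 2
  return withExpiry + withExpiry * (users - 1); // step 3: one point per other user
}
```

For the example, anything between 1000 (no expiries) and 2000 (every response expires) data points is possible.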

ait-cs-IaaS / koord2ool #23