corona-warn-app / cwa-documentation

Project overview, general documentation, and white papers. The CWA development ends on May 31, 2023. You still can warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down

Difference in trends for 7-day incidence and 7-day average #528

Closed nilsalex closed 3 years ago

nilsalex commented 3 years ago

Describe the bug

As of now (16.02.2021, 17:11 CET), CWA shows a 7-day average of 7,274 confirmed infections and a 7-day incidence of 58.7/100,000. For the 7-day average, an arrow pointing towards the lower right indicates a downward trend, while for the 7-day incidence, an arrow pointing to the right indicates a stable trend. Yesterday, the difference was even higher: a downward trend vs an upward trend.

My understanding is that both numbers are related by a factor like

(7-day incidence) = (7-day average) * 7 * 100,000 / (about 83,000,000)

and therefore, the trend should always be the same. Or is there more to it?
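As a quick sanity check, the relation above can be evaluated with the numbers quoted in this report. This is only a sketch: the class and method names are hypothetical, and the population of 83,000,000 is the rough figure from the formula, not a value confirmed by RKI or the app.

```java
final class IncidenceFromAverage {
    // Assumption: the report only says "about 83,000,000"; the exact figure is not given.
    static final double ASSUMED_POPULATION = 83_000_000.0;

    /** 7-day incidence per 100,000 inhabitants implied by a 7-day average of daily cases. */
    static double impliedIncidence(double sevenDayAverage, double population) {
        double casesInSevenDays = sevenDayAverage * 7.0;
        return casesInSevenDays * 100_000.0 / population;
    }

    public static void main(String[] args) {
        // 7,274 is the 7-day average and 58.7 the 7-day incidence shown in the app.
        double implied = impliedIncidence(7_274.0, ASSUMED_POPULATION);
        System.out.println(String.format(java.util.Locale.ROOT,
                "implied: %.1f, shown in app: %.1f", implied, 58.7));
    }
}
```

With these inputs the implied incidence is about 61.3, not the 58.7 shown, which suggests that the two tiles are not derived from the same case counts.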

Steps to reproduce the issue

Open the app and swipe through the widgets.

image

image

Expected behaviour

Same trend for both indicators.


Internal Tracking ID: EXPOSUREAPP-5225

nilsalex commented 3 years ago

@nilsalex ,

> This does not change for values calculated from regional values. Any objections to this basic fact by @GisoSchroederSAP are wrong on the merits.

Then just explain the difference between all these numbers I'/I and S'/S for any given day. They are calculated directly from a single source (in fact, the source numbers are all in the one table above, not from different sources), and yes, these numbers are quite close together.

image

If you'll excuse me, I'm going to stop the discussion here. We have different views on this; I can live with that and will return to my task.

Well, in fact: table

nilsalex commented 3 years ago

@Ein-Tim If all calculations are correct and the discrepancy is just due to different underlying numbers, there are two options: 1) Decide that this is for a good reason (I'd be curious as to what this reason would be) and communicate this clearly within the app. 2) Fix this. Use the source that is better by some metric.

If there are errors in the calculations (I mean, well, the Excel screenshot above clearly contains rounding errors, as pointed out in my previous comment, but I trust that this is unrelated to the actual production calculation), fix them.

So, discussing 1) or 2) may warrant getting RKI or similar involved for your discussions. For me, I don't see the need to discuss anything, as 1) or 2) really is your decision.

The expectation any reasonable user has, namely

I = S/N
I'/I = S'/S

for comparable datasets, is correct. That is a fact for which I don't see the need for further clarification.

Again, you may break with this expectation for a good reason (that is, consider this as solved). But I would be very curious about this reason.

Ein-Tim commented 3 years ago

Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way. I'm just a user/community member and want everybody here to be happy in the end.

The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.

So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.

nilsalex commented 3 years ago

> Just to make this clear, I'm neither a Developer/Community Manager nor related to the RKI/SAP/T-Systems in any way. I'm just a user/community member and want everybody here to be happy in the end.

Oops, sorry :-)

> The Corona-Warn-App is showing the official numbers published by the RKI, so if there is any problem regarding these numbers (or the trend indicators), I would speak to the RKI.
>
> So IMHO the best option for you would be to talk to the experts, as offered by @GisoSchroederSAP.

Well, I don't need to do that. It may be necessary for the decision the developers have to make.

nilsalex commented 3 years ago

Oh, one more thing: Does anyone have the population data for the federal states used by the RKI and by the app? I would very much like to see it. Has it been verified that those numbers match? This may in fact be a proposal I would put to the RKI: include the population data in the daily numbers, or at least document the data in a prominent place.

Also, is the code where the calculations are performed publicly available? I am not able to find it.

Ein-Tim commented 3 years ago

@nilsalex

> Oops, sorry :-)

No need to apologize, I should have made this clearer 🙂

> Well, I don't need to do that. It may be necessary for the decision the developers have to make.

Okay, since @GisoSchroederSAP is one of the Developers (at least he is part of the CWA Development Team), the decision seems to have been made already...

GisoSchroederSAP commented 3 years ago

Sorry, I haven't been a developer for decades. I am just working for the Community and with the Community, trying to answer questions, follow up on issues, provide additional input, and translate proposals into development requests.

As mentioned earlier, I have already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the method of calculation here. As we disagree, @nilsalex, I invite you once more to convince the experts at the source of the data.

So far, I don't see a calculation issue/bug here. However, multiple times I agreed:

So, if you want to question the trend indicators, feel free to ping me and I will try to connect you to the experts. Cheers, Giso

Ein-Tim commented 3 years ago

@GisoSchroederSAP

> Sorry, I haven't been a developer for decades. I am just working for the Community and with the Community, trying to answer questions, follow up on issues, provide additional input, and translate proposals into development requests.

Thank you so much for this information, I did not know this 🙂

Everybody, have a good night.

nilsalex commented 3 years ago

> As mentioned earlier, I have already involved other data analysts and product management in this issue. Besides the fact that the CWA just presents the values coming from the servers, I tried to explain the method of calculation here.

Oh, that is an important clarification. Of course, the mobile app does not perform any calculations.

What I understand is: The distribution service seems to parse a JSON. The properties relevant for this discussion are

  @JsonProperty("infections_effective_7days_avg")
  private Double infectionsReported7daysAvg;
  @JsonProperty("infections_effective_7days_avg_growthrate")
  private Double infectionsReported7daysGrowthrate;
  @JsonProperty("infections_effective_7days_avg_trend_5percent")
  private Integer infectionsReported7daysTrend5percent;

  @JsonProperty("seven_day_incidence_1st_reported_daily")
  private Double sevenDayIncidence;
  @JsonProperty("seven_day_incidence_1st_reported_growthrate")
  private Double sevenDayIncidenceGrowthrate;
  @JsonProperty("seven_day_incidence_1st_reported_trend_1percent")
  private Integer sevenDayIncidenceTrend1percent;

Now, I was under the assumption that the backend performs some calculations to provide these values, because @GisoSchroederSAP talked at great length about the bottom-up calculation, etc.

My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?

I would be grateful to anyone who can answer this.

GisoSchroederSAP commented 3 years ago

I already mentioned in an earlier statement here, with a summary similar to the last one above, that I could reproduce all the numbers and trends from the publicly available data sources that we discussed here earlier.

But to detach the discussion from my personal view, I have just forwarded your request to the product owner and to one of the T-Systems data analysts, @nilsalex. Let's see what we get back. Maybe they will forward it to the RKI directly. As soon as I get a response, I'll share it here.

All, enjoy the weekend.

MikeMcC399 commented 3 years ago

Checking the values and the trends today, they are consistent with what we already found out.

Statistics 2021 02 22

Using the historical data from https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Daten/Fallzahlen_Kum_Tab.xlsx the 7-Day Average of 7,420 can be confirmed. The value of the 7-Day Average 7 days earlier, on reporting day Feb 15, 2021, was 7,206 (50,442 / 7), that is, the sum of the "Differenz Vortag Fälle" (difference in cases from the previous day) values from Feb 9 to Feb 15, 2021 in "Fälle-Todesfälle-gesamt" (cases and deaths, total), divided by 7. So the 7-Day Average has gone up by 214 cases, or 3.0% of 7,206. The trend of 3% is less than the 5% hurdle, so it is categorized as a Steady trend.

From the same Excel file the value of the 7-Day Incidence 60.2 from yesterday Feb 21, 2021 can be extracted. Today's value of 61.0 is an increase of 0.7 or 1.2% of yesterday's value of 60.2. The trend hurdle for comparisons with the previous day is 1%, so this trend of 1.2% is classed as Upwards.
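The two comparisons above can be sketched as follows. This is a minimal illustration rather than the production logic: the class, enum and method names are hypothetical, and treating a change exactly on the band as Steady is an assumption. With the rounded values 61.0 and 60.2 the computed growth rate is about 1.3% rather than the quoted 1.2%, but the classification is the same.

```java
final class TrendCheck {
    enum Trend { DOWNWARDS, STEADY, UPWARDS }

    /** Relative change of current versus reference, e.g. 0.03 for +3%. */
    static double growthRate(double current, double reference) {
        return (current - reference) / reference;
    }

    /** Steady while the relative change stays within +/- band (boundary handling is an assumption). */
    static Trend classify(double current, double reference, double band) {
        double rate = growthRate(current, reference);
        if (rate > band) {
            return Trend.UPWARDS;
        }
        if (rate < -band) {
            return Trend.DOWNWARDS;
        }
        return Trend.STEADY;
    }

    public static void main(String[] args) {
        // 7-Day Average: compared with the value 7 days earlier, 5% band.
        System.out.println(classify(7_420, 7_206, 0.05)); // STEADY
        // 7-Day Incidence: compared with the previous day's value, 1% band.
        System.out.println(classify(61.0, 60.2, 0.01));   // UPWARDS
    }
}
```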

So the data and the display in the app agree with the base data from the Excel sheet published by RKI. 👍

Edit: Sorry about the decimal point and thousands separator in the screenshot. I had the locale on the device set to English (Germany) which produces strange results. I updated the text above to use comma as thousands separator and dot as decimal point, which is the usual way for English texts.

MikeMcC399 commented 3 years ago

@nilsalex

> My question: What is the exact source for each of these values? Does the CWA backend perform any calculations itself?

I asked and received an answer in https://github.com/corona-warn-app/cwa-server/issues/1223#issuecomment-785111671

"the 'cwa-server' doesn't collect any nor calculates any statistics, but it reads in a json file coming from CWA-Analytics framework and transforms it into protobuf structure, which is then consumed by the mobile clients, when you open your app.

Unfortunately I don't have all the details where the CWA-Analytics framework gets its information from. But for sure its using the RKI as one of the data-sources."

MikeMcC399 commented 3 years ago

To summarize the findings:

  1. It is correct that the trends for 7-Day Average and 7-Day Incidence can be different.
  2. The information text ℹ️ is misleading regarding the 7-Day Incidence trend, which is calculated based on a comparison to the previous day's value, not the value 7 days prior.
  3. The data-set for the 7-Day Average trend is based on two sets of adjacent 7-Day periods, using the date reported to RKI, a ±5% Steady trend band and a total of 14 days of data.
  4. The data-set for the 7-Day Incidence trend is based on two sets of overlapping 7-Day periods, one day apart, using the date reported to the local Gesundheitsamt, a ±1% Steady trend band and a total of 8 days of data.
  5. Reporting chain delays mean that RKI dates and Gesundheitsamt dates can differ.
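Points 3 and 4 can be illustrated with a small sketch showing that the different windows and bands alone are enough to produce different trends, even when both are computed from the same daily series. The series below is made up for illustration, and all names are hypothetical. The real figures additionally come from different reporting dates (point 5), which this sketch ignores. Because incidence is the 7-day sum scaled by a constant population, its relative change equals that of the 7-day sum.

```java
final class WindowComparison {
    enum Trend { DOWNWARDS, STEADY, UPWARDS }

    /** Sum of length values starting at index from. */
    static long windowSum(int[] daily, int from, int length) {
        long sum = 0;
        for (int i = from; i < from + length; i++) {
            sum += daily[i];
        }
        return sum;
    }

    /** Steady while the relative change stays within +/- band (boundary handling is an assumption). */
    static Trend classify(double current, double reference, double band) {
        double rate = (current - reference) / reference;
        if (rate > band) {
            return Trend.UPWARDS;
        }
        if (rate < -band) {
            return Trend.DOWNWARDS;
        }
        return Trend.STEADY;
    }

    /** Latest 7 days versus the 7 days before them: 14 days of data, 5% band. */
    static Trend adjacentWeeksTrend(int[] daily) {
        requireTwoWeeks(daily);
        int n = daily.length;
        return classify(windowSum(daily, n - 7, 7), windowSum(daily, n - 14, 7), 0.05);
    }

    /** Latest 7 days versus the 7-day window ending one day earlier: 8 days of data, 1% band. */
    static Trend dayOverDayTrend(int[] daily) {
        requireTwoWeeks(daily);
        int n = daily.length;
        return classify(windowSum(daily, n - 7, 7), windowSum(daily, n - 8, 7), 0.01);
    }

    private static void requireTwoWeeks(int[] daily) {
        if (daily.length < 14) {
            throw new IllegalArgumentException("need at least 14 daily values");
        }
    }

    public static void main(String[] args) {
        // Made-up daily case counts, oldest first: a drop that happened eight days ago.
        int[] daily = {1000, 1000, 1000, 1000, 1000, 1000, 900, 900, 900, 900, 900, 900, 900, 900};
        System.out.println(adjacentWeeksTrend(daily)); // DOWNWARDS (6,300 vs 6,900)
        System.out.println(dayOverDayTrend(daily));    // STEADY (6,300 vs 6,300)
    }
}
```

In this made-up series the week-over-week comparison still sees the earlier drop, while the day-over-day comparison of overlapping windows no longer does, which mirrors the downward/steady combination from the original report.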

There is a more detailed write-up in https://github.com/corona-warn-app/cwa-website/issues/904 which is open for review.

I hope that the information text regarding Trend will be acknowledged as a documentation bug and addressed through Internal Tracking ID: EXPOSUREAPP-5225. This is the "Key Figures, Explanation of Statistics" text which is shown by tapping on the ℹ️ icon in any of the statistics tiles in the app. More specifically the string statistics_explanation_trend_text:

"Trend"

"The arrow direction indicates whether the trend is increasing, decreasing, or remaining steady – that is, demonstrates a deviation of less than 1% compared to the previous day or 5% compared to the previous week. The color indicates this trend as positive (green), negative (red), or neutral (gray). The trend compares the value from the previous day with the value from two days ago or, for the 7-day trends, the average value from the last 7 days with the average value from the 7 days prior to that."

MikeMcC399 commented 3 years ago

@nilsalex

Could we close this issue now?

The trend for Confirmed New Infections is calculated based on a comparison to the value of the 7-Day Average one week previously whereas the trend for the 7-Day Incidence is calculated using the value one day previously. So that difference on its own is enough reason that the trends will not necessarily be the same on any one day.

In your original post, you wrote under Expected Behaviour "Same trend for both indicators.". Through the research we did, we now know that it is not expected that trend will be the same, for all the reasons I gave in https://github.com/corona-warn-app/cwa-documentation/issues/528#issuecomment-786543498.

I made a suggestion in the open issue #550 about changing the help text to explain better. Also there is a note in https://github.com/corona-warn-app/cwa-documentation/issues/535#issuecomment-799158881 that the FAQs will be updated.

nilsalex commented 3 years ago

> @nilsalex
>
> Could we close this issue now?

Sure. It is certainly not a bug because the behaviour is intended, as you explained.

Let me, however, just note: as I laid out in great detail, I do not expect this behaviour as a user, and it's weird to tell the user what to expect :-) The question should really be: How does the user benefit from seeing different numbers and trends?

But this is more an issue for the RKI as data source and the stakeholders as the ones who decide what information to present in the widgets. People have pointed out this inconsistency elsewhere (CWA is of course not the only medium where the data is published) but apparently it has been decided not to act on this.

GisoSchroederSAP commented 3 years ago

Hi @nilsalex, you are free to call it an "inconsistency". That is your opinion, and I still don't agree. Instead, I call them "different" (but indirectly related) metrics, where on a given date the trend indicators can differ. Just saying.

I wanted to make this clear to avoid the impression that we agree with your point of view. I hope you understand and accept our standpoint as well.

MikeMcC399 commented 3 years ago

@nilsalex

> > Could we close this issue now?
>
> Sure. It is certainly not a bug because the behaviour is intended, as you explained.

Thank you very much for raising this issue. I learned a lot trying to understand it myself!

You should see a button at the bottom so you can close it yourself. I'm not a moderator, just a Contributor so I can't close it for you.