M3IT / COVID-19_Data

COVID-19 Data for Australia

negative numbers #16

Closed suzmee23 closed 1 year ago

suzmee23 commented 1 year ago

Hi, we are students and would like to use your data for our university assignment. While going through the data, we found some numbers with a negative sign ('-'). May we know what that means?

M3IT commented 1 year ago

It usually means a correction to previous numbers: generally the issuing authority published erroneous figures and the negative value corrects the error. You'll also see some very large positive values; these are often corrections too. If there is a specific date you're interested in, let me know and I'll try to find out.
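As a rough illustration of the point above, here is a minimal sketch for spotting values that look like corrections in a list of daily counts. The function name `flag_corrections`, the `spike_factor` threshold, and the sample numbers are all assumptions made for this example; none of them come from the repository, and a sensible threshold depends on the series being analysed.

```python
from statistics import median

def flag_corrections(daily, spike_factor=10.0):
    """Return the indices of daily values that look like corrections.

    A value is flagged when it is negative, or when it is more than
    spike_factor times the median of the non-negative values.
    """
    # Typical daily level, ignoring negative entries.
    baseline = median(v for v in daily if v >= 0)
    flagged = []
    for i, v in enumerate(daily):
        if v < 0 or (baseline > 0 and v > spike_factor * baseline):
            flagged.append(i)
    return flagged

# Made-up daily counts, not taken from the repository.
sample = [120, 135, 128, -450, 131, 2900, 126]
print(flag_corrections(sample))  # [3, 5]
```

Flagged rows are only candidates for review. Whether a given value really is a correction still has to be checked against what the issuing authority reported.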

suzmee23 commented 1 year ago

Thank you for explaining what those strange numbers mean; it helped us understand the data better. What would you suggest we do with them? Should we exclude them from our analysis, or is there a way to find out what the actual numbers were? E.g. row 9349 has a large negative number, so what should I do to get the right value, or should I simply leave that entire row out of the analysis?

Hoping to hear from you, and thank you.

M3IT commented 1 year ago

Which file are you referring to, and what date?

suzmee23 commented 1 year ago

The file name is COVID_AU_state, and the date is 07/04/2023, for Queensland. Thank you.

M3IT commented 1 year ago

Welcome to the world of data science, where data is often dirty!

These types of questions are difficult to answer, and it's up to the individual analyst to define assumptions and fixes. You'll note in this case that there was a jump of similar magnitude one year earlier, so this large drop may have been correcting that data. However, if you plot them, you'll see that they don't match exactly, so that's not likely the cause (or at least not the only cause).
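One way to make the "do they match?" check explicit is to add the earlier jump to the later drop and look at what is left over. The sketch below assumes a hypothetical helper `correction_match` and an arbitrary 5% tolerance. The numbers are invented for illustration and are not the Queensland values.

```python
def correction_match(jump, drop, tolerance=0.05):
    """Compare an earlier upward jump with a later negative value.

    Returns (residual, matches), where residual is jump + drop and
    matches is True when the residual is within tolerance of the
    jump's size.
    """
    residual = jump + drop
    matches = abs(residual) <= tolerance * abs(jump)
    return residual, matches

# Made-up numbers for illustration only.
print(correction_match(10000, -9900))  # (100, True)
print(correction_match(10000, -7000))  # (3000, False)
```

A large residual suggests the drop is not simply undoing the earlier jump, so any fix based on that assumption should be stated clearly in the analysis.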

[image attachment]

suzmee23 commented 1 year ago

Thank you so much for some very meaningful discussion on this matter.