NickSadjoli / Week4_GlobalForecast_COVID19

A collaboration repository containing the code for The Kaggle Competition of Week 2 Global forecasting of COVID-19 (https://www.kaggle.com/c/covid19-global-forecasting-week-2)

List of additional Data types and Sources #2

Open NickSadjoli opened 4 years ago

NickSadjoli commented 4 years ago

We need additional data sources beyond the ones listed on Kaggle. The only other source so far is the Worldometer site, but sources offering other types of data would be most welcome.

NickSadjoli commented 4 years ago

So far, we've determined that Population, Population Density, and Median Age seem to have low correlation with Confirmed Cases and Fatalities, and so do not appear to affect them much.
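For reference, a correlation check of this kind could look like the sketch below. It is only a minimal, standard-library illustration, not the code we actually ran: the helper names `pearson` and `rank_features` are hypothetical, and the feature columns would have to come from our own merged dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (feature_name, r) pairs sorted by |r|, strongest first.

    `features` maps a feature name (e.g. "Population") to one value per
    country; `target` holds the matching Confirmed Cases or Fatalities.
    """
    scores = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Calling `rank_features` with one column per candidate feature gives a quick ordering of which features are worth keeping. Note that Pearson's r only captures linear relationships, so a low value does not rule out a non-linear effect.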

However, given the recent news and observations on how some countries managed to curb the rise in cases and fatalities, it seems that the following should be considered and might be the features we need to look into:

  1. Number of tests conducted daily => Includes both the positive results (Confirmed Cases) and the usually unreported negative results
  2. Daily increase in the number of tests => Similar to the 'New Confirmed Cases' tab on the Worldometer site, but this time something like 'New Tests Daily'

I'd say this is fairly obvious data to consider, since more tests conducted would most likely mean more confirmed cases reported daily.
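If a source only reports a cumulative test count, both features could be derived from it along the lines below. This is a minimal sketch under that assumption; `daily_increase` and `positive_rate` are hypothetical helper names, and missing reports are represented as `None`.

```python
def daily_increase(cumulative):
    """Turn a cumulative series into per-day increments ('New Tests Daily').

    The first day has no previous value, so its increment is None.
    A missing report (None) yields None for that day and for the next one.
    """
    daily = []
    prev = None
    for value in cumulative:
        if value is None or prev is None:
            daily.append(None)
        else:
            daily.append(value - prev)
        prev = value
    return daily


def positive_rate(new_cases, new_tests):
    """Share of each day's tests that came back positive, or None if unknown."""
    rates = []
    for cases, tests in zip(new_cases, new_tests):
        if cases is None or tests is None or tests <= 0:
            rates.append(None)
        else:
            rates.append(cases / tests)
    return rates
```

The positive rate is one way to account for testing volume: a rise in confirmed cases with a flat positive rate may mostly reflect more testing rather than faster spread.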

However, I'd also note that using such data comes with a few caveats, as can be seen below (IMO):

I'll read up and watch some more videos on the COVID-19 spread to get a better idea of which features we should look for next, and hopefully I'll find something good.

What's your opinion on this, @josephinemonica? Any opinions or other suggestions would be very welcome and much appreciated.

EDIT: Formatting

NickSadjoli commented 4 years ago

To consolidate the types of data and sources we need to consider, based on discussions with @josephinemonica:

Default features to use and check:

Features to check correlation:

Other considerations based on the TED talk with Bill Gates:

  1. Testing and Isolation measures (well implemented and at the right time) can help drastically flatten the COVID-19 curve
  2. In particular, Isolation measures (such as a shutdown) seem to have flattened the curve effectively, but with different effectiveness from country to country
  3. Temperature (seasonality) might be a factor, but no conclusive studies or evidence have backed this up yet
  4. Regarding number 1, though, the GDP of a country would affect its capability to:
     a. Supply and perform more Testing
     b. Isolate (lockdown/quarantine), due to the Economic hit
  5. Things that affect testing (overall):
     a. Accuracy of testing (true positives, true negatives, false positives, false negatives)
     b. Speed of test results (can tests return results within 24 hours, as in South Korea for example? Or would tests normally require hundreds of hours?)
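To make point 5a concrete, the four counts can be combined into the usual accuracy measures. The sketch below is only an illustration of the standard definitions; `diagnostic_metrics` is a hypothetical helper, and we would still need a data source that actually reports these counts.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from test outcome counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. A measure whose denominator
    is zero is returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Share of infected people the test correctly flags.
        "sensitivity": ratio(tp, tp + fn),
        # Share of healthy people the test correctly clears.
        "specificity": ratio(tn, tn + fp),
        # Share of positive results that are real infections.
        "precision": ratio(tp, tp + fp),
        # Share of all results that are correct.
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

A test with low sensitivity would under-report Confirmed Cases no matter how many tests are run, so this could matter when comparing countries that use different test kits.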