davidbau / covid-19-chart

Chart of current COVID-19 time series data. Enables a variety of county-, state-, and nation-level comparisons and data exploration.
https://covid19chart.org/

Idea: plot covid research papers published by keyword. #53

Open davidbau opened 4 years ago

davidbau commented 4 years ago

The virus is a biological meme, 30K of RNA base-pair sequences copied from host to host, randomly mutating in a gradual process to improve its fitness. The population-wide human response to a virus is also done by spreading memes. But here each meme is a piece of information about how the virus works and how to stop it, transmitted from person to person, processed and synthesized intentionally. The dynamics of the response seem pretty different.

So far we have only plotted time series for the virus "bad guys".

It would be interesting to see time series for the "good guys" - the ideas circulating among researchers about understanding the disease and possible treatments.

Here is a dataset that contains the text of 63627 covid research papers so far. https://www.semanticscholar.org/cord19/download

I have not yet seen time series visualizations of this data. We could simply plot the cumulative number of papers per keyword each day. Or we could plot daily appearances of words within paper text or citations.

Questions we should be able to answer: How many papers a day mention Remdesivir, HCQ, etc.? What are this week's biggest percentage gainers?
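A rough sketch of how those counts could be computed, not a finished implementation. It assumes each paper is a dict with `title`, `abstract`, and `publish_time` keys (an assumed layout for the dataset's metadata file, so check the actual column names), and that usable dates are in `YYYY-MM-DD` form. The function names are made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta
from itertools import accumulate


def daily_keyword_counts(rows, keyword):
    """Count papers per publication day whose title or abstract mentions keyword."""
    counts = Counter()
    needle = keyword.lower()
    for row in rows:
        # Assumed field names: title, abstract, publish_time.
        text = f"{row.get('title') or ''} {row.get('abstract') or ''}".lower()
        if needle not in text:
            continue
        try:
            day = date.fromisoformat(row.get("publish_time") or "")
        except ValueError:
            continue  # skip missing, year-only, or malformed dates
        counts[day] += 1
    return counts


def cumulative(counts):
    """Return (day, running total) pairs in date order."""
    days = sorted(counts)
    return list(zip(days, accumulate(counts[d] for d in days)))


def weekly_gain(counts, week_end):
    """Percent change: the 7 days ending at week_end vs. the 7 days before."""
    this_week = sum(counts[week_end - timedelta(days=i)] for i in range(7))
    last_week = sum(counts[week_end - timedelta(days=i)] for i in range(7, 14))
    if last_week == 0:
        return None  # no baseline to compare against
    return 100.0 * (this_week - last_week) / last_week
```

With the metadata loaded through `csv.DictReader`, calling `weekly_gain` for each keyword of interest and sorting by the result would give the week's biggest percentage gainers. Plain substring matching is a simplification: it will miss misspellings and abbreviations unless those are searched for separately.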

davidbau commented 4 years ago

A quick look at the metadata shows that the number of current papers in the CORD-19 corpus is much smaller than the total. Of the 63000+ papers, only 13630 were published in 2020. Data pushed in 57cc059e3b9e4586a681aab943d1ea465a187388.
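For reference, a count like this could be reproduced with something along these lines. It is only a sketch and not necessarily how the pushed data was produced. It assumes each metadata row has a `publish_time` field that begins with a four-digit year, and `papers_per_year` is a made-up name.

```python
from collections import Counter


def papers_per_year(rows):
    """Tally papers by the four-digit year at the start of publish_time."""
    years = Counter()
    for row in rows:
        # Assumed field name: publish_time, e.g. "2020-03-14" or "2020".
        published = (row.get("publish_time") or "").strip()
        if len(published) >= 4 and published[:4].isdigit():
            years[int(published[:4])] += 1
    return years
```

Rows with a missing or unparseable date are skipped here, so the yearly totals can add up to less than the full corpus size.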

The analysis here does a semi-automated literature review, summarizing results from about 10% of the CORD-19 papers published since February.

https://www.kaggle.com/covid-19-contributions