CSSEGISandData / COVID-19

Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
https://systems.jhu.edu/research/public-health/ncov/

Has anyone created a script to pull US County level data directly from the Johns Hopkins Dashboard? #1788

Open robertcox87 opened 4 years ago

filotti commented 4 years ago

You don't need to pull the data from the dashboard, everything's available in this repo.

I made a Node.js script to parse the daily CSV files from csse_covid_19_data/csse_covid_19_daily_reports/. County-level data is available starting from 2020-03-22.

Here's the script so you can tweak it to fit your needs:

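// Synchronous API of the csv-parse package; this import path matches older
// releases, and newer major versions may expose it under a different path.
const parse = require('csv-parse/lib/sync');
const fs = require('fs');
const path = require('path');
// Relative to the working directory, so run this from inside csse_covid_19_data/.
const source = 'csse_covid_19_daily_reports/';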

const data = {};

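// Record one US row under its county FIPS code, keeping the first
// confirmed count seen for each date.
const parseRow = (row, data) => {
    // Skip rows without a FIPS code instead of filing them under '00000'.
    if (!row.FIPS) {
        return;
    }
    const FIPS = row.FIPS.padStart(5, '0');
    if (typeof data[FIPS] === 'undefined') {
        data[FIPS] = {
            lat: row.Lat,
            long: row.Long_,
            dates: {}
        };
    }
    // Reduce the Last_Update timestamp to a YYYY-MM-DD key.
    const date = new Date(row.Last_Update);
    const dateString = date.getFullYear() + '-' + ('0' + (date.getMonth() + 1)).slice(-2) + '-' + ('0' + date.getDate()).slice(-2);
    if (typeof data[FIPS]['dates'][dateString] === 'undefined') {
        data[FIPS]['dates'][dateString] = {
            confirmed: row.Confirmed
        };
    }
};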

const parseCSV = (filename, data) => {
    console.log("Parsing ", filename);

    const rows = parse(fs.readFileSync( source + filename ), { columns: true , bom: true});

    for (let i = 0; i < rows.length; i++) {
        if (rows[i].Country_Region === 'US') {
            parseRow(rows[i], data);
        }
    }

};

const items = fs.readdirSync(source);

for (let i = 0; i < items.length; i++) {
    if (path.extname(items[i]) === '.csv') {
        parseCSV(items[i], data);
    }
}

fs.writeFileSync('data.json', JSON.stringify(data));

console.log('Done!');
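
The data.json written by the script maps each FIPS code to cumulative confirmed counts keyed by date. As a rough sketch of one way to use that shape, the snippet below derives day-over-day increases. The helper name `dailyNewCases` and the sample record are hypothetical and not part of the original script.

```javascript
// Hypothetical helper: convert cumulative counts of the form
// { 'YYYY-MM-DD': { confirmed: '<string>' } } into daily increases.
// The script stores counts as strings, so they are converted with Number().
function dailyNewCases(dates) {
  const days = Object.keys(dates).sort(); // ISO dates sort chronologically
  const result = {};
  for (let i = 1; i < days.length; i++) {
    const previous = Number(dates[days[i - 1]].confirmed);
    const current = Number(dates[days[i]].confirmed);
    result[days[i]] = current - previous;
  }
  return result;
}

// Made-up sample record for a single county.
const sample = {
  '2020-03-22': { confirmed: '10' },
  '2020-03-23': { confirmed: '14' },
  '2020-03-24': { confirmed: '21' },
};

console.log(dailyNewCases(sample));
```

The first date has no predecessor, so it is omitted from the result.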

robertcox87 commented 4 years ago

Thanks! I was looking for the dashboard because it seemed to be more current than the static .csv files. Any idea if there is a live county feed available anywhere? Thanks again for the help!

filotti commented 4 years ago

You're right, I wish they would update this project more often. I don't think you can access the data directly, at least not without a significant amount of effort.

J-Rojas commented 4 years ago

They have automated scripts uploading the data continuously. If the repo is out of sync with the dashboard, it is probably only for a short time (maybe a few hours).

J-Rojas commented 4 years ago

The county data is in the web-data branch BTW. Just look at the cases.csv there. This is what the dashboard uses as its data source.
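
For anyone who wants to read that file from a script, here is a minimal sketch of building the raw-file URL for a branch. The `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` pattern is standard GitHub behavior, but the path `data/cases.csv` inside the web-data branch is an assumption and should be checked against the branch. `rawUrl` and `fetchText` are hypothetical helper names.

```javascript
const REPO = 'CSSEGISandData/COVID-19';

// Build the raw-content URL for a file on a given branch.
function rawUrl(branch, filePath) {
  return `https://raw.githubusercontent.com/${REPO}/${branch}/${filePath}`;
}

// Download a text file; relies on the global fetch available in Node 18+.
// Defined here for illustration and not called, so the snippet runs offline.
async function fetchText(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${url}`);
  }
  return res.text();
}

// ASSUMPTION: cases.csv lives under data/ in the web-data branch.
const casesUrl = rawUrl('web-data', 'data/cases.csv');
console.log(casesUrl);
```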

J-Rojas commented 4 years ago

They just published a US county level time series report in the master branch as well.
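
A rough sketch of reshaping such a wide time series file (one row per county, one column per date) into per-FIPS date maps. The column layout used here (a `FIPS` column with values like `1001.0`, date headers like `3/22/20`, and a quoted `Combined_Key`) is an assumption about the file, and the sample rows and counts are made up. No CSV library is used so the snippet stays self-contained.

```javascript
// Split one CSV line, honoring double-quoted fields that contain commas.
function splitCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') { inQuotes = false; }
      else { field += ch; }
    } else if (ch === '"') { inQuotes = true; }
    else if (ch === ',') { fields.push(field); field = ''; }
    else { field += ch; }
  }
  fields.push(field);
  return fields;
}

// Turn wide CSV text into { fips: { 'YYYY-MM-DD': count } }.
function parseTimeSeries(csvText) {
  const lines = csvText.trim().split(/\r?\n/);
  const header = splitCsvLine(lines[0]);
  const fipsIdx = header.indexOf('FIPS');
  // ASSUMPTION: date columns are named M/D/YY.
  const dateCols = header
    .map((name, i) => ({ name, i }))
    .filter((c) => /^\d{1,2}\/\d{1,2}\/\d{2}$/.test(c.name));
  const out = {};
  for (const line of lines.slice(1)) {
    const cells = splitCsvLine(line);
    const fipsNum = parseInt(cells[fipsIdx], 10); // '1001.0' -> 1001
    if (Number.isNaN(fipsNum)) continue;          // skip rows without FIPS
    const fips = String(fipsNum).padStart(5, '0');
    out[fips] = {};
    for (const { name, i } of dateCols) {
      const [m, d, y] = name.split('/');
      const iso = `20${y}-${m.padStart(2, '0')}-${d.padStart(2, '0')}`;
      out[fips][iso] = Number(cells[i]);
    }
  }
  return out;
}

// Made-up sample in the assumed layout.
const sampleCsv = [
  'FIPS,Admin2,Province_State,Combined_Key,3/22/20,3/23/20',
  '1001.0,Autauga,Alabama,"Autauga, Alabama, US",0,1',
  '6075.0,San Francisco,California,"San Francisco, California, US",84,108',
].join('\n');

const series = parseTimeSeries(sampleCsv);
console.log(series['06075']);
```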

gohkokhan commented 4 years ago

Yes, I've created a script that pulls the data directly from the dashboard in real time (https://github.com/gohkokhan/covid19_JHU_dashboard). I've been using it for a month for my interactive dashboard on Data Studio (https://datastudio.google.com/reporting/f56febd8-5c42-4191-bcea-87a3396f4508).

However, I currently cannot get county-level data, because I need JHU to change the configuration on their feature server (https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases/FeatureServer/1). This is mentioned here: https://github.com/CSSEGISandData/COVID-19/issues/1250#issuecomment-606351561
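
For context, querying an ArcGIS feature layer over REST generally means appending `/query` to the layer URL with parameters such as `where`, `outFields`, and `f`. The sketch below only builds such a URL and does not send a request. The field name `Country_Region` is an assumption about the layer's schema, `buildQueryUrl` is a hypothetical helper, and whether county rows come back depends on the server configuration mentioned above.

```javascript
const LAYER_URL =
  'https://services1.arcgis.com/0MSEUqKaxRlEPj5g/arcgis/rest/services/ncov_cases/FeatureServer/1';

// Build a query URL for a feature layer; parameters are URL-encoded.
function buildQueryUrl(layerUrl, where) {
  const params = new URLSearchParams({
    where,                   // SQL-like filter expression
    outFields: '*',          // request all attribute fields
    returnGeometry: 'false', // attributes only
    f: 'json',               // response format
  });
  return `${layerUrl}/query?${params.toString()}`;
}

// ASSUMPTION: the layer has a Country_Region field.
const queryUrl = buildQueryUrl(LAYER_URL, "Country_Region='US'");
console.log(queryUrl);
```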

jjbenes commented 4 years ago

County-level and state-level maps here, based on NYT data... I should probably switch to the new file from JHU.

robertcox87 commented 4 years ago

Thanks for the update!

jjbenes commented 4 years ago

Okay. I just switched from the NYT database to the JHU US-county CSV files. The JHU team did a nice job with all these files. It wasn't too painful to switch the front-end. I used to combine population data from the U.S. Census Bureau and the NYT CSV files to get the choropleths. Now I'm using only CSV files from JHU.
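
The per-capita values behind a choropleth like this come down to dividing case counts by population. The following is a minimal sketch of that step, not the actual map code (which is described later in the thread as using Pandas). The helper names, FIPS codes, and numbers are made up for illustration.

```javascript
// Cases per 100,000 residents; returns null when population is unusable.
function casesPer100k(confirmed, population) {
  if (!population || population <= 0) return null;
  return (confirmed * 100000) / population;
}

// Attach a rate to each county and sort from highest to lowest rate.
function rankByRate(counties) {
  return counties
    .map((c) => ({ ...c, rate: casesPer100k(c.confirmed, c.population) }))
    .filter((c) => c.rate !== null)
    .sort((a, b) => b.rate - a.rate);
}

// Hypothetical counties with made-up counts and populations.
const counties = [
  { fips: '00001', confirmed: 50, population: 200000 },
  { fips: '00002', confirmed: 30, population: 60000 },
  { fips: '00003', confirmed: 5, population: 0 }, // dropped: no population
];

const ranked = rankByRate(counties);
console.log(ranked.map((c) => `${c.fips}: ${c.rate}`));
```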

The map with four choropleths, two for the state level and two for the county level, is at the same place: https://first-principles.ai/covid-19/map.html

[image: Screen Shot 2020-04-02 at 1 52 35 AM] https://user-images.githubusercontent.com/62985029/78229608-18934b00-7485-11ea-8a2a-615d326f1075.png

J-Rojas commented 4 years ago

Nice map! BTW where did you get the county boundary data from?

jjbenes commented 4 years ago

@J-Rojas I got it from folium. There is at least one bug though. I happened to zoom in on San Francisco and noticed that S.F. County was missing. Turned out that the polygon was drawn as a line by mistake.
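
A quick way to catch this kind of problem is to scan the boundary file for features whose geometry is not an area type. The sketch below assumes a standard GeoJSON FeatureCollection. The helper name `findNonPolygonFeatures` and the sample features are hypothetical and do not reproduce the actual folium file.

```javascript
// Return features whose geometry is missing or is not Polygon/MultiPolygon.
function findNonPolygonFeatures(featureCollection) {
  const areaTypes = new Set(['Polygon', 'MultiPolygon']);
  return featureCollection.features
    .filter((f) => !f.geometry || !areaTypes.has(f.geometry.type))
    .map((f) => ({ id: f.id, type: f.geometry ? f.geometry.type : null }));
}

// Made-up sample: one proper polygon and one county stored as a line.
const boundaries = {
  type: 'FeatureCollection',
  features: [
    {
      type: 'Feature',
      id: '01001',
      properties: {},
      geometry: {
        type: 'Polygon',
        coordinates: [[[0, 0], [1, 0], [1, 1], [0, 0]]],
      },
    },
    {
      type: 'Feature',
      id: '06075',
      properties: {},
      geometry: {
        type: 'LineString',
        coordinates: [[0, 0], [1, 0], [1, 1], [0, 0]],
      },
    },
  ],
};

console.log(findNonPolygonFeatures(boundaries));
```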

jjbenes commented 4 years ago

Uploaded the front-end to GitHub. The code converts JHU's CSV files to Pandas dataframes. The GeoJSON files are there, too. They provide the state and county boundaries. I haven't uploaded the map engine yet, but it can now create per-capita cases for all U.S. counties, ranked daily. You can use the slider to see how the disease has been spreading since Jan 21.

aveekroy commented 4 years ago

https://katacoda.com/aveekroy/scenarios/covid19stats It has a script that takes a county, state, and country and displays the results.