datasets / covid-19

Novel Coronavirus 2019 time series data on cases
https://datahub.io/core/covid-19

Columns instead of Rows for reporting historical data by County, State #65

Closed by himakarganti 4 years ago

himakarganti commented 4 years ago

Can you modify the script to report the cases in columns instead of rows for historical data? That would make the data easier to extract. Transposing it in Excel or other software is extremely time-consuming, so this change would save considerable time when working with the dataset in other tools.

anuveyatsu commented 4 years ago

@himakarganti can you please give an example of current data and what is expected (desired)? If this is about pivoting the data, we're trying to provide normalized data where possible.

himakarganti commented 4 years ago

The current data is reported with the time series for a place (county, state, or country) along a row in the CSV file. My request is that the time series be reported in columns instead, as shown in the attached Excel file (timedata.xlsx).

tungttnguyen commented 4 years ago

IMO the data format is good because it's in tidy format. Data aggregation and visualization are much easier with this format. You can always convert from long to wide format using built-in functions (pivot_wider() in R for example).

himakarganti commented 4 years ago

That doesn't help, as I need the data in an array for further analysis in Python, and I will not be using R or Matlab for it.

GerryG07 commented 4 years ago

I like the current format. Works fine for my data processing tool.

anuveyatsu commented 4 years ago

@himakarganti the main purpose of this project is to provide normalized data with standard metadata (datapackage.json). All custom data transformations should be done by users.
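
For readers who need the wide layout in Python, the long-to-wide conversion discussed in this thread can be done on the user side. The following is a minimal sketch using only the standard library. The column names (`Date`, `Place`, `Confirmed`) and the sample values are placeholders chosen for illustration, not taken from the actual dataset, and `long_to_wide` is a hypothetical helper rather than anything the project provides.

```python
import csv
import io

# Illustrative long-format (tidy) sample: one row per (date, place).
# Column names and values are placeholders, not real dataset contents.
SAMPLE_CSV = """Date,Place,Confirmed
2020-03-01,A,1
2020-03-01,B,0
2020-03-02,A,3
2020-03-02,B,2
"""


def long_to_wide(rows, index="Date", column="Place", value="Confirmed"):
    """Pivot long-format rows into a wide table.

    Returns (dates, places, matrix), where matrix[i][j] is the value for
    dates[i] and places[j], or None if that combination is missing.
    """
    dates = {}   # dicts preserve first-seen order
    places = {}
    cells = {}
    for row in rows:
        d, p = row[index], row[column]
        dates.setdefault(d, None)
        places.setdefault(p, None)
        cells[(d, p)] = int(row[value])
    date_list = list(dates)
    place_list = list(places)
    matrix = [[cells.get((d, p)) for p in place_list] for d in date_list]
    return date_list, place_list, matrix


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
dates, places, matrix = long_to_wide(rows)

# One column per place, one row per date.
print("Date", *places)
for d, values in zip(dates, matrix):
    print(d, *values)
```

The resulting `matrix` is a plain list of lists that can be passed to an array library if one is in use. If pandas is available, `DataFrame.pivot(index=..., columns=..., values=...)` performs the same reshaping in one call.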