Dataset format and coding improvements

Hi @chrisyifanjin,

Could you please do some further work on the data processing?

1. Could you please add more information, in the same format as:

https://github.com/pdtyreus/coronavirus-ds/blob/master/data/snapshot_jan25_12pm.csv (please use English column names only, which makes the data easier to use for people who don't read Chinese)
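A minimal sketch of the column renaming for point 1, assuming the raw files use Chinese headers. The Chinese names and the English targets in `COLUMN_MAP`, and the helper `to_english_columns`, are hypothetical placeholders: they should be replaced with the real headers in our files and the exact column names used in the snapshot CSV linked above. The check at the end fails loudly if any Chinese header was missed.

```python
import pandas as pd

# Hypothetical mapping from Chinese headers to English ones; replace with
# the real headers and the column names used in the reference snapshot.
COLUMN_MAP = {
    "省份": "Province/State",
    "确诊": "Confirmed",
    "死亡": "Deaths",
    "治愈": "Recovered",
}


def to_english_columns(df: pd.DataFrame, column_map: dict) -> pd.DataFrame:
    """Rename columns to English and raise if any Chinese header is left over."""
    renamed = df.rename(columns=column_map)
    # Any column name still containing a CJK character was missed by the mapping
    leftover = [c for c in renamed.columns
                if any("\u4e00" <= ch <= "\u9fff" for ch in str(c))]
    if leftover:
        raise ValueError("Unmapped non-English columns: {}".format(leftover))
    return renamed


# Dummy row, illustrative values only
raw = pd.DataFrame({"省份": ["湖北"], "确诊": [1], "死亡": [0], "治愈": [0]})
print(to_english_columns(raw, COLUMN_MAP).columns.tolist())
# ['Province/State', 'Confirmed', 'Deaths', 'Recovered']
```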

2. Define a Python function for data processing, for example:

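```python
import glob
import os

import pandas as pd


def data_cleaning(folder_path, extension='csv'):
    """
    Combine data from multiple files within the given folder, and restructure them.
    """
    # Look for files inside folder_path (not the current working directory)
    all_filenames = glob.glob(os.path.join(folder_path, '*.{}'.format(extension)))
    combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames], ignore_index=True)
    cleaned_data = combined_csv
    # ...... (further cleaning / restructuring steps go here)
    return cleaned_data


folder_path = '../data/China'
cleaned_data = data_cleaning(folder_path)
```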

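Another example (this one uses PySpark DataFrames):

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, to_timestamp
from pyspark.sql.types import FloatType


def convertColumn(df, names, newType):
    """Cast each column in `names` to `newType` (minimal helper; not defined in the original example)."""
    for name in names:
        df = df.withColumn(name, col(name).cast(newType))
    return df


def preprocess_data(df: DataFrame) -> DataFrame:
    """
    Apply data processing:
        1) Rename columns
        2) Cast column types
    """
    # 1) Rename column
    df = df.withColumnRenamed("POS Margin on Net Sales", "Margin")

    # 2) Convert the `df` columns to `FloatType()`
    columns = ['NetSales', 'QtySold', 'Margin', 'StockQty']
    df = convertColumn(df, columns, FloatType())
    # Convert the Date column (yyyyMMdd strings) to a timestamp
    df = df.withColumn("Date", to_timestamp(df.Date, "yyyyMMdd"))

    return df
```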
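Since the rest of the project works with pandas rather than Spark, a roughly equivalent pandas version of `preprocess_data` may be easier to reuse. This is a sketch, not a drop-in replacement: it assumes the same column names as the Spark example (`POS Margin on Net Sales`, `NetSales`, `QtySold`, `StockQty`, `Date`) and that `Date` holds `yyyyMMdd` values; the `sample` frame is dummy data.

```python
import pandas as pd


def preprocess_data(df: pd.DataFrame) -> pd.DataFrame:
    """
    Apply data processing (pandas version):
        1) Rename columns
        2) Cast column types
    """
    # 1) Rename column (rename returns a new DataFrame; the input is not modified)
    df = df.rename(columns={"POS Margin on Net Sales": "Margin"})

    # 2) Cast the numeric columns to float
    columns = ["NetSales", "QtySold", "Margin", "StockQty"]
    df[columns] = df[columns].astype(float)

    # Parse yyyyMMdd values (strings or ints) into datetime64 values
    df["Date"] = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")

    return df


# Dummy input, illustrative values only
sample = pd.DataFrame({
    "POS Margin on Net Sales": ["1.5"],
    "NetSales": ["10"],
    "QtySold": [2],
    "StockQty": [5],
    "Date": ["20200125"],
})
print(preprocess_data(sample).dtypes)
```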

3. Please use a `relative path` instead of an `absolute path` for files, so that we can run your code without changing the file path.

For example, instead of:

```python
# absolute path: only works on the author's machine
combined_csv.to_csv('/Users/jinyifan/Desktop/Coronavirus-Epidemic-2019-nCov/Data_processing/Data_pro_China.csv', header=True, index=False)
```

use:

```python
# relative path (to_csv returns None when given a path, so there is nothing to assign)
combined_csv.to_csv('../Data/Conbined_data/China/Data_pro_China.csv', header=True, index=False)
```
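One detail worth handling with relative paths: `to_csv` raises an error if the output folder does not exist yet (for example on a fresh clone), and a relative path is resolved against the current working directory, which for a notebook is usually the folder the notebook lives in. A minimal sketch using `pathlib`; the helper name `save_relative` is hypothetical:

```python
from pathlib import Path

import pandas as pd


def save_relative(df: pd.DataFrame, relative_path: str) -> Path:
    """Write df to CSV at a path relative to the current working directory."""
    out_path = Path(relative_path)
    # Create any missing parent folders so to_csv does not fail
    out_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_path, header=True, index=False)
    # Return the absolute location, handy for checking where the file went
    return out_path.resolve()
```

Usage would then be `save_relative(combined_csv, '../Data/Conbined_data/China/Data_pro_China.csv')`.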

(Optional) You can combine Combined_data_China.ipynb and Combined_data_International.ipynb into a single notebook: they use very similar code, so both can call the same function and just pass a different path.
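For the optional point, one way to share the code is to call one function per region in a loop. A sketch under assumptions: the `../data/International` input folder and the International output file name are hypothetical and should match the real repository layout, and `combine_folder` / `process_regions` are illustrative names, with `combine_folder` playing the role of `data_cleaning` from point 2.

```python
import glob
import os

import pandas as pd


def combine_folder(folder_path: str, extension: str = "csv") -> pd.DataFrame:
    """Concatenate every *.extension file in folder_path into one DataFrame."""
    files = sorted(glob.glob(os.path.join(folder_path, "*.{}".format(extension))))
    if not files:
        raise FileNotFoundError("No .{} files in {}".format(extension, folder_path))
    return pd.concat([pd.read_csv(f) for f in files], ignore_index=True)


def process_regions(regions: dict) -> dict:
    """Run the same processing per region; regions maps name -> (input_dir, output_csv)."""
    results = {}
    for name, (input_dir, output_csv) in regions.items():
        combined = combine_folder(input_dir)
        # Make sure the output folder exists before writing
        os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)
        combined.to_csv(output_csv, header=True, index=False)
        results[name] = combined
    return results


# Hypothetical layout: only the paths differ between China and International
REGIONS = {
    "China": ("../data/China",
              "../Data/Conbined_data/China/Data_pro_China.csv"),
    "International": ("../data/International",
                      "../Data/Conbined_data/International/Data_pro_International.csv"),
}
# process_regions(REGIONS)
```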

YiranJing / Coronavirus-Epidemic-COVID-19

dataset format and coding improve #2
