abjer / sds2019

Social Data Science 2019 - a summer school course
https://abjer.github.io/sds2019

assignment1 ex. 7.1.2 #22

Open jossl95 opened 4 years ago

jossl95 commented 4 years ago

There is a problem with the assert statement, which states:

assert answer_72.shape == (30003, 7)

However, the CSV file weather_data_1864to1867.csv only contains 29638 lines.

Kristianuruplarsen commented 4 years ago

Sorry - weather_data_1864to1867.csv should have been deleted before uploading. Use prepareWeatherData to create the dataset, i.e. your answer should contain code like

data_storage = []
years = '1864,1865,1866,1867'.split(',')

for y in years: 
    single_year_data = prepareWeatherData(y)
    # Some more code here

answer_72 = pd.concat(data_storage)
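For anyone who wants to try the loop-and-concatenate pattern outside the assignment, here is a minimal runnable sketch. `prepare_weather_data_stub` is a hypothetical stand-in for prepareWeatherData (its columns and values are made up), and `ignore_index=True` is one possible choice rather than something the exercise requires.

```python
import pandas as pd


def prepare_weather_data_stub(year):
    """Hypothetical stand-in for prepareWeatherData: a tiny frame for one year."""
    return pd.DataFrame({
        "station": ["ST001", "ST002"],
        "datetime": [f"{year}0101", f"{year}0102"],
        "obs_value": [1.5, -0.3],
    })


data_storage = []
years = "1864,1865,1866,1867".split(",")

for y in years:
    single_year_data = prepare_weather_data_stub(y)
    # Collect each yearly frame so that all of them can be stacked at the end
    data_storage.append(single_year_data)

# Stack the yearly frames vertically; ignore_index=True renumbers the rows
combined = pd.concat(data_storage, ignore_index=True)
print(combined.shape)  # (8, 3): 4 years x 2 rows, 3 columns
```

Appending to a list and calling `pd.concat` once is usually cheaper than concatenating inside the loop, since each concatenation copies the data.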
BjornCilleborg commented 4 years ago

In the same exercise, the assertion requires 7 columns, but what should the 7th column contain? The original dataframes only contain 6 columns (station, datetime, obs_type, obs_value, TMAX_f and month). In 6.1.5 the assertion only requires 6 columns, and we concatenate vertically afterwards.

Jacob-Oestdal commented 4 years ago

In 7.1.1 you set the datetime as the index, so it no longer counts as a column. One way to solve the problem is to reset your index; the datetime index then becomes a datetime column again.
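A small sketch of how the index affects the column count reported by `.shape`. The frame below is toy data with made-up values, not the assignment's dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "station": ["ST001", "ST002"],
    "datetime": pd.to_datetime(["1864-01-01", "1864-01-02"]),
    "obs_value": [1.5, -0.3],
})

# Moving a column into the index removes it from the column count
indexed = df.set_index("datetime")
print(indexed.shape)   # (2, 2)

# reset_index turns the index back into an ordinary column
restored = indexed.reset_index()
print(restored.shape)  # (2, 3)
```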

Kristianuruplarsen commented 4 years ago

The answer assumes that the original datetimes and the ones converted to python datetime format are stored in separate columns.

If you overwrite the raw dates and/or reindex your dataframe, your column count may vary. You shouldn't worry too much about this. If you are confident that your answer is correct, you can simply add a column of zeros to pass the test.
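To illustrate why the column count differs, here is a sketch comparing the two approaches on toy data. The raw date format (`YYYYMMDD` strings) and the column name `datetime_dt` are assumptions made for this example, not taken from the assignment.

```python
import pandas as pd

raw = pd.DataFrame({
    "station": ["ST001", "ST002"],
    "datetime": ["18640101", "18640102"],  # assumed raw format: YYYYMMDD strings
    "obs_value": [1.5, -0.3],
})

# Option A: keep the raw dates and store the parsed ones in a new column
separate = raw.copy()
separate["datetime_dt"] = pd.to_datetime(separate["datetime"], format="%Y%m%d")

# Option B: overwrite the raw dates in place, so no column is added
overwritten = raw.copy()
overwritten["datetime"] = pd.to_datetime(overwritten["datetime"], format="%Y%m%d")

print(separate.shape[1], overwritten.shape[1])  # 4 3
```

Both frames hold the same information; only Option A ends up with the extra column that a shape assertion like the one in the exercise would count.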