abjer / isds2020

Introduction to Social Data Science 2020 - a summer school course (abjer.github.io/isds2020)

Issue regarding Assignment1 - Ex 5.1.4 #32

Closed: Stinth closed this issue 3 years ago.

Stinth commented 3 years ago

If the column widths are set to the given values, the assert statement does not pass. The main issue appears to be the "elevation" column: setting its width to (31,37) instead of the given (32,37) fixes the issue.

https://github.com/abjer/isds2020/blob/master/assignments/assignment1/assignment_1.ipynb

jsr-p commented 3 years ago

Hi @Stinth,

Which assert statement? Note that each assert statement only tests the code cells directly above it, so you should only run it against the output of that code. If you change the dataframe in a later exercise and then run an earlier assert statement on it, it will fail, but the asserts are not intended for the code in later exercises :) All the assert statements pass in the main notebook, so if one fails in yours, you should check your code :)

// Jonas

Stinth commented 3 years ago

@jsr-p I'll show what the problem is. Using the given values for colw, you get: [screenshot]

Changing the colw of "elevation" to (31,37) instead gives the following, and the assert statements pass: [screenshot]

The assert statement that fails is: assert round(final_data.elevation.mean()) == 248

jsr-p commented 3 years ago

Hi again @Stinth, below is a screenshot of the mean elevation of the weather stations merged onto our weather data, for both your solution and the one in the assignment: [screenshot]

The means are not the same, so I investigated where they differ. Here I subset the solution dataframe to all entries where the elevation columns of the two dataframes differ: [screenshot]

Here I do the same, but subset your dataframe: [screenshot]

Among others, the elevation of the weather station in Nuwara Eliya differs between the two dataframes. Inspecting the txt file with the station data, we see: [screenshot]

It looks like your method parses the weather station data incorrectly. In particular, it drops the first digit of the elevation for every weather station whose elevation has more than three digits: [screenshot]
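To see how a one-character shift in the start index drops exactly one leading digit, here is a minimal sketch using a made-up fixed-width layout (the station IDs, coordinates and field widths below are hypothetical, not the actual station file). Because the numbers are right-aligned, moving the start of the elevation slice from 31 to 32 only cuts off a blank for short elevations, but removes the first digit once the value fills the whole field.

```python
# Hypothetical fixed-width layout, for illustration only:
# ID (11 chars) | latitude (8) | longitude (9) | elevation (6), separated by single spaces.
def make_line(station_id: str, lat: float, lon: float, elev: float) -> str:
    """Build one right-aligned fixed-width record in the hypothetical layout."""
    return f"{station_id:<11} {lat:8.4f} {lon:9.4f} {elev:6.1f}"


high = make_line("XX000000001", 12.3456, -45.6789, 1234.5)  # elevation fills all 6 chars
low = make_line("XX000000002", -33.5, 145.25, 45.0)         # elevation padded with 2 spaces

# In this layout the elevation field sits at 0-based indices 31..36,
# i.e. the half-open slice [31, 37), just like a colspecs tuple.
for start, end in [(31, 37), (32, 37)]:
    print((start, end), repr(high[start:end]), repr(low[start:end]))
# (31, 37) '1234.5' '  45.0'
# (32, 37) '234.5' ' 45.0'
```

Short elevations parse to the same number either way, which is why only some stations differ and the mean shifts rather than everything breaking.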

From the documentation of read_fwf for the parameter colspecs:

colspecs : list of tuple (int, int) or 'infer'. optional
A list of tuples giving the extents of the fixed-width fields of each line as half-open intervals (i.e., [from, to[ ). String value 'infer' can be used to instruct the parser to try detecting the column specifications from the first 100 rows of the data which are not being skipped via skiprows (default='infer').
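Because colspecs are half-open, (31,37) covers six characters (indices 31 to 36) while (32,37) covers only five. If field positions are taken from a description that counts columns from 1 with inclusive ends (an assumption; the thread does not say where the given widths came from), the start has to be shifted down by one. A minimal sketch of that conversion, with a hypothetical helper name:

```python
def one_based_to_colspecs(ranges):
    """Convert (first, last) 1-based inclusive column ranges into
    0-based half-open (start, end) tuples, as used by colspecs."""
    return [(first - 1, last) for first, last in ranges]


# Hypothetical description: ID in columns 1-11, elevation in columns 32-37.
colspecs = one_based_to_colspecs([(1, 11), (32, 37)])
print(colspecs)  # [(0, 11), (31, 37)]

# A half-open (start, end) spec covers end - start characters.
record = " " * 31 + "1234.5"  # hypothetical record with elevation at indices 31..36
print(record[31:37], record[32:37])  # 1234.5 234.5
```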

I would not specify the colspecs parameter at all, since read_fwf infers the column boundaries for us :)
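For intuition about what colspecs='infer' does, here is a simplified sketch of whitespace-based column detection. It is not pandas' actual code, only the idea as the quoted documentation describes it: look at the first 100 rows and treat each run of character positions that is non-blank in at least one row as one field. The record layout and helper names are hypothetical.

```python
def make_line(station_id: str, lat: float, lon: float, elev: float) -> str:
    """Build one right-aligned fixed-width record in a hypothetical layout."""
    return f"{station_id:<11} {lat:8.4f} {lon:9.4f} {elev:6.1f}"


def infer_colspecs(lines, n_rows=100):
    """Simplified whitespace-based column detection (not pandas' actual code).

    A character position belongs to a field if any of the first n_rows lines
    has a non-blank character there; each maximal run of such positions
    becomes one half-open (start, end) tuple.
    """
    sample = lines[:n_rows]
    width = max(len(line) for line in sample)
    occupied = [
        any(i < len(line) and not line[i].isspace() for line in sample)
        for i in range(width)
    ]
    specs, start = [], None
    for i, occ in enumerate(occupied):
        if occ and start is None:
            start = i  # a new field begins
        elif not occ and start is not None:
            specs.append((start, i))  # field ends just before this blank column
            start = None
    if start is not None:
        specs.append((start, width))  # field runs to the end of the line
    return specs


lines = [
    make_line("XX000000001", 12.3456, -45.6789, 1234.5),
    make_line("XX000000002", -33.5, 145.25, 45.0),
]
specs = infer_colspecs(lines)
print(specs)  # [(0, 11), (12, 20), (22, 30), (31, 37)]
start, end = specs[-1]
print([float(line[start:end]) for line in lines])  # [1234.5, 45.0]
```

In pandas itself, the equivalent is simply to call read_fwf without colspecs, since 'infer' is the default. In this sketch the elevation field comes out as (31, 37) because at least one sampled row fills it completely.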

Happy coding, Jonas