datacamp / pythonwhat

Verify Python code submissions and auto-generate meaningful feedback messages.
http://pythonwhat.readthedocs.io/
GNU Affero General Public License v3.0

Correct submission throwing "Incorrect submission" -- block data June 23 #18

hugobowne closed 8 years ago

hugobowne commented 8 years ago

In the following exercise, when the actual solution is submitted, it is NOT accepted and an "Incorrect submission" message is thrown (screenshot below):

https://campus.datacamp.com/courses/1167/2627?ex=13

The same occurs here and in a few more exercises:

https://campus.datacamp.com/courses/1167/2627?ex=15

It's weird because there is not a problem in the following exercise and the SCT looks the same:

https://campus.datacamp.com/courses/1167/2627?ex=12

I think this is the SCT in question:

# Test: call to np.genfromtxt() and 'data' variable
test_correct(
    lambda: test_object("data"),
    lambda: test_function("numpy.genfromtxt")
)

screenshot 2016-06-08 12 34 29

All of this is currently on @franciscastro's branch: https://github.com/datacamp/courses-importing-data-in-python/tree/review-francis

vvnkr commented 8 years ago

Problem is that test_object() uses np.all() in this case. If you test this code:

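import numpy as np

file = 'titanic.csv'
# Load the same file twice; the two arrays should hold identical data
data1 = np.genfromtxt(file, delimiter=',', names=True, dtype=None)
data2 = np.genfromtxt(file, delimiter=',', names=True, dtype=None)
# Element-wise comparison, reduced to a single boolean
np.all(data1 == data2)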

You'll notice that it results in False, even though the arrays look exactly the same. This means test_object() will fail when it compares data from the solution environment with data from the student environment. The reason is that the column with index 4 contains numpy.float64 values, and some of them are nan. Under IEEE 754 floating-point rules, nan == nan evaluates to False, so NumPy never treats two nan entries as equal.
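The behaviour can be reproduced without titanic.csv. Below is a minimal sketch using two small hand-built arrays; the values are made up for illustration and are not taken from the exercise data.

```python
import numpy as np

# Two arrays with identical contents, including a nan in the same position
a = np.array([22.0, np.nan, 38.0])
b = np.array([22.0, np.nan, 38.0])

# nan never compares equal to anything, not even to itself
nan_equals_itself = bool(np.nan == np.nan)

# The element-wise comparison is False wherever a nan is involved...
elementwise = a == b

# ...so reducing it with np.all() reports the arrays as different
all_equal = bool(np.all(elementwise))

print(nan_equals_itself, elementwise, all_equal)
```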

@filipsch we'll need a workaround for this. Maybe use numpy.testing.assert_equal.

vvnkr commented 8 years ago

@filipsch Referencing this for pythonwhat implementation: http://stackoverflow.com/questions/10710328/comparing-numpy-arrays-containing-nan
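One common way to compare arrays that contain nan is to treat positions where both arrays are nan as matching. The sketch below assumes plain float arrays; arrays_equal_with_nan is a hypothetical helper name, not part of pythonwhat or NumPy, and this is not necessarily how pythonwhat ended up solving it.

```python
import numpy as np

def arrays_equal_with_nan(a, b):
    """Return True if a and b have the same shape and match element-wise,
    counting nan in the same position as a match (float data only)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    both_nan = np.isnan(a) & np.isnan(b)
    return bool(np.all((a == b) | both_nan))

a = np.array([22.0, np.nan, 38.0])
b = np.array([22.0, np.nan, 38.0])
c = np.array([22.0, 30.0, 38.0])

same = arrays_equal_with_nan(a, b)       # nan positions line up
different = arrays_equal_with_nan(a, c)  # c has a number where a has nan

# For plain float arrays, numpy.testing.assert_equal also treats nan as
# equal to nan, so this call does not raise
np.testing.assert_equal(a, b)
```

Note that this only covers ordinary float arrays; structured arrays such as those returned by np.genfromtxt(..., names=True) may need to be handled field by field.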

hugobowne commented 8 years ago

holy wacky races! spooky NumPy. The issue also occurs w/ dataframes FYI:

https://campus.datacamp.com/courses/1167/2627?ex=15
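For dataframes, the same nan effect can be shown with a small example. This is a sketch with made-up values, assuming the mismatch in the exercise comes from nan cells; DataFrame.equals() is the pandas method that treats nan in the same location as equal.

```python
import numpy as np
import pandas as pd

# Two dataframes with identical contents, including one nan cell
df1 = pd.DataFrame({"age": [22.0, np.nan], "fare": [7.25, 71.28]})
df2 = df1.copy()

# Element-wise == marks the nan cell as unequal, so the reduction is False
naive = bool((df1 == df2).all().all())

# DataFrame.equals() counts nan in the same location as equal
nan_aware = df1.equals(df2)

print(naive, nan_aware)
```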

hugobowne commented 8 years ago

it also occurs with pickle.load when there are no NaNs. I can raise another issue if necessary. see screenshot

screenshot 2016-06-09 07 28 27

https://campus.datacamp.com/courses/1167/2628?ex=3

hugobowne commented 8 years ago

same issue with pd.ExcelFile()

https://campus.datacamp.com/courses/1167/2628?ex=4

In this case the object is NOT a dataframe: type(xl) is pandas.io.excel.ExcelFile

screenshot 2016-06-09 07 30 20

hugobowne commented 8 years ago

Hi guys, this issue will block beta testing, which is slated for next Thursday 06/23. I'll need this to work before emailing potential beta testers, which I need to do 4 days before beta testing starts (Sunday 06/19).

hugobowne commented 8 years ago

I checked out the commit for this issue: will it work in the dataframe and Excel cases?

see here:

https://campus.datacamp.com/courses/1167/2627?ex=15

and here:

https://campus.datacamp.com/courses/1167/2628?ex=4

vvnkr commented 8 years ago

Correct, this will do.

Note that this time I created specific equality tests for these kinds of objects. If such an object is only used in a few exercises and you can avoid the comparison by using do_eval = False and test_function(), then do that. It's not manageable to create equality tests for all kinds of objects.
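To illustrate what type-specific equality tests can look like, here is a rough sketch that picks a comparison based on the object's type. The objects_equal helper is hypothetical and is not pythonwhat's actual implementation; it only shows why every new object type needs its own branch.

```python
import math
import numpy as np

def objects_equal(x, y):
    """Hypothetical helper: choose an equality check based on object type."""
    if isinstance(x, np.ndarray) or isinstance(y, np.ndarray):
        try:
            # Treats nan as equal to nan for ordinary numeric arrays
            np.testing.assert_equal(x, y)
            return True
        except AssertionError:
            return False
    if isinstance(x, float) and isinstance(y, float):
        # Scalars: two nan values count as equal
        return x == y or (math.isnan(x) and math.isnan(y))
    # Fallback: rely on the object's own __eq__, which may not be meaningful
    return bool(x == y)

print(objects_equal(np.array([1.0, np.nan]), np.array([1.0, np.nan])))
print(objects_equal(float("nan"), float("nan")))
print(objects_equal([1, 2], [1, 3]))
```

Objects that do not define a useful __eq__ fall through to the last branch and need a dedicated check of their own, which is where the maintenance cost comes from.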

hugobowne commented 8 years ago

Thanks! It's working in all cases except one:

https://campus.datacamp.com/courses/1167/2627?ex=13

screenshot 2016-06-18 14 40 20

thoughts?

vvnkr commented 8 years ago

Use a workaround here (with test_object(..., do_eval = False) and test_function(...)). Use this wiki section as a reference.

Btw, the problem is that:

np.testing.assert_equal(np.genfromtxt(file, delimiter=",", names=True, dtype=None), np.genfromtxt(file, delimiter=",", names=True, dtype=None))

throws an AssertionError, which means that even NumPy's testing framework doesn't see those arrays as equal.
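With names=True, np.genfromtxt() returns a structured array, so one possible way around this is to compare the arrays field by field and apply a nan-aware check to the float fields only. The sketch below uses a small hand-built structured array with made-up field names; structured_equal_nan is a hypothetical helper, not something pythonwhat or NumPy provides.

```python
import numpy as np

def structured_equal_nan(a, b):
    """Compare two structured arrays field by field, counting nan in the
    same position of a float field as equal."""
    if a.dtype != b.dtype or a.shape != b.shape:
        return False
    for name in a.dtype.names:
        x, y = a[name], b[name]
        if np.issubdtype(x.dtype, np.floating):
            same = (x == y) | (np.isnan(x) & np.isnan(y))
        else:
            same = x == y
        if not np.all(same):
            return False
    return True

dtype = [("id", "i4"), ("age", "f8")]
a = np.array([(1, np.nan), (2, 38.0)], dtype=dtype)
b = a.copy()
c = np.array([(1, 30.0), (2, 38.0)], dtype=dtype)

naive = bool(np.all(a == b))           # the row containing nan compares unequal
fieldwise = structured_equal_nan(a, b)
mismatch = structured_equal_nan(a, c)

print(naive, fieldwise, mismatch)
```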