datacamp / pythonwhat

Verify Python code submissions and auto-generate meaningful feedback messages.
http://pythonwhat.readthedocs.io/
GNU Affero General Public License v3.0

backend error when SCT imports pandas and uses converter #178

Closed: machow closed this issue 7 years ago

machow commented 7 years ago

With the new package management system (PMS), when an SCT imports pandas and uses a converter, we run into a dill pickling error.

Example:

*** =pre_exercise_code

class C: pass
c = C()
c.a = 1

*** =sct

# COMMENT LINE BELOW TO REMOVE ERROR
import pandas

set_converter('__main__.C', lambda x: True)
test_object('c')

This raises the following backend error:

DataCamp encountered the following error:
can't pickle PyCapsule objects

This issue looks related. Note that I can't reproduce the error on my laptop, so it may have to do with how the PMS sets up paths. I wonder whether upgrading dill would resolve it.
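For context, the error message in this report comes from the standard pickling machinery: a PyCapsule is an opaque wrapper around a C pointer that extension modules use to share their C APIs, and it has no pickle support. A minimal sketch, assuming CPython, which exposes one such capsule as `datetime.datetime_CAPI` (the exact wording of the error differs between Python versions):

```python
import datetime
import pickle

# CPython's datetime module exposes its C API to other extensions as a
# PyCapsule named "datetime.datetime_CAPI"
capsule = datetime.datetime_CAPI
print(type(capsule).__name__)

pickle_error = None
try:
    pickle.dumps(capsule)
except TypeError as err:
    # older Pythons report "can't pickle PyCapsule objects"; newer ones
    # report "cannot pickle 'PyCapsule' object"
    pickle_error = err
print("pickling failed:", pickle_error)
```

Presumably dill hits the same kind of object when it tries to serialize a C-extension-backed module such as pandas by value.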

filipsch commented 7 years ago

The shared container uses dill version 0.2.4 right now.

machow commented 7 years ago

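Alright, I think I know the problem. dill only pickles a module by reference (that is, by name) when its file path starts with the value of sys.prefix or contains "site-packages"; any other module gets its contents pickled by value, which is where unpicklable objects such as PyCapsules come in. With the shared libs, neither condition is met: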

In [11]: pd.__file__
Out[11]: '/var/lib/python/shared_libs/pandas/__init__.py'

In [12]: sys.prefix
Out[12]: '/usr'

The fix should be either to rename "shared_libs" to "site-packages" or to change sys.prefix. dill actually does something like this:

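names = ["base_prefix", "base_exec_prefix", "exec_prefix", "prefix", "real_prefix"]
# skip attributes this interpreter lacks (e.g. real_prefix outside a virtualenv)
any(pd.__file__.startswith(getattr(sys, name)) for name in names if hasattr(sys, name))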

So setting any of those attributes to a prefix of the shared libs path should also work. See the relevant dill code.
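To make the check concrete, here is a minimal re-implementation of the rule described above, applied to the paths from the IPython session. It is a sketch of dill 0.2.x's behaviour as described in this thread (ignoring the path normalisation dill does), not dill's own code; the helper name `pickled_by_reference` is hypothetical, and it assumes the other prefix attributes equal '/usr' like sys.prefix does.

```python
# attributes of sys that module paths are compared against
PREFIX_NAMES = ["base_prefix", "base_exec_prefix", "exec_prefix", "prefix", "real_prefix"]


def pickled_by_reference(module_file, prefixes):
    """Hypothetical helper: True if a module at `module_file` would be pickled
    by name rather than by value under the rule described above.

    `prefixes` maps sys attribute names to their values; missing names are
    skipped, just as real_prefix is absent outside an old-style virtualenv.
    """
    under_prefix = any(
        module_file.startswith(prefixes[name])
        for name in PREFIX_NAMES
        if name in prefixes
    )
    return under_prefix or "site-packages" in module_file


# assumed container values: every prefix attribute is '/usr'
container = {name: "/usr" for name in ["base_prefix", "base_exec_prefix", "exec_prefix", "prefix"]}

old_path = "/var/lib/python/shared_libs/pandas/__init__.py"
new_path = "/var/lib/python/site-packages/pandas/__init__.py"

print(pickled_by_reference(old_path, container))  # False: contents pickled by value
print(pickled_by_reference(new_path, container))  # True: pickled by name

# the other option: point one prefix attribute at the shared libs root
patched = dict(container, prefix="/var/lib/python")
print(pickled_by_reference(old_path, patched))  # True
```

Renaming the directory satisfies the "site-packages" test; the last call shows the alternative of moving a prefix attribute instead.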

filipsch commented 7 years ago

@machow wow, great find.

I have updated the location of the shared libs on both the IMB and the Multiplexer. Because of this, we will have to rebuild some of the courses that already use the PMS, which is tricky. If this fix works, I'll disable the active Python images and rebuild them manually with the new SHARED_PYTHON_PATH variable. Shared container packages are now installed in:

/var/lib/python/site-packages

I've also updated the PYTHONPATH environment variable that is set when the course container is spun up, so that course-specific packages now take priority over shared container packages. This did the trick:

'/usr/local/lib/python3.5/dist-packages:/var/lib/python/site-packages'
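The priority works because PYTHONPATH entries go onto sys.path in the order listed, and the path-based importer takes the first directory that contains the requested module. A minimal sketch of that rule using importlib's PathFinder, with two temporary directories standing in for the course-specific and shared package directories; the module name `fakedill` and the directory layout are hypothetical, and the version numbers simply mirror the dill versions mentioned in this thread.

```python
import importlib.util
import os
import tempfile
from importlib.machinery import PathFinder


def write_module(directory, version):
    """Create `directory` holding a one-file stand-in module named fakedill."""
    os.makedirs(directory)
    with open(os.path.join(directory, "fakedill.py"), "w") as f:
        f.write("__version__ = %r\n" % version)


def resolve_version(search_path):
    """Locate fakedill on `search_path` (first match wins) and load it."""
    spec = PathFinder.find_spec("fakedill", search_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.__version__


with tempfile.TemporaryDirectory() as tmp:
    course = os.path.join(tmp, "dist-packages")  # stands in for the course image
    shared = os.path.join(tmp, "site-packages")  # stands in for the shared image
    write_module(course, "0.2.3")
    write_module(shared, "0.2.5")

    # same order as the PYTHONPATH above: course-specific directory first
    course_first = resolve_version([course, shared])
    # reversing the order would let the shared copy win instead
    shared_first = resolve_version([shared, course])

print(course_first, shared_first)
```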

To test, you can use this link, which loads a course image and a shared image with the new environment variables: https://campus-dev.datacamp.com/courses/intro-to-python-for-data-science/chapter-1-python-basics?ex=2&image=course-123:28636bde954860db528b9b99c1e2fd8a&shared_image=shared-python:0253ae439645183cc7d7479c0902c8aa

(The course-specific image has dill 0.2.3, while the shared image has 0.2.5. If you import dill, you get 0.2.3, because course-specific packages take priority.)

Let me know if this works, and then I'll deploy. It's rather urgent: I don't want this to linger too long, and the PMS for Python should be usable as soon as possible. There are also other multiplexer updates I want to deploy along with this.

filipsch commented 7 years ago

FIXED! https://www.datacamp.com/teach/repositories/401/branches/pms