saichaitanyamolabanti opened this issue 2 years ago · status: Open
@ijoseph @kevinwang @variablenix @prasad-kamat please help
Wow, we really should have pinned (and pip-compiled, too) our requirements file below. Let me see if I can get something working and try to update it.
https://github.com/Affirm/shparkley/blob/master/examples/requirements.txt
Alright, @saichaitanyamolabanti, can you please pull the https://github.com/Affirm/shparkley/pull/7 PR, then run `pip install -r examples/macos-py3.10-requirements.txt` if you happen to have macOS and an empty Python 3.10 environment, or `pip install -r examples/requirements.in` otherwise? That particular set of third-party requirements worked for me.
Hey @ijoseph, I installed a few libraries as per your comments, mainly installing and importing cloudpickle. Here are my observations; I can still see some errors, please help!
Scenario 1: after `import cloudpickle`, the line `row = dataset.filter(dataset.xxxx == '5').rdd.first()` works fine.
Scenario 2: after `import cloudpickle`, `import pyspark.serializers`, and `pyspark.serializers.cloudpickle = cloudpickle`, the same line `row = dataset.filter(dataset.xxxx == '5').rdd.first()` throws the error below.
I then tried moving the cloudpickle and pyspark.serializers imports below the line under investigation, i.e. running `row = dataset.filter(dataset.xxxx == '5').rdd.first()` first, then `import cloudpickle`, `import pyspark.serializers`, `pyspark.serializers.cloudpickle = cloudpickle`, but I still see an error like: cloudpickle doesn't have the method 'print_exec'.
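For what it's worth, Spark 2.x ships its own vendored `pyspark.cloudpickle` module that defines a `print_exec()` helper, which `pyspark.serializers` calls when pickling fails; the standalone `cloudpickle` package on PyPI has no such attribute, which would explain the `print_exec` error after the monkeypatch. A minimal sketch of a defensive shim (using a synthetic module here only so the sketch runs without cloudpickle installed; in practice you would apply it to the real imported `cloudpickle`):

```python
import sys
import traceback
import types

# Hypothetical stand-in for the standalone cloudpickle package, so this
# sketch is self-contained. With the real package: `import cloudpickle`.
cloudpickle = types.ModuleType("cloudpickle")

if not hasattr(cloudpickle, "print_exec"):
    # Shim the helper pyspark.serializers expects: it just prints the
    # currently active exception traceback to the given stream.
    cloudpickle.print_exec = lambda stream=sys.stderr: traceback.print_exc(file=stream)

# In a real environment you would then do (not run here):
# import pyspark.serializers
# pyspark.serializers.cloudpickle = cloudpickle
```

This only papers over the missing attribute; whether the rest of the monkeypatched serializer behaves correctly on Spark 2.x is a separate question.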
@ijoseph Or consider this scenario: I tried the same simple.ipynb example after installing the cloudpickle library and also importing pyspark.serializers and pointing it at cloudpickle, like this: `import cloudpickle`, `import pyspark.serializers`, `pyspark.serializers.cloudpickle = cloudpickle`.
I'm getting this error, please help!:
@ijoseph @kevinwang @variablenix @prasad-kamat any help ?
Isn't it this issue? It looks like it's solved in pyspark 3.0.0 (PR). So maybe it would be enough to set a lower bound on the pyspark dependency in setup.py:
```python
REQUIRED_PACKAGES = [
    …,
    'pyspark>=3.0.0',
]
```
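If that lower bound is adopted, a quick runtime check can confirm the installed pyspark is new enough before running the notebook. This is only a sketch; the `meets_lower_bound` helper is ad hoc, not part of shparkley or pyspark:

```python
from importlib.metadata import version  # stdlib, Python 3.8+

def meets_lower_bound(ver: str, minimum=(3, 0, 0)) -> bool:
    # Compare only the leading numeric release segment, e.g. "3.4.1" -> (3, 4, 1).
    parts = tuple(int(p) for p in ver.split(".")[:3] if p.isdigit())
    return parts >= minimum

print(meets_lower_bound("2.4.5"))  # -> False: an older pyspark that hits the cloudpickle bug
print(meets_lower_bound("3.3.2"))  # -> True: satisfies the proposed >=3.0.0 bound
# In a live environment: meets_lower_bound(version("pyspark"))
```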
I wanted to try out this package because it implements a pyspark version of Shapley value generation. I copy-pasted the "simple.ipynb" file into my environment to check that everything basic works, but the code breaks at input cell [32]. Attached are the screenshots; could anyone please look into them?