awslabs / python-deequ

Python API for Deequ
Apache License 2.0

TypeError: an integer is required (got type bytes) when running basic example #21

Closed giznaj closed 3 years ago

giznaj commented 3 years ago

Describe the bug

I am running the basic_example.ipynb notebook locally on my laptop and get the following error: TypeError: an integer is required (got type bytes).

To Reproduce

  1. Launch the Jupyter notebook locally on my laptop with the specs listed
  2. Run the first block -> pip install pyspark (successful)
  3. Run the second block -> pip install pydeequ (successful)
  4. Run the third block -> pip install sagemaker_pyspark (successful with warnings)
  5. Run the fourth block -> fails with error listed in issue

Code

    import pydeequ
    import sagemaker_pyspark
    from pyspark.sql import SparkSession, Row

    classpath = ":".join(sagemaker_pyspark.classpath_jars())  # aws-specific jars

    spark = (SparkSession
        .builder
        .config("spark.driver.extraClassPath", classpath)
        .config("spark.jars.packages", pydeequ.deequ_maven_coord)
        .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
        .getOrCreate())

Expected behavior

The Spark session and a small sample dataframe are created without errors.

Screenshots

[Screenshot: Capture]

Desktop (please complete the following information):

Additional context

I have tried other versions of PySpark, Python, and PyDeequ to troubleshoot this. I am also unsuccessful in Databricks, where I get a different error: TypeError: 'JavaPackage' object is not callable

[Screenshot: Capture2]
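For context on the Databricks error: Py4J hands back a `JavaPackage` object when a requested Java class cannot be found on the JVM, so `'JavaPackage' object is not callable` commonly indicates that the Deequ jar was not loaded. The sketch below is one rough way to check whether the Deequ Maven coordinate appears in `spark.jars.packages`. The helper name `deequ_coord_in_packages` and the coordinate `com.example:deequ:0.0.0` are hypothetical placeholders, and the check is only a heuristic, since a jar can also be attached by other means (for example as a cluster library).

```python
def deequ_coord_in_packages(conf, deequ_coord):
    """Return True if deequ_coord is listed in the 'spark.jars.packages' setting.

    conf is a plain mapping of Spark settings. This helper is a hypothetical
    illustration, not part of pydeequ or pyspark.
    """
    packages = conf.get("spark.jars.packages", "")
    listed = [item.strip() for item in packages.split(",") if item.strip()]
    return deequ_coord in listed


# Placeholder coordinate for illustration only. In a real session the value
# would come from pydeequ.deequ_maven_coord, and the mapping from
# dict(spark.sparkContext.getConf().getAll()).
example_coord = "com.example:deequ:0.0.0"
configured = {"spark.jars.packages": "org.example:other:1.0, com.example:deequ:0.0.0"}
unconfigured = {"spark.driver.extraClassPath": "/tmp/a.jar"}

print(deequ_coord_in_packages(configured, example_coord))
print(deequ_coord_in_packages(unconfigured, example_coord))
```

Settings such as `spark.jars.packages` are read when the JVM starts, so passing them to `.config(...)` on a session that already exists (as is typical in a hosted notebook) may have no effect.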

giznaj commented 3 years ago

I just noticed that the Python version listed in the issue is not the same as the one in the screenshot. Let me confirm whether the version installed on the local machine is the version used by pip install during execution of the notebook. If that command is not executed, a module-not-found error is raised.
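A minimal sketch of that check, run in a notebook cell: it reports which interpreter the kernel is using and which versions of the three packages that interpreter can see. The function name `describe_environment` is hypothetical, and the sketch assumes Python 3.8 or newer, where `importlib.metadata` is in the standard library.

```python
import sys
from importlib import metadata


def describe_environment(packages=("pyspark", "pydeequ", "sagemaker_pyspark")):
    """Return the interpreter path, Python version, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed for the interpreter running this code.
            versions[name] = None
    return {
        "executable": sys.executable,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "packages": versions,
    }


env = describe_environment()
print("Interpreter:", env["executable"])
print("Python:", env["python"])
for name, version in env["packages"].items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If the interpreter path printed here differs from the one a plain `pip` command installs into, the packages end up in a different environment than the kernel. Installing with `python -m pip install ...`, using the interpreter path printed above, targets the kernel's own environment.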

gucciwang commented 3 years ago

Hi @giznaj! Are you still running into this issue? Feel free to reopen if you need more help!