FRosner / drunken-data-quality

Spark package for checking data quality
Apache License 2.0

Invalid syntax print baos.get_output() #154

Open KhushbuAgr opened 4 years ago

KhushbuAgr commented 4 years ago

Hi, I am getting an invalid syntax error.

$ pyspark --driver-class-path drunken-data-quality_2.11-x.y.z.jar
Python 3.6.11 (default, Jul 20 2020, 22:15:17)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/07/31 18:34:46 WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
20/07/31 18:34:46 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/07/31 18:34:47 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
      /_/

Using Python version 3.6.11 (default, Jul 20 2020 22:15:17)
SparkSession available as 'spark'.
>>> from pyddq.core import Check
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/dist-packages/pyddq/core.py", line 439
    print baos.get_output()
             ^
SyntaxError: invalid syntax
>>>
>>> df = spark.createDataFrame([(1, "a"), (1, None), (3, "c")])

>>> check = Check(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Check' is not defined
>>> check.hasUniqueKey("_1", "_2").isNeverNull("_1").run()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'check' is not defined

FRosner commented 4 years ago

Hi! Sorry to hear that you are having issues. Unfortunately, pyddq does not support Python 3 as of today. Would you be interested in contributing a PR?
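For context, the traceback points at a Python 2 style print statement, which Python 3 rejects at parse time. That is why the import of pyddq.core fails before any of its code runs. The sketch below is only an illustration of that difference and not the actual patch: FakeStream is a hypothetical stand-in for whatever baos is in pyddq/core.py, and compiles is a helper made up for this example.

```python
# Hypothetical stand-in for the `baos` object referenced in the traceback.
class FakeStream(object):
    def get_output(self):
        return "report text"


def compiles(source):
    """Return True if `source` parses under the running interpreter."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


PY2_STYLE = "print baos.get_output()"    # statement form, as in the traceback
PY3_STYLE = "print(baos.get_output())"   # function-call form

# On Python 3 the first check is False and the second is True.
print(compiles(PY2_STYLE), compiles(PY3_STYLE))

baos = FakeStream()
print(baos.get_output())  # the parenthesised form runs on Python 3
```

If the module also has to keep working on Python 2, putting `from __future__ import print_function` at the top of the file gives print(...) the same meaning on both interpreters.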

FRosner commented 3 years ago

Actually... Are you using the latest version of DDQ / pyddq? This issue should've been fixed already.
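One way to see which copy of pyddq the interpreter picks up, without triggering the SyntaxError again, is to ask the import system where the package lives instead of importing pyddq.core. This is a minimal sketch using only the standard library. The locate helper is a name made up for this example, and it prints None when pyddq is not on the path.

```python
import importlib.util


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


# Looks up the top-level package only, so pyddq/core.py is never parsed here.
print(locate("pyddq"))
```

If the printed path is the /usr/lib/python3.6/dist-packages/pyddq directory from the traceback, that installed copy is the one to compare against the latest release.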