awslabs / python-deequ

Python API for Deequ
Apache License 2.0

spark-submit job is not exiting until I hit ctrl+C if I use Check class #7

Open bruce32118 opened 3 years ago

bruce32118 commented 3 years ago

Is your feature request related to a problem? Please describe.

The spark-submit job does not exit until I press Ctrl+C if I use the Check class.

Describe the solution you'd like

Using the Check class starts a gateway server at self._spark_session.sparkContext._gateway. After my code has finished, that server is still running, which prevents the Spark job from stopping normally.

I suggest adding a close method to the Check class:

def close(self):
    self._spark_session.sparkContext._gateway.close()

MOHACGCG commented 3 years ago

I am experiencing the same issue. Once the Deequ run is complete and the results are gathered, pydeequ seems to keep something running in the background. Closing the Spark session doesn't help.

EDIT: Calling

spark.sparkContext._gateway.close()
spark.stop()

does the job.
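
A minimal sketch of this workaround wrapped in a helper, assuming `spark` is an active PySpark `SparkSession`. The name `shutdown_spark` is hypothetical and not part of pydeequ, and `_gateway` is a private PySpark attribute, so this may behave differently between versions. The `try`/`finally` is only there so that `spark.stop()` still runs if closing the gateway raises.

```python
def shutdown_spark(spark):
    """Close the Py4J gateway, then stop the Spark session.

    Assumption: `spark` is an active pyspark.sql.SparkSession.
    `_gateway` is a private attribute, so treat this as a workaround,
    not a supported API.
    """
    try:
        # Close the gateway that, per this thread, keeps the job from exiting.
        spark.sparkContext._gateway.close()
    finally:
        # Stop the session even if closing the gateway failed.
        spark.stop()
```

Calling `shutdown_spark(spark)` as the last statement of the job script, after all check results have been collected, mirrors the two lines above.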

gucciwang commented 3 years ago

Thanks for the suggestion! We will add this feature into the next release :)

datadominik commented 2 years ago

@gucciwang @jaoanan1126, any update here? For me, running hasSize checks from within AWS Glue does not work without manually adding spark.sparkContext._gateway.close(). I would appreciate a cleaner solution. I am not sure whether spark.stop() is also required.

AbdelrahmanAli commented 2 years ago

For anyone reading this issue, I think it's worth mentioning this doc: https://pydeequ.readthedocs.io/en/latest/README.html#wrapping-up
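
As far as I can tell, that section recommends shutting down the Py4J callback server rather than closing the whole gateway, and then stopping the session. A rough sketch, assuming `spark` is an active `SparkSession`; the wrapper name `wrap_up` is hypothetical, and the exact calls should be checked against the linked page for your pydeequ version.

```python
def wrap_up(spark):
    """Shut down the Py4J callback server, then stop the Spark session.

    Assumption: `spark` is an active pyspark.sql.SparkSession and its
    private `_gateway` attribute is a py4j JavaGateway.
    """
    # Stop the callback server so it does not keep the process alive.
    spark.sparkContext._gateway.shutdown_callback_server()
    # Then stop the Spark session itself.
    spark.stop()
```

Unlike the `_gateway.close()` approach earlier in the thread, this only stops the callback server, so it may be the gentler option to try first.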

ketankvishwakarma commented 11 months ago

spark.sparkContext._gateway.close()
spark.stop()

Not working for me.