AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

Setup and Teardown should be @classmethods setUpClass and tearDownClass #20

Open amrishan opened 3 years ago

amrishan commented 3 years ago

If the setup and teardown methods are not class methods, they are invoked before and after every individual test, so a new Spark session is created for each test. Making them `setUpClass` and `tearDownClass` runs them once per test class:

```python
import logging
import unittest

from pyspark.sql import SparkSession

from jobs.etl_job import transform_data  # the job function under test


class PySparkTest(unittest.TestCase):

    @classmethod
    def suppress_py4j_logging(cls):
        logger = logging.getLogger('py4j')
        logger.setLevel(logging.WARN)

    @classmethod
    def create_testing_pyspark_session(cls):
        return SparkSession \
            .builder \
            .master('local[*]') \
            .appName("my-local-testing-pyspark-context") \
            .getOrCreate()

    @classmethod
    def setUpClass(cls):
        # runs once for the whole test class, not once per test
        cls.suppress_py4j_logging()
        cls.spark = cls.create_testing_pyspark_session()
        cls.test_data_path = "<PATH>"
        cls.df = cls.spark.read.options(header='true', inferSchema='true') \
            .csv(cls.test_data_path)
        cls.df_expected = transform_data(cls.df, cls.spark)

    @classmethod
    def tearDownClass(cls):
        # stop the shared session after all tests in the class have run
        cls.spark.stop()
```
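
With this in place, individual tests can reuse the class-level session and DataFrames. A minimal sketch of a test subclass (the assertion is hypothetical, just to show access to the shared fixtures):

```python
class TestTransformData(PySparkTest):

    def test_transform_returns_rows(self):
        # hypothetical check: the transformed DataFrame is non-empty;
        # cls.spark and cls.df_expected were built once in setUpClass
        self.assertTrue(self.df_expected.count() > 0)
```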
AlexIoannides commented 3 years ago

Well spotted - thank you 👍