awslabs / mlmax

Example templates for the delivery of custom ML solutions to production so you can get started quickly without having to make too many design choices.
https://mlmax.readthedocs.io/en/latest/
Apache License 2.0

Some datatest suggestions. #11

Closed shawnbrown closed 3 years ago

shawnbrown commented 3 years ago

Instead of decorating each function with @dt.working_directory(...), you could use a single pytest fixture with autouse=True and then set the scope to session, module, or function as appropriate:

import datatest as dt
import pytest

@pytest.fixture(scope='session', autouse=True)
def set_working_directory():
    with dt.working_directory(__file__):
        yield  # Use directory for specified scope.

Also, if you need to validate any pd.DataFrame or pd.Series objects, there is now tighter Pandas integration via the dt.register_accessors() function: https://datatest.readthedocs.io/en/stable/reference/data-handling.html#pandas-accessors

josiahdavis commented 3 years ago

Thanks so much for the tip @shawnbrown - really appreciate your comment!

josiahdavis commented 3 years ago

We are going to stick with the pytest functionality for now. Thanks for the idea, Shawn.

shawnbrown commented 3 years ago

OK. But to be clear, as far as the directory goes, I wasn't advocating any non-pytest functionality. As an example, the file tests/mlmax/test_preprocessing.py contains four test functions and they all use the @dt.working_directory decorator:

@dt.working_directory(__file__)
def test_read_data(input_data_path):
    ...

@dt.working_directory(__file__)
def test_train_preprocessing(input_data_path, args_train):
    ...

@dt.working_directory(__file__)
def test_infer_preprocessing(input_data_path, args_infer):
    ...

@dt.working_directory(__file__)
def test_parse_arg():
    ...

You can omit all of those decorators and replace them with a single fixture (assuming you always want to work relative to the __file__ path):

@pytest.fixture(scope='session', autouse=True)
def set_working_directory():
    with dt.working_directory(__file__):
        yield  # Use directory for specified scope.

Doing this avoids switching into and out of the directory before and after each test function call. That said, the current form is more explicit, because the decorator is repeated on each function.
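To make the difference in directory switching concrete, here is a rough, standard-library-only sketch. The working_directory function below is a hypothetical stand-in written for this illustration, not datatest's implementation, and the helper names (switches, run_decorated, run_shared, noop_test) are made up as well. It assumes that a file path such as __file__ resolves to its containing folder, and it simply counts os.chdir calls for four tests run both ways.

```python
import contextlib
import os
import tempfile

switches = []  # every os.chdir call made by the stand-in, recorded for illustration


@contextlib.contextmanager
def working_directory(path):
    """Hypothetical stand-in for dt.working_directory (not datatest's code)."""
    original = os.getcwd()
    target = os.path.abspath(path)
    if os.path.isfile(target):  # assumption: a file path means "its folder"
        target = os.path.dirname(target)
    os.chdir(target)
    switches.append(target)
    try:
        yield
    finally:
        os.chdir(original)  # always restore, even if the test body raises
        switches.append(original)


def run_decorated(path, tests):
    """Wrap each test separately, as the per-function decorators do."""
    switches.clear()
    for test in tests:
        working_directory(path)(test)()  # same effect as @working_directory(path)
    return len(switches)


def run_shared(path, tests):
    """Enter the directory once for the whole batch, as a session fixture would."""
    switches.clear()
    with working_directory(path):
        for test in tests:
            test()
    return len(switches)


def noop_test():
    """Placeholder test body; it only needs to be callable."""
    os.getcwd()


tests = [noop_test] * 4  # four tests, mirroring test_preprocessing.py
with tempfile.TemporaryDirectory() as tmp:
    print(run_decorated(tmp, tests))  # 8: in and out once per test
    print(run_shared(tmp, tests))     # 2: in and out once overall
```

The saving is small for a handful of tests, so this is mostly a question of preference: the shared form has less repetition, while the decorated form keeps the directory dependency visible on each test.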

Take care!

josiahdavis commented 3 years ago

Thanks for the clarification @shawnbrown. Appreciate it!