Major changes:
Runner
Transform and training-test split now take place as a new type of job, transform_and_split, handled in the method _transform_and_split_data.
Corresponding enums have been created.
Tests have been updated to cover transform/split functionality, as well as training/testing on transformed and split data.
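For reviewers, here is a rough sketch of what a transform-and-split step can look like. It is not the code in this PR: the function name `transform_and_split_data`, the `JobTypes` enum and its member, the `test_fraction` and `seed` parameters, and the `train.csv`/`test.csv` file names are all assumptions made for illustration, and it uses plain CSV files from the standard library.

```python
import csv
import os
import random
from enum import Enum


class JobTypes(Enum):
    # Hypothetical enum; the member names added in this PR may differ.
    JOB_TRANSFORM_SPLIT = "transform_and_split"


def transform_and_split_data(raw_filepath, session_filepath, transform_function,
                             test_fraction=0.2, seed=0):
    """Transform every raw CSV row, then write train.csv and test.csv."""
    # Read the raw file and apply the user-supplied transform to each row.
    with open(raw_filepath, newline="") as f:
        rows = [transform_function(row) for row in csv.reader(f)]

    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    splits = {"test.csv": rows[:n_test], "train.csv": rows[n_test:]}

    # Write both splits into the session folder, creating it if needed.
    os.makedirs(session_filepath, exist_ok=True)
    for name, subset in splits.items():
        with open(os.path.join(session_filepath, name), "w", newline="") as f:
            csv.writer(f).writerows(subset)

    # Report how many rows ended up in each split.
    return {name: len(subset) for name, subset in splits.items()}
```

The point of the sketch is the separation of concerns: the raw data is read once, transformed, split, and written to a session folder, so later training and validation jobs only need the session filepath.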
DMLJob
New attributes added for raw_filepath (previously rfp), session_filepath, and transform_function.
serialize_job and deserialize_job have been updated accordingly.
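As an illustration of how the new attributes could round-trip through serialization, here is a minimal sketch. It is not the DMLJob class or the serialize_job/deserialize_job functions from this PR: `JobSketch`, `serialize_job_sketch`, `deserialize_job_sketch`, and the `TRANSFORMS` registry are hypothetical names. In particular, storing the transform by name and looking it up again is an assumption of this sketch, chosen because a callable cannot be placed in a JSON payload directly.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Dict, Optional

# Hypothetical registry: the sketch serializes only the transform's name
# and resolves the callable again after deserialization.
TRANSFORMS: Dict[str, Callable] = {"identity": lambda row: row}


@dataclass
class JobSketch:
    job_type: str
    raw_filepath: str
    session_filepath: Optional[str] = None
    transform_name: Optional[str] = None

    @property
    def transform_function(self) -> Optional[Callable]:
        # Returns None when no transform is set or the name is unknown.
        return TRANSFORMS.get(self.transform_name)


def serialize_job_sketch(job: JobSketch) -> dict:
    """Turn the job into a plain dict of JSON-friendly values."""
    return asdict(job)


def deserialize_job_sketch(payload: dict) -> JobSketch:
    """Rebuild the job from the dict produced by serialize_job_sketch."""
    return JobSketch(**payload)
```

Because only strings and None are stored, the payload survives `json.dumps` and `json.loads` unchanged, and the transform is available again through the `transform_function` property.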
Testing utils
Created a new utility for transforming and splitting data.
For tests that facilitate training + validation outside of Runner tests, I just added the session filepath manually to the initialize job. Transforming and splitting is not necessary for all of these tests, but we still need to implement transforming and splitting data in the optimizer logic. This should be a separate PR, maybe assigned to @kiddyboots216?
Dataset Manager
Refactored code, tests and documentation so that it is clear the Dataset Manager does not handle transforms any more.