Background
Currently, most of our tests that require access to files (datasets, mapping tables, etc.) rely on paths relative to the current folder. Some of the data files are generated on the fly. That effectively means we put temporary files inside project folders.
Creating a temporary directory and generating input test files and mapping tables there is not a big deal; the getLocalTemporaryDirectory() method can be used for this. But our unit tests should run in distributed mode as well (see #317).
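For reference, a local-only helper of roughly this shape covers the current need. This is only an assumption about what getLocalTemporaryDirectory() might look like, based on java.nio; the project's actual implementation may differ:

```scala
import java.nio.file.Files

object LocalTempDir {
  // Illustrative sketch; the project's actual getLocalTemporaryDirectory() may differ.
  def getLocalTemporaryDirectory(prefix: String = "test-"): String = {
    val dir = Files.createTempDirectory(prefix).toFile
    dir.deleteOnExit() // best-effort cleanup when the JVM exits
    dir.getAbsolutePath
  }
}
```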
Feature
Need to develop the following methods:
[ ] Create a temporary folder and return its path. The behavior of the method should depend on the Hadoop configuration provided (see the sketch after this list).
When Spark is running in local mode, the method should create a local temp directory in a portable way.
When Spark is running in distributed mode, the method should create a temporary directory in HDFS.
[ ] Delete a temporary directory recursively. The behavior should mirror the creation method and depend on the provided Hadoop configuration.
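A minimal sketch of how the two methods could look, assuming Hadoop's FileSystem API is used to abstract over local storage and HDFS. The object and method names (TempDirUtils, createTempDir, deleteTempDir) and the /tmp location on HDFS are illustrative assumptions, not an existing project API:

```scala
import java.nio.file.Files

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Illustrative names; not an existing project API.
object TempDirUtils {

  /** Creates a temporary directory and returns its path.
    * When fs.defaultFS points to the local filesystem (Spark local mode),
    * a portable java.nio temp directory is created; otherwise a directory
    * is created under /tmp on HDFS. */
  def createTempDir(conf: Configuration, prefix: String = "test-tmp-"): Path = {
    val fs = FileSystem.get(conf)
    if (fs.getUri.getScheme == "file") {
      new Path(Files.createTempDirectory(prefix).toUri)
    } else {
      val dir = new Path(s"/tmp/$prefix${System.nanoTime()}")
      fs.mkdirs(dir)
      dir
    }
  }

  /** Deletes a temporary directory recursively on whichever filesystem it lives on. */
  def deleteTempDir(conf: Configuration, dir: Path): Boolean = {
    val fs = dir.getFileSystem(conf)
    fs.delete(dir, true) // 'true' enables recursive deletion
  }
}
```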
Additional context
This task requires investigating how temporary data is managed in HDFS. Check whether there are universal methods for creating temporary folders in HDFS.
Keep in mind a possible future extension to S3 support.
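If the methods are written against Hadoop's FileSystem abstraction, S3 support should later come down to resolving the filesystem from an s3a:// path, provided the hadoop-aws (s3a) connector and credentials are available. The bucket name below is hypothetical:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object S3TempDirExample {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // Hypothetical bucket; requires the hadoop-aws (s3a) connector on the classpath.
    val s3Dir = new Path(s"s3a://my-test-bucket/tmp/run-${System.nanoTime()}")

    // Path.getFileSystem resolves the FileSystem implementation from the URI scheme,
    // so the same mkdirs/delete calls work for file://, hdfs:// and s3a:// paths alike.
    val fs: FileSystem = s3Dir.getFileSystem(conf)
    fs.mkdirs(s3Dir)
    fs.delete(s3Dir, true)
  }
}
```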