cylondata / twister2

A composable framework for fast and scalable data analytics
https://twister2.org
Apache License 2.0

unified storage config #944

Closed ahmet-uyar closed 4 years ago

ahmet-uyar commented 4 years ago

I created two types of storage config parameters: volatile and persistent. I put both sets of parameters in common/data.yaml.

Persistent storage can be hdfs, mounted, or none (mounted means local or NFS).

Persistent storage parameters:

We create a directory named after the jobID under the persistent root directory. All persistent files are saved there. CheckpointManager and CheckpointClient always save to the persistent volume. They save to the following directories:

By default, the volatile storage root is:

We create a directory named after the jobID under each volatile root. There can be multiple roots for volatile storage, since the networking component can save to multiple disks simultaneously.
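To illustrate the layout described above, here is a minimal sketch of how one job directory per volatile root could be derived. The function name `volatile_job_dirs`, the example root paths, and the job ID are hypothetical and are not taken from the twister2 code base or from common/data.yaml.

```python
from pathlib import PurePosixPath


def volatile_job_dirs(volatile_roots, job_id):
    """Return one job directory per volatile root.

    Hypothetical helper: each configured root gets a sub-directory
    named after the job ID, so data can be spread over several disks.
    """
    return [str(PurePosixPath(root) / job_id) for root in volatile_roots]


# Example with two assumed disk mount points.
dirs = volatile_job_dirs(["/disk1/twister2", "/disk2/twister2"], "job-1")
print(dirs)
```

With a single root the list simply has one entry, so callers can treat the single-disk and multi-disk cases the same way.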

Networking data is saved to the following directory:

TSet data can be saved to either the persistent or the volatile directory. The following parameter determines that:

TSet data directory:

Logs can be saved to either the persistent or the volatile directory. However, logs cannot be saved to hdfs. The following parameter determines the location of log data:

Logging directory is:

In Kubernetes, we do not create the jobID directory under the volatile or persistent directory, since those directories are created separately for each job.
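The directory and logging rules in this comment can be sketched as follows. This is only an illustration under stated assumptions: the helper names `job_dir` and `check_log_location`, the `in_kubernetes` flag, and the string values `"persistent"` and `"volatile"` for the log location are hypothetical, and the actual parameter names live in common/data.yaml.

```python
from pathlib import PurePosixPath

# Persistent storage types named in this comment.
PERSISTENT_TYPES = {"hdfs", "mounted", "none"}


def job_dir(root, job_id, in_kubernetes=False):
    """Resolve the directory a job writes to under a storage root.

    Hypothetical helper: outside Kubernetes a jobID sub-directory is
    used; in Kubernetes the root is already per-job, so it is used as is.
    """
    if in_kubernetes:
        return str(PurePosixPath(root))
    return str(PurePosixPath(root) / job_id)


def check_log_location(log_location, persistent_type):
    """Reject log settings that would place logs on hdfs.

    `log_location` is assumed to be "persistent" or "volatile".
    """
    if persistent_type not in PERSISTENT_TYPES:
        raise ValueError(f"unknown persistent storage type: {persistent_type}")
    if log_location not in ("persistent", "volatile"):
        raise ValueError(f"unknown log location: {log_location}")
    if log_location == "persistent" and persistent_type == "hdfs":
        raise ValueError("logs cannot be saved to hdfs")
    return log_location
```

For example, `job_dir("/persistent", "job-1")` resolves to `/persistent/job-1`, while the same call with `in_kubernetes=True` resolves to `/persistent`.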