Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
Currently the init-container can download files from the resource staging server or any HTTP endpoint out-of-the-box. However, to download files from a remote HDFS cluster, cloud storage, or S3, the init-container very likely needs 1) Hadoop configuration (e.g., needed for both cloud storage and S3), 2) custom environment variables (e.g., GOOGLE_APPLICATION_CREDENTIALS for cloud storage and HADOOP_TOKEN_FILE_LOCATION for secured HDFS), and 3) credentials injected through user-specified secrets. Some of these can be handled through custom Docker images, but it would be a much better user experience if they were natively supported.
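The three requirements above all map onto standard Kubernetes pod-spec fields, so native support would amount to rendering something like the following into the init-container spec. This is an illustrative sketch only; the container, ConfigMap, and secret names (spark-init, hadoop-conf, gcs-key) are hypothetical, not actual output of the scheduler back-end:

```yaml
# Hypothetical init-container fragment showing the three injection points.
initContainers:
- name: spark-init
  image: spark-init:latest
  env:
  # 2) custom environment variable pointing at a mounted service-account key
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /etc/secrets/gcs/key.json
  volumeMounts:
  # 1) Hadoop configuration (core-site.xml etc.) mounted from a ConfigMap
  - name: hadoop-conf
    mountPath: /etc/hadoop/conf
  # 3) credentials injected from a user-specified secret
  - name: gcs-key
    mountPath: /etc/secrets/gcs
    readOnly: true
volumes:
- name: hadoop-conf
  configMap:
    name: hadoop-conf
- name: gcs-key
  secret:
    secretName: gcs-service-account-key
```

Custom Docker images can bake in the Hadoop configuration, but secrets and per-user credentials are better delivered through volumes and environment variables like these at submission time.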