databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

Cache directory with preprocessed models is hard-coded #207

Open cabral1888 opened 4 years ago

cabral1888 commented 4 years ago

Hello. I am using sparkdl in a Spark cluster on YARN integrated with Docker. I am having problems related to the user home directory: when the code fetches the preprocessed models (InceptionV3, Xception, etc.), it stores them in my HOME_DIR. For reasons specific to our setup, YARN doesn't create the user HOME_DIR, so the library fails when it tries to write to that directory. What I need is a way to change the default behavior so that models are stored in a directory of my choosing.

Would it be possible to change the code so that the cache directory can be set at execution time? For instance, with a parameter when I instantiate the following class:

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3", cacheDir=<SOME_PATH>)

Note: the file with the hard-coded HOME_DIR is src/main/scala/com/databricks/sparkdl/ModelFetcher.scala, line 40.
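
To make the request concrete, here is a rough sketch in Python of the lookup order I have in mind. It is only an illustration, not sparkdl code: the function name resolve_cache_dir, the environment variable SPARKDL_CACHE_DIR, and the directory names ".sparkdl" and "sparkdl" are all hypothetical and chosen for this example. The idea is to try an explicit argument first, then an environment variable, then the home directory, and finally the system temp directory, and to return the first one that can be created and written to.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical environment variable name; not part of sparkdl.
CACHE_ENV_VAR = "SPARKDL_CACHE_DIR"


def resolve_cache_dir(cache_dir: Optional[str] = None) -> Path:
    """Return the first writable directory for cached models.

    Lookup order: explicit argument, environment variable,
    home directory, system temp directory.
    """
    candidates = []
    if cache_dir:
        candidates.append(Path(cache_dir))
    env_value = os.environ.get(CACHE_ENV_VAR)
    if env_value:
        candidates.append(Path(env_value))
    try:
        # ".sparkdl" is an assumed directory name for this sketch.
        candidates.append(Path.home() / ".sparkdl")
    except RuntimeError:
        # The home directory cannot be resolved at all; skip it.
        pass
    candidates.append(Path(tempfile.gettempdir()) / "sparkdl")

    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
        except OSError:
            # Missing parent, read-only mount, no permission, etc.
            continue
        if os.access(candidate, os.W_OK):
            return candidate
    raise OSError("no writable cache directory found")
```

With something like this, a missing HOME_DIR would no longer be fatal, and the resolved path could be what gets passed as the proposed cacheDir parameter.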

Best regards!

cabral1888 commented 4 years ago

Anyone?