USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Is the default config overridable without updating the jar? #53

Closed: buggtb closed this issue 7 years ago

buggtb commented 7 years ago

I don't know if I'm just missing something or if it's not implemented yet: can I define an external config file, something like site-default.yaml?

thammegowda commented 7 years ago

Yes! You can prepend the directory containing your file to the classpath to override the bundled config without updating the jar.

I use the bin/sparkler.sh script; it prepends the ./resources/ directory to the classpath:

java -cp $DIR/resources:$JAR edu.usc.irds.sparkler.Main $@
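Prefixing works because the JVM searches classpath entries from left to right and returns the first match for a resource name, so a directory placed before the jar shadows the copy bundled inside it. This assumes Sparkler loads its YAML through a classpath resource lookup, as the reply implies. Below is a simplified shell model of that lookup. `resolve_resource` is a hypothetical helper, not part of Sparkler, and it only checks directory entries (jar entries are skipped).

```shell
#!/usr/bin/env bash
# Simplified model of classpath resource lookup: entries are searched left to
# right and the first one containing the file wins. Only directory entries are
# checked here; jar entries are skipped in this sketch.
resolve_resource() {
  local classpath="$1" name="$2" entry
  local -a entries
  # Split the classpath on ':' without triggering glob expansion.
  IFS=':' read -r -a entries <<< "$classpath"
  for entry in "${entries[@]}"; do
    if [ -d "$entry" ] && [ -f "$entry/$name" ]; then
      printf '%s\n' "$entry/$name"
      return 0
    fi
  done
  return 1  # not found in any directory entry
}
```

For example, `resolve_resource "./my-conf:$DIR/resources" some-config.yaml` would print `./my-conf/some-config.yaml` if that file exists, and fall back to the copy in `$DIR/resources` otherwise (`./my-conf` and `some-config.yaml` are placeholder names).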

Currently, we keep such files in the conf directory, where all changes are tracked in git. Maven copies them into the resources directory, and you can modify the copies there at runtime. If you don't need these changes pushed to the repo, you can safely leave them there (otherwise, please copy the changes back to conf so they get checked in!)

Please close this issue if this solves it, or if you have suggestions, I'd like to hear them :-)

buggtb commented 7 years ago

Cool. Personally, I think a --config-dir style flag on the command line (or something similar) would be much more convenient, but this will do for me! (Maybe add it to a wiki page somewhere.)
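A --config-dir flag like the one suggested could be a thin layer over the same classpath trick. The sketch below is hypothetical: Sparkler does not provide this flag, and `build_command` is an illustrative helper. It consumes `--config-dir DIR` (or `--config-dir=DIR`), prepends `DIR` to a base classpath, and forwards every other argument unchanged. It prints the resulting command one word per line rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper logic for a --config-dir flag. Sparkler does NOT provide
# this flag; this only sketches the suggestion.
# Usage: build_command BASE_CLASSPATH [--config-dir DIR] [other args...]
# Prints the java command (one word per line) instead of running it.
build_command() {
  local base_cp="$1"; shift
  local config_dir=""
  local -a args=()
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --config-dir)
        if [ "$#" -lt 2 ]; then
          printf '%s\n' "--config-dir requires a directory" >&2
          return 1
        fi
        config_dir="$2"; shift 2 ;;
      --config-dir=*)
        config_dir="${1#--config-dir=}"; shift ;;
      *)
        args+=("$1"); shift ;;   # forward everything else untouched
    esac
  done
  local cp="$base_cp"
  # Prepend the override directory so its files shadow the bundled defaults.
  if [ -n "$config_dir" ]; then
    cp="$config_dir:$base_cp"
  fi
  printf '%s\n' java -cp "$cp" edu.usc.irds.sparkler.Main "${args[@]}"
}
```

An actual launcher would run the same words with `exec` instead of printing them, called as something like `build_command "$DIR/resources:$JAR" "$@"`, reusing the `$DIR` and `$JAR` variables from the script quoted earlier in the thread.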