USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0
410 stars, 143 forks

Improve deployments for different architectures #198

Open buggtb opened 3 years ago

buggtb commented 3 years ago

Target users: Joe and his laptop, whether it runs Mac, Windows, or Linux. How do we support these users?

Then, when they want to scale up, how do you deploy SCE alongside an existing Spark cluster, run the crawl in the cluster, and get the output?

How do you make the most of cloud services and deploy to AWS/GCE/Azure?