Sotera / watchman

Watchman: An open-source social-media event-detection system
GNU General Public License v2.0

Load test data with pyspark #98

Closed by lukewendling 7 years ago

lukewendling commented 7 years ago

Use Spark 2.0 Docker containers (1 master, 1 slave) to encapsulate a simple Python script that loads data directly into MongoDB (loopy is not a good candidate) for ingesting fairly large datasets (> 10M posts).
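A minimal sketch of what such a loader script might look like, not the script that was actually committed. It assumes the test data is a newline-delimited JSON file and that `pymongo` is installed on the Spark workers. The MongoDB URI, the database and collection names, the file path, and all function names are hypothetical placeholders.

```python
import json
from itertools import islice


def parse_posts(lines):
    """Yield one dict per JSON line; skip blank lines, bad JSON, and non-objects."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            post = json.loads(line)
        except ValueError:
            continue
        if isinstance(post, dict):
            yield post


def batches(records, size):
    """Group an iterator into lists of at most `size` items."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_partition(lines, insert_many, batch_size=1000):
    """Insert the posts of one partition in batches; return how many were inserted."""
    total = 0
    for chunk in batches(parse_posts(lines), batch_size):
        insert_many(chunk)
        total += len(chunk)
    return total


def write_partition_to_mongo(lines, uri="mongodb://mongo:27017",
                             db="watchman", coll="posts"):
    """Hypothetical per-partition writer: opens one client per partition."""
    from pymongo import MongoClient  # imported here so it resolves on the worker
    client = MongoClient(uri)
    try:
        load_partition(lines, client[db][coll].insert_many)
    finally:
        client.close()


def main(path="posts.jsonl"):
    """Read the file as an RDD of lines and write each partition to MongoDB."""
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("load-test-data").getOrCreate()
    spark.sparkContext.textFile(path).foreachPartition(write_partition_to_mongo)
    spark.stop()
```

Writing in batches from `foreachPartition` keeps one MongoDB connection per partition rather than one per post, which matters at the > 10M scale mentioned above. Calling `main()` from a script submitted with `spark-submit` would start the load.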

drJAGartner commented 7 years ago

I looked over what you had. Please commit and we should be good.

lukewendling commented 7 years ago

Accidentally merged to the 88 branch after it was itself merged into another branch. Let's leave it in the 88 branch for later use, if needed.