Multiprocessing ability with Apache Spark

Once you get to the main XML content of the wiki dump, transforming the XML into JSON can be sped up considerably by running it on Spark. This has already been done in the idio fork of this repo, so that PR can serve as a basis for introducing it here: https://github.com/idio/json-wikipedia/pull/3/files. A few pointers:

Since the forks have diverged severely, it's easier to start a new PR (from a branch).
Use the latest Spark.
Benchmark with the Simple English Wikipedia dump.
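
To illustrate why this step parallelizes well, here is a minimal sketch of the per-page transform written as a pure function. It is not the code from the idio PR or from this repo: the function name `page_to_json`, the output fields (`title`, `id`, `text`), and the sample pages are all made up for illustration, and it assumes the dump has already been split into standalone `<page>` fragments without XML namespaces. Python is used only for brevity.

```python
import json
import xml.etree.ElementTree as ET


def page_to_json(page_xml: str) -> str:
    """Convert one standalone <page>...</page> fragment to a JSON string.

    Hypothetical schema: only title, id and revision text are kept.
    """
    page = ET.fromstring(page_xml)
    record = {
        "title": page.findtext("title"),
        "id": int(page.findtext("id")),
        # An empty <text/> element yields "" or None; normalize to "".
        "text": page.findtext("revision/text") or "",
    }
    return json.dumps(record, ensure_ascii=False)


# Made-up sample fragments standing in for pages of a real dump.
SAMPLE_PAGES = [
    "<page><title>April</title><id>1</id>"
    "<revision><text>April is a month.</text></revision></page>",
    "<page><title>August</title><id>2</id>"
    "<revision><text>August is a month.</text></revision></page>",
]

# Locally, a plain map over the fragments stands in for the distributed one.
json_lines = [page_to_json(p) for p in SAMPLE_PAGES]

for line in json_lines:
    print(line)
```

Because each page is converted independently and the function keeps no shared state, the same function could in principle be handed to a distributed map (for example PySpark's `RDD.map`) over the page fragments. How the pages are split and loaded is a separate question that this sketch does not address.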

diegoceccarelli / json-wikipedia

Multiprocessing ability with Apache Spark #46