Closed rom1504 closed 9 years ago
Have you tested your Hadoop installation? It's been a while since I built the database from newer dumps, but I always have problems with Hadoop.
Yes, the first example (Standalone Operation) on http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html works.
Should I use Hadoop 2.6 or 1.2?
So you are using Hadoop 2.6, while Wikiminer uses version 1.2. Maybe the problem is there. I followed this guide to set up Hadoop on my Linux machine, and the last time I tried it worked well: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Using Hadoop 1.2 seems to fix that problem, thanks!
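For anyone hitting the same mismatch, here is a minimal sketch of checking which Hadoop is on the PATH before running the extractor. It assumes `hadoop version` prints a first line of the form `Hadoop X.Y.Z`; the helper name `hadoop_major` is made up for illustration and is not part of Hadoop or Wikipedia Miner.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a line such as "Hadoop 1.2.1".
# Prints nothing if the line does not match that assumed format.
hadoop_major() {
    printf '%s\n' "$1" | sed -n 's/^Hadoop \([0-9][0-9]*\)\..*/\1/p'
}

# Only query Hadoop if it is actually on the PATH.
if command -v hadoop >/dev/null 2>&1; then
    first_line=$(hadoop version 2>/dev/null | head -n 1)
    major=$(hadoop_major "$first_line")
    if [ "$major" = "1" ]; then
        echo "Found $first_line (a 1.x release, the line reported to work in this thread)"
    else
        echo "Found $first_line; this thread reports problems with 2.6 and success with 1.2"
    fi
else
    echo "hadoop not found on PATH"
fi
```

The check only reports what is installed; it does not change anything, so it is safe to run before the `hadoop jar` command.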
I'm trying to follow https://github.com/dnmilne/wikipediaminer/wiki/Obtaining-wikipedia-data on the Simple English dump and I'm getting these errors:
I'm using Wikipedia Miner 1.2 and Hadoop 2.6 (and Java 7), with this command line:
hadoop jar wikipedia-miner-hadoop.jar org.wikipedia.miner.extraction.DumpExtractor input/simplewiki-latest-pages-articles.xml input/languages.xml simple input/en-sent.bin output
Is there any particular reason why I'm getting this error?