dbpedia-spotlight / pignlproc

Apache Pig utilities to build training corpora for machine learning / NLP out of public Wikipedia and DBpedia dumps.

Running nerd-stats on part of the Wikipedia dump #14

Open Nasreddine opened 9 years ago

Nasreddine commented 9 years ago

I ran the nerd-stats.pig script locally on part of the Wikipedia dump, http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2, and got the expected statistics. But when I ran the same script on only a portion of that dump file, no results were produced. Does the script require a minimum amount of data?

tilneyyang commented 9 years ago

I'm stuck on the same problem with the named entity extraction script. Everything runs fine with Hadoop, but the output folder is not created...

tilneyyang commented 9 years ago

It might be a problem with the OUTPUT path you set. I tried a local directory first, with no luck. Then I tried an HDFS path and got the final results. I'm not familiar with Hadoop or Pig, so I hope someone can figure it out...

Nasreddine commented 9 years ago

In my case the output folder is created, but it contains no results. How did you set the HDFS path?

tilneyyang commented 9 years ago

I used the full HDFS path, /user/username/outputdir. Also, please try Hadoop 0.20.0 with the script; otherwise there might be unexpected problems caused by the Hadoop version.
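
For reference, a rough sketch of what that could look like on the command line. It assumes the script reads its output location from a Pig parameter named OUTPUT (the name used earlier in this thread). Any other parameters the script needs are left out here, and /user/username/outputdir is only a placeholder. The snippet just prints the commands so they can be checked before running them.

```shell
# Placeholder HDFS path; replace "username" and "outputdir" with your own values.
OUTPUT_DIR="/user/username/outputdir"

# Assumption: the script takes its output location via a parameter called OUTPUT.
# Other parameters the script may require are not shown.
PIG_CMD="pig -param OUTPUT=${OUTPUT_DIR} nerd-stats.pig"

# After the job finishes, list the output directory on HDFS to check for results.
LS_CMD="hadoop fs -ls ${OUTPUT_DIR}"

# Print the commands for review instead of running them.
echo "$PIG_CMD"
echo "$LS_CMD"
```

If the listing shows the directory but no part files, the job itself produced no records, which is a different problem from a wrong output path.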