My environment:
$ lsb_release -a
LSB Version: core-9.20160110ubuntu0.2-amd64:core-9.20160110ubuntu0.2-noarch:security-9.20160110ubuntu0.2-amd64:security-9.20160110ubuntu0.2-noarch
Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial
$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
I followed the recommended steps to generate a database from a Wikipedia articles dump. I ran into the following Java exceptions when trying to extract a links database from the Russian Wikipedia articles dump:
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /usr/local/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/11/06 21:12:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/06 21:12:28 INFO extraction.DumpExtractor: Extracting site info
16/11/06 21:12:28 INFO extraction.DumpExtractor: Starting page step
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.textoutputformat.separator is deprecated. Instead, use mapreduce.output.textoutputformat.separator
16/11/06 21:12:28 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/11/06 21:12:28 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/11/06 21:12:28 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
16/11/06 21:12:28 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/11/06 21:12:28 INFO mapred.FileInputFormat: Total input paths to process : 1
16/11/06 21:12:28 INFO mapreduce.JobSubmitter: number of splits:519
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/11/06 21:12:28 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
16/11/06 21:12:28 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
16/11/06 21:12:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1121545400_0001
16/11/06 21:12:28 WARN conf.Configuration: file:/tmp/hadoop-beil/mapred/staging/beil1121545400/.staging/job_local1121545400_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
16/11/06 21:12:28 WARN conf.Configuration: file:/tmp/hadoop-beil/mapred/staging/beil1121545400/.staging/job_local1121545400_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
16/11/06 21:12:29 INFO mapred.LocalDistributedCacheManager: Localized file:/playground/franz/projects/WPS/output/final/siteInfo.xml as file:/tmp/hadoop-beil/mapred/local/1478463148986/siteInfo.xml
16/11/06 21:12:29 INFO mapred.LocalDistributedCacheManager: Localized file:/playground/franz/projects/WPS/input/languages.xml as file:/tmp/hadoop-beil/mapred/local/1478463148987/languages.xml
16/11/06 21:12:29 INFO Configuration.deprecation: mapred.cache.localFiles is deprecated. Instead, use mapreduce.job.cache.local.files
16/11/06 21:12:29 WARN conf.Configuration: file:/tmp/hadoop-beil/mapred/local/localRunner/beil/job_local1121545400_0001/job_local1121545400_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
16/11/06 21:12:29 WARN conf.Configuration: file:/tmp/hadoop-beil/mapred/local/localRunner/beil/job_local1121545400_0001/job_local1121545400_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
16/11/06 21:12:29 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/11/06 21:12:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/11/06 21:12:29 INFO mapreduce.Job: Running job: job_local1121545400_0001
16/11/06 21:12:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
16/11/06 21:12:29 INFO mapred.LocalJobRunner: Waiting for map tasks
16/11/06 21:12:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1121545400_0001_m_000000_0
16/11/06 21:12:29 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/11/06 21:12:29 INFO mapred.MapTask: Processing split: file:/playground/franz/projects/WPS/input/ruwiki-latest-pages-articles.xml:0+33554432
16/11/06 21:12:29 INFO mapred.MapTask: numReduceTasks: 1
16/11/06 21:12:29 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/11/06 21:12:29 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/11/06 21:12:29 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/11/06 21:12:29 INFO mapred.MapTask: soft limit at 83886080
16/11/06 21:12:29 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/11/06 21:12:29 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/11/06 21:12:29 ERROR extraction.PageStep$Step1Mapper: Could not configure mapper
java.io.FileNotFoundException: file:/tmp/hadoop-beil/mapred/local/1478463148986/siteInfo.xml (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at java.io.FileInputStream.<init>(FileInputStream.java:93)
at java.io.FileReader.<init>(FileReader.java:58)
at org.wikipedia.miner.extraction.SiteInfo.<init>(SiteInfo.java:30)
at org.wikipedia.miner.extraction.PageStep$Step1Mapper.configure(PageStep.java:132)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:425)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/11/06 21:12:29 ERROR extraction.PageStep$Step1Mapper: Caught exception
java.lang.NullPointerException
at org.wikipedia.miner.extraction.PageStep$Step1Mapper.map(PageStep.java:168)
at org.wikipedia.miner.extraction.PageStep$Step1Mapper.map(PageStep.java:109)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:235)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
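One thing I notice in the trace: the path in the FileNotFoundException still starts with the "file:" scheme, and it is opened through java.io.FileReader, which treats its String argument as a plain filesystem path rather than a URI. The NullPointerException in map() may just be a follow-on of the failed configure() step. I have not checked the wikipedia-miner source, so the following is only a sketch of the kind of conversion that might be needed; the class and method names (CachePathSketch, toLocalPath) are hypothetical and not part of wikipedia-miner or Hadoop.

```java
import java.net.URI;

public class CachePathSketch {

    // Hypothetical helper: turns a string such as "file:/tmp/x/siteInfo.xml"
    // into "/tmp/x/siteInfo.xml" so that it can be handed to java.io.FileReader.
    // Strings without a scheme are returned unchanged.
    // Assumption: the input contains no characters that are illegal in a URI
    // (for example spaces); URI.create would throw IllegalArgumentException for those.
    public static String toLocalPath(String cachePath) {
        URI uri = URI.create(cachePath);
        if (uri.getScheme() == null) {
            return cachePath; // already a plain path
        }
        return uri.getPath(); // path component without the "file:" prefix
    }

    public static void main(String[] args) {
        // The exact path reported in the FileNotFoundException above.
        String reported = "file:/tmp/hadoop-beil/mapred/local/1478463148986/siteInfo.xml";
        System.out.println(toLocalPath(reported));
    }
}
```

If this guess is right, the fix would belong wherever SiteInfo receives the cache file path, but that is an assumption on my part.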
It seems someone had a similar problem with the Dutch Wikipedia articles dump: http://pastebin.com/uhpXwnTi
I'd appreciate it very much if someone could suggest a workaround.