dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

Download does not work #281

Closed: Bini12 closed this issue 9 years ago

Bini12 commented 9 years ago

I am trying to download the Amharic language edition, but I am having problems with the download. Here are the lines I use in my 'download.minimal.properties' file:

base-url=http://dumps.wikimedia.org/
base-dir=/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps
download=am:@pages-articles\d+.xml.bz2
download=wikidata:pages-articles.xml.bz2
download=commons:pages-articles.xml.bz2
unzip=true
retry-max=5
retry-millis=10000

But I get an error, and I can't figure it out by myself. Can anyone help me, please?

dump $ ../clean-install-run download config=download.minimal.properties
...
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
    at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.Exception: found no date on http://dumps.wikimedia.org/amwiki/ in range 00000000-99999999 with files (pages-articles\d*.xml,true),(pages-articles\d+.xml.bz2,true)
    at org.dbpedia.extraction.dump.download.LanguageDownloader.downloadDates(LanguageDownloader.scala:49)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:69)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:64)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:84)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:83)
    at scala.collection.immutable.TreeSet.foreach(TreeSet.scala:151)
    at org.dbpedia.extraction.dump.download.Download$.main(Download.scala:64)
    at org.dbpedia.extraction.dump.download.Download.main(Download.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12.762s
[INFO] Finished at: Tue Nov 18 16:29:39 EAT 2014
[INFO] Final Memory: 10M/147M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240 (Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
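Each `download=` value in the configuration above has the form `<language>:<file>`, and the exception prints the wanted files as `(name,flag)` pairs. Judging only from those messages, the flag appears to be `true` when the name was written with a leading `@` (treated as a regular expression) and `false` for a plain file name. The following is a minimal Java sketch under that assumption; `DownloadEntryParser`, `Entry`, and `parse` are hypothetical names, not the framework's actual parser.

```java
import java.util.List;

// Hypothetical sketch: splits "download=<lang>:<file>" values the way the
// logged "(name,flag)" pairs suggest. The '@' => regex convention is an
// assumption inferred from the error messages, not from the framework's source.
public class DownloadEntryParser {

    /** One parsed target, e.g. language "am", file "pages-articles\\d+.xml.bz2", isRegex true. */
    static final class Entry {
        final String language;
        final String file;
        final boolean isRegex;

        Entry(String language, String file, boolean isRegex) {
            this.language = language;
            this.file = file;
            this.isRegex = isRegex;
        }

        @Override
        public String toString() {
            // Same "(name,flag)" shape as in the exception message.
            return "(" + file + "," + isRegex + ")";
        }
    }

    static Entry parse(String value) {
        int colon = value.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("expected <lang>:<file>, got: " + value);
        }
        String language = value.substring(0, colon);
        String file = value.substring(colon + 1);
        boolean isRegex = file.startsWith("@");
        // Drop the '@' marker so only the pattern itself remains.
        return new Entry(language, isRegex ? file.substring(1) : file, isRegex);
    }

    public static void main(String[] args) {
        List<String> values = List.of(
            "am:@pages-articles\\d+.xml.bz2",
            "wikidata:pages-articles.xml.bz2",
            "commons:pages-articles.xml.bz2");
        for (String v : values) {
            Entry e = parse(v);
            System.out.println(e.language + " " + e);
        }
    }
}
```

Running `main` prints `am (pages-articles\d+.xml.bz2,true)` for the Amharic entry, the same pair that appears in the exception.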

jimkont commented 9 years ago

It looks like the pages-articles.xml.bz2 file has not been generated yet in the latest folder, http://dumps.wikimedia.org/enwiki/20141106/

Can you wait a few hours until it is, then try again and report back?

Thanks, Dimitris

Bini12 commented 9 years ago

Thanks, Dimitris, for your quick reply. I can wait and report back, but just to remind you: I need the Amharic language, not English, which means amwiki instead of enwiki, i.e. http://dumps.wikimedia.org/amwiki/20141106/

Thanks, Bini

jcsahnwaldt commented 9 years ago

@jimkont, I think it's a different problem. The downloader should check all dates, not just the latest, and choose the most recent one for which the desired files are available.

@Bini12, could you please post the complete log, or at least the last 20 to 100 lines before the exception? There should be several messages like date page 'http://dumps.wikimedia.org/amwiki/.../' has no links ...
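The date-selection rule described in this comment can be sketched as follows: sort the date directories newest-first and return the first one whose index page links to every wanted file. This is a hypothetical Java illustration, not the framework's `LanguageDownloader`; it assumes the links of each date page have already been fetched into a map, and that wanted files are expressed as `java.util.regex.Pattern`s searched with `find()` (plain names wrapped in `Pattern.quote`). The `xxwiki` listing in `main` is made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical sketch of the selection rule, not the framework's LanguageDownloader.
public class DumpDateSelector {

    /** True if every required pattern is found in at least one link on the date page. */
    static boolean satisfiesAll(List<String> links, List<Pattern> required) {
        return required.stream()
            .allMatch(p -> links.stream().anyMatch(link -> p.matcher(link).find()));
    }

    /** Dates are yyyymmdd strings, so reverse lexical order is newest-first. */
    static Optional<String> latestUsableDate(Map<String, List<String>> linksByDate,
                                             List<Pattern> required) {
        return linksByDate.keySet().stream()
            .sorted(Comparator.reverseOrder())
            .filter(date -> satisfiesAll(linksByDate.get(date), required))
            .findFirst();
    }

    public static void main(String[] args) {
        // Made-up listing: the newest dump is still in progress, the older one is complete.
        Map<String, List<String>> linksByDate = Map.of(
            "20150201", List.of("xxwiki-20150201-stub-articles.xml.gz"),
            "20150101", List.of("xxwiki-20150101-stub-articles.xml.gz",
                                "xxwiki-20150101-pages-articles.xml.bz2"));
        List<Pattern> required = List.of(Pattern.compile(Pattern.quote("pages-articles.xml.bz2")));
        // The newest page lacks the wanted file, so the older date is chosen.
        System.out.println(latestUsableDate(linksByDate, required).orElse("no usable date"));
    }
}
```

When no date qualifies, the sketch returns an empty `Optional`; that corresponds to the situation in this issue, where the real downloader gave up with `found no date on ...`.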

Bini12 commented 9 years ago

[INFO] Building DBpedia Dump Extraction 4.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> scala-maven-plugin:3.1.6:run (default-cli) @ dump >>>
[INFO]
[INFO] --- maven-enforcer-plugin:1.3.1:enforce (default) @ dump ---
read 13.887695 KB of 13.887695 KB in 0.019 seconds (730.9313 KB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140704/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140612/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140612/index.html'
read 13.892578 KB of 13.892578 KB in 0.004 seconds (3.3917425 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140612/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140525/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140525/index.html'
read 13.893555 KB of 13.893555 KB in 0.01 seconds (1.3567923 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140525/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140508/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140508/index.html'
read 13.893555 KB of 13.893555 KB in 0.007 seconds (1.9382747 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140508/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140420/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140420/index.html'
read 12.697266 KB of 12.697266 KB in 0.015 seconds (846.4843 KB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140420/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140401/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140401/index.html'
read 12.691406 KB of 12.691406 KB in 0.001 seconds (12.39395 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140401/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140317/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140317/index.html'
read 12.696289 KB of 12.696289 KB in 0.009 seconds (1.3776355 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140317/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140301/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140301/index.html'
read 12.6953125 KB of 12.6953125 KB in 0.006 seconds (2.0662942 MB/s)
date page 'http://dumps.wikimedia.org/amwiki/20140301/ has no links that satisfies [pages-articles\d+.xml.bz2]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
    at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.Exception: found no date on http://dumps.wikimedia.org/amwiki/ in range 00000000-99999999 with files (pages-articles.xml.bz2,false),(pages-articles\d+.xml.bz2,true)
    at org.dbpedia.extraction.dump.download.LanguageDownloader.downloadDates(LanguageDownloader.scala:49)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:69)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:64)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:84)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:83)
    at scala.collection.immutable.TreeSet.foreach(TreeSet.scala:151)
    at org.dbpedia.extraction.dump.download.Download$.main(Download.scala:64)
    at org.dbpedia.extraction.dump.download.Download.main(Download.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.454s
[INFO] Finished at: Mon Nov 17 18:57:36 EAT 2014
[INFO]
[INFO] --- maven-resources-plugin:2.3:resources (default-resources) @ dump ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/binyam/Desktop/dbpedia/extraction-framework/dump/src/main/resources
[INFO]
[INFO] --- scala-maven-plugin:3.1.6:compile (process-resources) @ dump ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ dump ---
[INFO] No sources to compile
[INFO]
[INFO] --- scala-maven-plugin:3.1.6:compile (compile) @ dump ---
[INFO] Nothing to compile - all classes are up to date
[INFO]
[INFO] --- maven-resources-plugin:2.3:testResources (default-testResources) @ dump ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/binyam/Desktop/dbpedia/extraction-framework/dump/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ dump ---
[INFO] No sources to compile
[INFO]
[INFO] --- scala-maven-plugin:3.1.6:testCompile (test-compile) @ dump ---
[INFO] No sources to compile
[INFO]
[INFO] <<< scala-maven-plugin:3.1.6:run (default-cli) @ dump <<<
[INFO]
[INFO] --- scala-maven-plugin:3.1.6:run (default-cli) @ dump ---
[INFO] launcher 'download' selected => org.dbpedia.extraction.dump.download.Download
[INFO] Final Memory: 11M/173M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240 (Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
done: 0 - todo: 3 - am,commons,wikidata
downloading 'http://dumps.wikimedia.org/amwiki/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/index.html'
read 3.4609375 KB of 3.4609375 KB in 0.004 seconds (865.2343 KB/s)
downloading 'http://dumps.wikimedia.org/amwiki/20141101/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20141101/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20141101/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20141101/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20141101/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20141005/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20141005/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20141005/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20141005/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20141005/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140909/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140909/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140909/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140909/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140909/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140820/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140820/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140820/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140820/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140820/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140730/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140730/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140730/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140730/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140730/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140704/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140704/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140704/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140704/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140704/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140612/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140612/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140612/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140612/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140612/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140525/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140525/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140525/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140525/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140525/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140508/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140508/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140508/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140508/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140508/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140420/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140420/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140420/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140420/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140420/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140401/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140401/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140401/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140401/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140401/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140317/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140317/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140317/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140317/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140317/ has no links that satisfies [pages-articles\d+.xml.bz2]
downloading 'http://dumps.wikimedia.org/amwiki/20140301/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140301/index.html'
did not download 'http://dumps.wikimedia.org/amwiki/20140301/' to '/home/binyam/Desktop/dbpedia/extraction-framework/dump/dumps/amwiki/20140301/index.html' - file is up to date
date page 'http://dumps.wikimedia.org/amwiki/20140301/ has no links that satisfies [pages-articles\d+.xml.bz2]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
    at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)
Caused by: java.lang.Exception: found no date on http://dumps.wikimedia.org/amwiki/ in range 00000000-99999999 with files (pages-articles.xml.bz2,false),(pages-articles\d+.xml.bz2,true)
    at org.dbpedia.extraction.dump.download.LanguageDownloader.downloadDates(LanguageDownloader.scala:49)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:69)
    at org.dbpedia.extraction.dump.download.Download$$anonfun$main$3.apply(Download.scala:64)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:84)
    at scala.collection.immutable.RedBlackTree$.foreachKey(RedBlackTree.scala:83)
    at scala.collection.immutable.TreeSet.foreach(TreeSet.scala:151)
    at org.dbpedia.extraction.dump.download.Download$.main(Download.scala:64)
    at org.dbpedia.extraction.dump.download.Download.main(Download.scala)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11.087s
[INFO] Finished at: Mon Nov 17 19:03:55 EAT 2014
[INFO] Final Memory: 11M/174M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.1.6:run (default-cli) on project dump: wrap: org.apache.commons.exec.ExecuteException: Process exited with an error: 240 (Exit value: 240) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException

jcsahnwaldt commented 9 years ago

Just change this line in your config:

download=am:@pages-articles\d+.xml.bz2

to

download=am:pages-articles.xml.bz2

and the script should download the latest dump from https://dumps.wikimedia.org/amwiki/20141101/
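Why this one-line change helps can be checked directly: `\d+` requires at least one digit immediately after `pages-articles`, so the old pattern cannot match a single, unnumbered dump file. The Java sketch below assumes the Amharic file follows Wikimedia's usual `<wiki>-<date>-pages-articles.xml.bz2` naming (the exact name should be checked in the directory listing) and uses `find()`, a substring search. The `\d+` pattern fails under substring and whole-name matching alike, because the character after `pages-articles` is a `.`. `PatternCheck` and `found` are hypothetical names.

```java
import java.util.regex.Pattern;

// Sketch: compares the original regex with the plain file name against an
// assumed Amharic dump file name (Wikimedia's usual naming scheme, not copied
// from the dump directory).
public class PatternCheck {

    /** Substring search, so the wiki/date prefix does not need to be in the pattern. */
    static boolean found(Pattern pattern, String fileName) {
        return pattern.matcher(fileName).find();
    }

    public static void main(String[] args) {
        String fileName = "amwiki-20141101-pages-articles.xml.bz2";

        // Original config value (without the '@'): needs a digit right after "pages-articles".
        Pattern numbered = Pattern.compile("pages-articles\\d+.xml.bz2");
        // Suggested config value, quoted so each '.' is a literal dot.
        Pattern plain = Pattern.compile(Pattern.quote("pages-articles.xml.bz2"));

        System.out.println(found(numbered, fileName)); // false
        System.out.println(found(plain, fileName));    // true
    }
}
```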

Bini12 commented 9 years ago

Thanks, Christopher, now it works.