dkpro / dkpro-jwktl

Java Wiktionary Library
http://dkpro.org/dkpro-jwktl/
Apache License 2.0

About 2017 dump English Wiktionary #38

Closed rabravo closed 7 years ago

rabravo commented 7 years ago

Hi there, I am trying the library with the latest Wiktionary dump from 2017 and I get an XML parse error. I saw that one user reported working with dumps from 2013 (Google Groups) with no problem. Do you keep a record of which Wiktionary dumps can be parsed with your library without errors?

Thank you in advance for any help.

chmeyer commented 7 years ago

Tried the German dump from 2017-02-01 with the current head version, which worked fine.

The corresponding English dump from 2017-02-01 is currently running. Do you happen to have the exact error message/stack trace you get? Which version did you try (latest vs. release)?

chmeyer commented 7 years ago

Tried the English dump from 2017-02-01 as well with the current head version. Parsing and accessing work without errors on my computer. Please let us know the exact error message, stack trace, OS and Java version by reopening this issue (if the error remains). Maybe there is a conflict in the libraries or something old in Maven's local repository?

rabravo commented 7 years ago

I will report back later today with the information. Thanks.

rabravo commented 7 years ago

Dear chmeyer,

I followed your instructions. I got the project via git clone on the command line and imported it into Eclipse. I tried the sample from the website, https://dkpro.github.io/dkpro-jwktl/documentation/getting-started/, using the following dump: enwiki-20161001-pages-articles-multistream.xml.bz2, and it is working really smoothly right now ... wait ... there is a report of 25,000 parsed pages! Now it is at 100,000, wow! Well, I guess I will wait for the parser to finish. You're my hero! Cheers to you and sorry for the trouble.

rabravo commented 7 years ago

The process reached the maximum number of possible entries. I wonder how to limit the number of parsed pages. Also, I would like to have access to the Spanish translations of English words. What is the best way to approach this? Following the example from the website,

https://dkpro.github.io/dkpro-jwktl/documentation/use-cases/

under the example "Extracting translations", I notice that Language.GERMAN determines which translation language to use. Russian is also available, but my question is how to set another language such as Spanish. Any help would be greatly appreciated ... again.

chmeyer commented 7 years ago

I assume this is related to #40.