aarddict / tools

Tools for Aard Dictionary
GNU General Public License v3.0

problem when compiling Turkish wiktionary #24

Open itkach opened 11 years ago

itkach commented 11 years ago

Copied from aarddict/desktop#26.

Hello,

OS: Ubuntu 12.04 i386

After compiling tr.wiktionary.org there are no word definitions in articles, only information like language, tense, etc. Example.

Original article:

ffordd Galce Ad Anlamlar [1] yol

(In English: ffordd, Welsh, Noun, Meanings, [1] road)

My case (from aarddict):

ffordd Galce Ad

As you can see, there is no translation. I followed the instructions at http://aarddict.org/aardtools/doc/aardtools.html, except that:

1) installed libicu48 instead of libicu38
2) had to give executable permission to env-aard/bin/activate
3) simplewiki-20101026-pages-articles.cdb is not a file, but a folder
4) had lots of these messages during aardc wiki ... execution:

.../env-aard/local/lib/python2.7/site-packages/aardtools/mwaardhtmlwriter.py:356: FutureWarning: The behavior of this method will change in future versions.  Use specific 'len(elem)' or 'elem is not None' test instead.
  not (element.getchildren() or element.text or element.tail) and parent):

Thank you.

itkach commented 11 years ago

@microspace

After compiling tr.wiktionary.org there are no word definitions in articles, only information like language, tense, etc.

The current version of aardtools filters out navigational cruft. The filters are based mostly on enwiki and a few other big wikis and may not be suitable for other types of wikis (take a look at aardtools/mwaardhtmlwriter.py to see what's excluded). I made some changes to compile enwiktionary, although this is a hack and won't be merged.

The proper way to fix this will be available after I merge https://github.com/aarddict/tools/pull/21. That pull request basically fixes #11 and enables creating individual filter sets per Wikipedia, outside of the code (sorry @doozan, it's taking so long; I almost have it done, I just need to find time to clean up and release).
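To illustrate what per-wiki filter sets kept outside the code could look like, here is a minimal Python sketch. Everything in it is a hypothetical assumption and not the format introduced by pull/21: the JSON layout, the `load_filters`/`apply_filters` names, and matching on CSS classes and element ids. It uses the standard library's `xml.etree.ElementTree` to stay self-contained, and aardtools itself may use a different tree API.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical per-wiki filter file, kept outside the code. The layout
# (wiki name -> CSS classes and element ids to drop) is an assumption.
FILTERS_JSON = """
{
  "enwiki": {"classes": ["navbox", "metadata"], "ids": ["toc"]},
  "trwiktionary": {"classes": [], "ids": []}
}
"""


def load_filters(text, wiki):
    """Return (classes, ids) to exclude for `wiki`; empty sets if it is not listed."""
    config = json.loads(text).get(wiki, {})
    return set(config.get("classes", [])), set(config.get("ids", []))


def apply_filters(root, classes, ids):
    """Remove every descendant of `root` whose class or id matches a filter."""
    # Collect matches first so the tree is not modified while being iterated.
    to_remove = []
    for parent in root.iter():
        for child in parent:
            child_classes = set((child.get("class") or "").split())
            if child_classes & classes or child.get("id") in ids:
                to_remove.append((parent, child))
    for parent, child in to_remove:
        parent.remove(child)
    return root


html = ('<div><table class="navbox wide"><tr><td>nav</td></tr></table>'
        '<p>[1] yol</p><div id="toc"/></div>')
root = apply_filters(ET.fromstring(html), *load_filters(FILTERS_JSON, "enwiki"))
print([child.tag for child in root])
```

A wiki that is not listed, or that is listed with empty lists, gets empty filter sets, so nothing is removed from its articles.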

2) had to give executable permission to env-aard/bin/activate

Take that up with virtualenv; it's not part of this project.

3) simplewiki-20101026-pages-articles.cdb is not a file, but a folder

I don't see where it is claimed to be a file (and technically, directories are files anyway).

4) had lots of these messages during aardc wiki ... execution:

#13: I believe this is fixed in commit a9b192a7ce0b0cc00cce06f310e359dc58ecf27c.
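For reference, the FutureWarning quoted in the report is most likely triggered by the trailing `and parent`. That expression tests an element for truth, which ElementTree (and lxml) warn about because an element with no children is falsy. `getchildren()` is also deprecated and was removed from the standard library in Python 3.9. The following is a minimal sketch of the same condition rewritten with the explicit tests the warning recommends. The helper name `is_removable_empty` is hypothetical, and the sketch is not necessarily what a9b192a changed.

```python
import xml.etree.ElementTree as ET


def is_removable_empty(element, parent):
    """Return True if `element` has no children, text or tail and has a parent.

    Hypothetical helper mirroring the condition in the warning, but using
    explicit tests instead of relying on Element truthiness.
    """
    # len(element) counts child elements, so getchildren() is not needed.
    has_content = len(element) > 0 or bool(element.text) or bool(element.tail)
    # `parent is not None` instead of `parent`: a childless Element would
    # otherwise be falsy, and truth-testing it is what emits the warning.
    return not has_content and parent is not None


root = ET.fromstring("<div><span/><b>x</b></div>")
span, bold = list(root)
print(is_removable_empty(span, root))  # True
print(is_removable_empty(bold, root))  # False
```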

itkach commented 11 years ago

It looks like content filtering is not the issue: in the latest version of aardtools nothing is filtered out by default, yet translations are still missing. Other wiktionaries compile fine, though, so it looks like there's something special about trwiktionary that mwlib doesn't handle properly.