Open GoogleCodeExporter opened 9 years ago
Hey, and thanks for the issue report.
I've run your command with your sample on my end (Ubuntu 14.04) and it seems to
work fine; "12 tag" later in that line is recognized correctly and no error is
spat out.
Since the problem occurs in the TreeTaggerWrapper component, could you check
whether your TreeTagger installation and parameter files are intact, e.g. by
re-downloading them? The URLs are here:
http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
If the problem still occurs, could you please run something like:
cat lill_sample.txt | ./tree-tagger-german > out.txt
from the treetagger/cmd folder and attach the output file here? Thanks!
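In case it helps to sanity-check out.txt before attaching it, here is a minimal sketch. It assumes the tagger script writes one token per line as three tab-separated columns (token, POS tag, lemma); count_malformed is a hypothetical helper name, not part of TreeTagger or HeidelTime. Lines that are passed through unchanged (such as SGML tags) or left empty would be counted as well, so a non-zero result is a hint, not proof of a problem.

```shell
#!/bin/sh
# Hypothetical helper (not part of TreeTagger or HeidelTime).
# Assumption: each output line is token<TAB>tag<TAB>lemma.
# Prints how many lines do NOT have exactly three tab-separated fields.
count_malformed() {
    awk -F'\t' 'NF != 3 { n++ } END { print n + 0 }' "$1"
}

# Example usage, from the treetagger/cmd folder:
#   cat lill_sample.txt | ./tree-tagger-german > out.txt
#   count_malformed out.txt
```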
Original comment by z...@informatik.uni-heidelberg.de
on 28 Jul 2014 at 5:36
Please find the out.txt attached.
Thank you for trying to help :)
Original comment by natak...@gmail.com
on 28 Jul 2014 at 6:02
Okay, that looks good. I'm not sure right now what could be causing this.
Could you please give me the output of these two commands?
1. locale
2. locale -a
Thanks.
Original comment by z...@informatik.uni-heidelberg.de
on 29 Jul 2014 at 12:37
Here they are:
/cmd$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=de_CH.UTF-8
LC_TIME=de_CH.UTF-8
LC_COLLATE="en_US.UTF-8"
LC_MONETARY=de_CH.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=de_CH.UTF-8
LC_NAME=de_CH.UTF-8
LC_ADDRESS=de_CH.UTF-8
LC_TELEPHONE=de_CH.UTF-8
LC_MEASUREMENT=de_CH.UTF-8
LC_IDENTIFICATION=de_CH.UTF-8
LC_ALL=
/cmd$ locale -a
C
C.UTF-8
de_CH.utf8
en_AG
en_AG.utf8
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_IN.utf8
en_NG
en_NG.utf8
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZM
en_ZM.utf8
en_ZW.utf8
POSIX
Original comment by natak...@gmail.com
on 29 Jul 2014 at 5:29
Okay, those look good, too.
Honestly, I'm a bit stumped: if the input file is UTF-8, the system and the
Java VM support UTF-8, and the TreeTagger output is good (all of which
appears to be the case), then this issue shouldn't occur.
Can you attach your actual sample file "lill_sample.txt" that fails to process?
And what are your Ubuntu and Java versions? (lsb_release -a && java -version)
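The UTF-8 assumptions above (input file, system locale, Java VM) can be checked roughly like this. It's only a sketch: is_valid_utf8 is a hypothetical helper name, and it relies on iconv exiting with a non-zero status when it meets a byte sequence that is not valid UTF-8. The java line assumes a JVM that supports the -XshowSettings option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the file decodes cleanly as UTF-8.
# Relies on iconv failing on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example usage:
#   is_valid_utf8 lill_sample.txt && echo "valid UTF-8" || echo "NOT valid UTF-8"
#   locale charmap    # character encoding of the current locale
#   java -XshowSettings:properties -version 2>&1 | grep file.encoding
```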
Original comment by z...@informatik.uni-heidelberg.de
on 29 Jul 2014 at 9:41
cmd$ lsb_release -a && java -version
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.1 LTS
Release: 14.04
Codename: trusty
java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Please find the input file attached.
Thank you))
Original comment by natak...@gmail.com
on 29 Jul 2014 at 9:54
Just to keep you updated: I've managed to reproduce this issue and will try to
fix it. Unfortunately, I can't offer a workaround right now.
I'll update this issue as soon as I have something.
Original comment by z...@informatik.uni-heidelberg.de
on 31 Jul 2014 at 10:01
This seems to be good news! Thanks! Waiting for your next update.
Original comment by natak...@gmail.com
on 31 Jul 2014 at 10:04
Original issue reported on code.google.com by
natak...@gmail.com
on 28 Jul 2014 at 2:27