Closed by daxenberger 9 years ago
You are right to be confused.
A bad case of copy-and-paste gone wrong.
Thanks for spotting that.
I will fix it so that the POS n-gram works as expected again.
The issue with "character cannot be phonetized" should, however, be fixed as well.
Do you have an example that causes this exception?
Reported by torsten.zesch
on 2014-05-24 09:03:58
Started
Thx. Most likely it was either #, @ or a smiley, given that it was reading tweets. I'll
check that out once my current pipeline finishes.
Reported by l.flekova
on 2014-05-24 09:51:58
I tried with a bunch of special characters and they all worked fine.
So it would be good to have the bad string in order to help me reproduce the problem.
Reported by torsten.zesch
on 2014-05-24 18:46:48
Okay, so it is this character: ʉ in this sequence: �ʉ�_ which happens to be present
in some of the hyperlinks. Probably I messed up some escaped character sequence in
the data, so the error is between the chair and the laptop ;) For normal characters
it should be fairly failsafe :-)
Caused by: java.lang.IllegalArgumentException: The character is not mapped: Ʉ
    at org.apache.commons.codec.language.Soundex.map(Soundex.java:226)
    at org.apache.commons.codec.language.Soundex.getMappingCode(Soundex.java:180)
    at org.apache.commons.codec.language.Soundex.soundex(Soundex.java:264)
    at org.apache.commons.codec.language.Soundex.encode(Soundex.java:162)
    at de.tudarmstadt.ukp.dkpro.tc.features.ngram.util.NGramUtils.getDocumentPhoneticNgrams(NGramUtils.java:167)
It happens in the MetaTask which
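The trace above suggests the encoder only maps the letters A-Z: the lowercase ʉ from the data shows up upper-cased as Ʉ in the message, which has no Soundex code. A minimal sketch of a pre-check that could be run on a token before phonetic encoding is given below. The class PhoneticGuard and the method isSoundexSafe are hypothetical names, not part of DKPro TC or Commons Codec, and the check assumes that the encoder drops non-letters and upper-cases the rest before mapping.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of DKPro TC or Commons Codec.
final class PhoneticGuard {

    private PhoneticGuard() {}

    // Returns true if every letter in the token upper-cases into the range A-Z.
    // Non-letters (digits, '#', '@', punctuation) are ignored, on the assumption
    // that the phonetic encoder strips them before looking up a mapping.
    static boolean isSoundexSafe(String token) {
        String upper = token.toUpperCase(Locale.ENGLISH);
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (Character.isLetter(c) && (c < 'A' || c > 'Z')) {
                return false; // a letter outside A-Z, e.g. U+0244
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSoundexSafe("#hashtag")); // true
        System.out.println(isSoundexSafe("\u0289"));   // false (LATIN SMALL LETTER U BAR)
    }
}
```

A caller could skip or log tokens for which the check returns false instead of letting the exception abort the whole task. Whether skipping is the right policy for the phonetic n-gram features is a design choice this sketch does not settle.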
Reported by l.flekova
on 2014-05-24 20:06:34
Fixed
I tested with those characters and I got no mapping errors here.
So we will leave that issue closed until someone runs into the same problem again :)
Reported by torsten.zesch
on 2014-05-24 20:10:13
Reported by daxenberger.j
on 2014-06-13 15:18:04
Originally reported on Google Code with ID 133
Reported by l.flekova
on 2014-05-23 21:13:27