What version of the product are you using? On what operating system?
The version currently browsable on Google Code. Linux.
Issue 1.
The boolean isInvStartsWith, defined on line 418, never takes effect and can be
removed.
The 'isInvStartsWith' case cannot occur because the method is always called with
the last segment of the compound as aSplit, so either aSplit.startsWith(rest) or
aSplit.startsWith(restGrund) holds.
What steps will reproduce the problem?
1. comment out isInvStartsWith
2. test on a bunch of examples
3. no difference in behavior
What is the expected output? What do you see instead?
Output is fine; this is a sanity issue. If that variable does have a role, I
have misunderstood the code completely.
Issue 2.
In line 416, isEqual should be:
boolean isEqual = /*aSplit.equals(restGrund) ||*/ aSplit.equals(rest);
i.e. the 'equals restGrund' case should not be considered.
With the current check, the last part of the compound is never lemmatized. If
that is intended, this is a non-issue, but I find it counterintuitive (typically
the last part of the noun is what gets inflected).
In some cases the equality check also prevents reducing an inner part (see the
2nd example below).
What steps will reproduce the problem?
1. comment out the part above
2. test on a bunch of examples
3. the behavior differs: the last part of the compound gets lemmatized. I
think this is desirable, and the inflection can be dropped entirely (it is
not a linking morpheme that should be annotated, but a standard inflection).
What is the expected output? What do you see instead?
   INPUT                            DESIRED (in my opinion)                 OBSERVED
1. Bankdienstleistungen             Bank+dienst+leistung                    Bank+dienst+leistungen
2. Fußbodenschleifmaschinenverleih  Fuß+boden+schleif+maschine+(n)+verleih  Fuß+boden+schleif+maschinen+verleih
3. Halsschmerzen                    Hals+schmerz                            Hals+schmerzen
4. Klimaschutzzielen                Klima+schutz+ziel                       Klima+schutz+zielen
5. Kopfschmerzen                    Kopf+schmerz                            Kopf+schmerzen
Issue 3.
If the change in Issue 2 is accepted, it surfaces a bug in line 436 (which was
not a problem before, because the last part did not get lemmatized).
That line assumes that the reduced (lemma) form is never longer than the
inflected form. This is not always true, see below.
What steps will reproduce the problem?
1. Implement the change suggested in Issue 2, i.e. remove equals(restGrund)
check.
2. test with "Betriebsmodi"
3. substring throws a StringIndexOutOfBoundsException
What is the expected output? What do you see instead?
Betriebsmodi Betrieb+(s)+modus
Instead: an exception is thrown.
Fix: add a check around line 436:
// Only add the suffix if there is something at the end. This is not the case
// for irregular forms where the inflected form is shorter than the lemma:
// "modus" --> "modi" (plural)
if (rest.length() > restGrund.length()) {
    retvec.add("(" + rest.substring(restGrund.length()) + ")");
}
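
To illustrate the failure and the guard in isolation, here is a minimal, self-contained sketch. It is not the project's code: the class SuffixGuardSketch and the helper addSuffix are hypothetical names, and the values passed for rest and restGrund ("maschinen"/"maschine", "modi"/"modus") are assumptions based on the examples in this report. Only the guarded substring call mirrors the fix proposed above.

```java
import java.util.ArrayList;
import java.util.List;

public class SuffixGuardSketch {

    // Hypothetical helper (not from the project): appends the characters of
    // `rest` that extend beyond the lemma `restGrund`, wrapped in parentheses.
    // Nothing is added when `rest` is not longer than `restGrund`.
    public static void addSuffix(List<String> retvec, String rest, String restGrund) {
        if (rest.length() > restGrund.length()) {
            retvec.add("(" + rest.substring(restGrund.length()) + ")");
        }
    }

    public static void main(String[] args) {
        List<String> retvec = new ArrayList<>();

        // Regular case: the inflected form is longer than the lemma.
        addSuffix(retvec, "maschinen", "maschine");
        // Irregular case: the inflected form is shorter than the lemma,
        // so the guard skips it instead of throwing.
        addSuffix(retvec, "modi", "modus");
        System.out.println(retvec); // prints [(n)]

        // Without the guard, the begin index (5) exceeds the string length (4).
        try {
            "modi".substring("modus".length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded substring throws "
                    + e.getClass().getSimpleName());
        }
    }
}
```

The guard only avoids the exception. Whether an irregular plural such as "modi" should get some other annotation is a separate design question that this sketch does not address.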
Original issue reported on code.google.com by szarv...@amazon.de on 22 Dec 2014 at 11:03