TIBCOSoftware / jasperreports

JasperReports® - Free Java Reporting Library
https://community.jaspersoft.com/downloads/community-edition/
GNU Lesser General Public License v3.0

Create BreakIterator with report locale #280

Open · digulla opened this issue 2 years ago

digulla commented 2 years ago

Currently, BreakIterator in SimpleTextLineWrapper and ComplexTextLineWrapper is created using

BreakIterator.getCharacterInstance()

instead of

BreakIterator.getCharacterInstance(Locale)

The former uses the JVM's default locale, while the latter uses the supplied one, which should be filled with the value of JRParameter.REPORT_LOCALE. This can cause problems with languages like Italian, which use U+2019 as the apostrophe where English uses U+0027. So even though the report locale is Locale.ITALIAN, it will split d’investimento into two words.
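A minimal sketch of the proposed change, assuming the fill parameters are available as a Map<String, Object>. The key string "REPORT_LOCALE" stands in for JRParameter.REPORT_LOCALE so the example compiles without JasperReports, and the helper names are hypothetical. When no locale is supplied it falls back to Locale.getDefault(), which is what the no-argument factory methods use.

```java
import java.text.BreakIterator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class LocaleAwareBreakIterators {
    // Assumed to mirror the value of JRParameter.REPORT_LOCALE; a plain string
    // keeps this sketch free of a JasperReports dependency.
    static final String REPORT_LOCALE = "REPORT_LOCALE";

    // Resolve the report locale, falling back to the JVM default when absent.
    static Locale reportLocale(Map<String, Object> params) {
        Object value = params.get(REPORT_LOCALE);
        return value instanceof Locale ? (Locale) value : Locale.getDefault();
    }

    // Locale-aware counterparts of the no-argument factory calls.
    static BreakIterator characterIterator(Map<String, Object> params) {
        return BreakIterator.getCharacterInstance(reportLocale(params));
    }

    static BreakIterator lineIterator(Map<String, Object> params) {
        return BreakIterator.getLineInstance(reportLocale(params));
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put(REPORT_LOCALE, Locale.ITALIAN);
        System.out.println(reportLocale(params)); // prints "it"
    }
}
```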

Test case: after creating a BreakIterator with Locale.ITALIAN, the text una strategia d’investimento should be split into three words, with two break positions (after una and after strategia).
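The test case can be sketched with the JDK's word iterator. This is an illustrative check, not part of the project's test suite; segments and words are hypothetical helpers. It prints the words it finds rather than asserting a count because, per the later comment in this thread, the JDK iterator may still break after U+2019.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ItalianApostropheCheck {
    // Every segment between consecutive boundaries, spaces included.
    static List<String> segments(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    // Keeps only segments containing at least one letter, i.e. the "words".
    static List<String> words(String text, Locale locale) {
        List<String> out = new ArrayList<>();
        for (String s : segments(text, locale)) {
            if (s.codePoints().anyMatch(Character::isLetter)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "una strategia d\u2019investimento";
        List<String> w = words(text, Locale.ITALIAN);
        // The issue expects 3 words: una, strategia, d’investimento.
        System.out.println(w.size() + " " + w);
    }
}
```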

https://github.com/TIBCOSoftware/jasperreports/blob/master/jasperreports/src/net/sf/jasperreports/engine/fill/ComplexTextLineWrapper.java#L106

https://github.com/TIBCOSoftware/jasperreports/blob/master/jasperreports/src/net/sf/jasperreports/engine/fill/SimpleTextLineWrapper.java

digulla commented 2 years ago

Alternatively, allow supplying a factory for BreakIterator via a report parameter. That would make it possible to inject custom implementations or the BreakIterator from ICU4J.
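One shape such a factory could take, as a hypothetical sketch: BreakIteratorFactory and the BREAK_ITERATOR_FACTORY parameter name are invented for illustration and do not exist in JasperReports. When no factory is supplied, it falls back to the JDK's locale-aware line iterator.

```java
import java.text.BreakIterator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class BreakIteratorFactoryDemo {
    // Hypothetical extension point; not an existing JasperReports type.
    @FunctionalInterface
    interface BreakIteratorFactory {
        BreakIterator createLineInstance(Locale locale);
    }

    // Hypothetical report parameter name, for illustration only.
    static final String BREAK_ITERATOR_FACTORY = "BREAK_ITERATOR_FACTORY";

    // Fallback: the JDK's locale-aware line iterator.
    static final BreakIteratorFactory DEFAULT = BreakIterator::getLineInstance;

    // Use the caller's factory if one was supplied, otherwise the default.
    static BreakIteratorFactory resolve(Map<String, Object> params) {
        Object value = params.get(BREAK_ITERATOR_FACTORY);
        return value instanceof BreakIteratorFactory ? (BreakIteratorFactory) value : DEFAULT;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        // A real deployment might register an ICU4J-backed factory here.
        BreakIteratorFactory custom = locale -> BreakIterator.getLineInstance(Locale.ROOT);
        params.put(BREAK_ITERATOR_FACTORY, custom);
        BreakIterator it = resolve(params).createLineInstance(Locale.ITALIAN);
        it.setText("una strategia d\u2019investimento");
        System.out.println("first break at: " + it.next());
    }
}
```

Because the factory returns a java.text.BreakIterator, an ICU4J-based factory would still need to wrap ICU's iterator in that type.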

dadza commented 2 years ago

Using java.text.BreakIterator.getLineInstance (getCharacterInstance is only called when truncating the last line inside a word) with Locale.ITALIAN doesn't seem to work in this case; it still breaks after U+2019. Still, I assume there are cases where using the report locale for java.text.BreakIterator.getLineInstance would make a difference.
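A small sketch for reproducing this observation: it lists the line-break opportunities that java.text.BreakIterator.getLineInstance reports for the test phrase under Locale.ITALIAN and Locale.ENGLISH, and whether any of them fall inside d’investimento. The helper names are hypothetical, and no expected output is given because the result depends on the JDK's break rules.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakComparison {
    // All line-break opportunities reported by the JDK iterator, including 0 and length.
    static List<Integer> lineBreaks(String text, Locale locale) {
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) out.add(b);
        return out;
    }

    // True if a break opportunity falls strictly inside the given word.
    static boolean breaksInside(String text, String word, Locale locale) {
        int start = text.indexOf(word);
        int end = start + word.length();
        for (int b : lineBreaks(text, locale)) {
            if (b > start && b < end) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "una strategia d\u2019investimento";
        String word = "d\u2019investimento";
        for (Locale l : new Locale[] {Locale.ITALIAN, Locale.ENGLISH}) {
            System.out.println(l + ": " + lineBreaks(text, l)
                    + ", break inside word: " + breaksInside(text, word, l));
        }
    }
}
```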

ICU4J's BreakIterator.getLineInstance (with either Locale.ITALIAN or Locale.ENGLISH) does work. Note that using ICU4J's BreakIterator would require an adapter to java.text.BreakIterator, because we pass the break iterator to java.awt.font.LineBreakMeasurer.
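A sketch of one possible adapter, assuming the boundaries are computed up front by another engine. PrecomputedBreakIterator is a hypothetical class that exposes a sorted array of offsets through the java.text.BreakIterator API, so it could be handed to LineBreakMeasurer. In practice the boundary function would run ICU4J's com.ibm.icu.text.BreakIterator over the text; main uses a simple break-after-space stand-in so the sketch compiles without ICU4J. The out-of-range IllegalArgumentException checks required by the BreakIterator contract are omitted.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical adapter: exposes precomputed boundaries (e.g. from ICU4J) as a
// java.text.BreakIterator so they can be passed to java.awt.font.LineBreakMeasurer.
public class PrecomputedBreakIterator extends BreakIterator {
    // Must return sorted offsets that include 0 and text.length().
    private final Function<String, int[]> boundaryFunction;
    private CharacterIterator text = new StringCharacterIterator("");
    private int[] bounds = {0};
    private int pos = 0; // index of the current boundary in bounds

    public PrecomputedBreakIterator(Function<String, int[]> boundaryFunction) {
        this.boundaryFunction = boundaryFunction;
    }

    @Override
    public void setText(CharacterIterator newText) {
        text = newText;
        StringBuilder sb = new StringBuilder();
        for (char c = newText.first(); c != CharacterIterator.DONE; c = newText.next()) sb.append(c);
        // Shift string-relative offsets into the iterator's index space.
        int begin = newText.getBeginIndex();
        bounds = boundaryFunction.apply(sb.toString()).clone();
        for (int i = 0; i < bounds.length; i++) bounds[i] += begin;
        pos = 0;
    }

    @Override public CharacterIterator getText() { return text; }
    @Override public int first() { pos = 0; return bounds[pos]; }
    @Override public int last() { pos = bounds.length - 1; return bounds[pos]; }
    @Override public int current() { return bounds[pos]; }
    @Override public int next() { return pos < bounds.length - 1 ? bounds[++pos] : DONE; }
    @Override public int previous() { return pos > 0 ? bounds[--pos] : DONE; }

    @Override
    public int next(int n) {
        int result = current();
        for (; n > 0 && result != DONE; n--) result = next();
        for (; n < 0 && result != DONE; n++) result = previous();
        return result;
    }

    @Override
    public int following(int offset) {
        for (int i = 0; i < bounds.length; i++) {
            if (bounds[i] > offset) { pos = i; return bounds[i]; }
        }
        return DONE; // position unchanged, as the contract requires
    }

    @Override
    public int preceding(int offset) {
        for (int i = bounds.length - 1; i >= 0; i--) {
            if (bounds[i] < offset) { pos = i; return bounds[i]; }
        }
        return DONE;
    }

    @Override
    public boolean isBoundary(int offset) {
        int i = Arrays.binarySearch(bounds, offset);
        if (i >= 0) { pos = i; return true; }
        following(offset); // contract: move to the next boundary
        return false;
    }

    // Stand-in boundary function: a break opportunity after every space.
    static int[] afterSpaces(String s) {
        int[] tmp = new int[s.length() + 2];
        int n = 0;
        tmp[n++] = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == ' ') tmp[n++] = i + 1;
        }
        if (tmp[n - 1] != s.length()) tmp[n++] = s.length();
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        BreakIterator it = new PrecomputedBreakIterator(PrecomputedBreakIterator::afterSpaces);
        it.setText("una strategia d\u2019investimento");
        // prints "0 4 14 28 "
        for (int b = it.first(); b != DONE; b = it.next()) System.out.print(b + " ");
        System.out.println();
    }
}
```

Precomputing keeps the adapter free of a compile-time ICU4J dependency. The alternative is a thin wrapper that forwards each call (first, next, following and so on) to an ICU4J iterator, whose method names largely match java.text.BreakIterator.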

Allowing custom implementations does of course make sense.

dadza commented 1 year ago

FWIW, you can use ICU4J as a Java locale provider by including the icu4j and icu4j-localespi jars on your classpath and adding -Djava.locale.providers=SPI,CLDR,COMPAT when launching the Java process.
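To check whether the provider setting takes effect, a sketch like the following prints the configured java.locale.providers value and whether the JDK line iterator for Locale.ITALIAN still breaks inside d’investimento. Running it with and without the flag (and the ICU4J jars) should show whether the ICU-based provider is picked up. The class name is hypothetical, and the result depends on the JDK version and classpath, so no expected output is given.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class LocaleProviderCheck {
    // True if the line iterator reports a break strictly inside "d’investimento".
    static boolean splitsApostropheWord(Locale locale) {
        String text = "una strategia d\u2019investimento";
        int start = text.indexOf("d\u2019");
        BreakIterator it = BreakIterator.getLineInstance(locale);
        it.setText(text);
        int next = it.following(start); // first boundary after the word starts
        return next != BreakIterator.DONE && next < text.length();
    }

    public static void main(String[] args) {
        // null when the flag is not set, in which case the JDK defaults apply.
        System.out.println("java.locale.providers = " + System.getProperty("java.locale.providers"));
        System.out.println("breaks inside d\u2019investimento: " + splitsApostropheWord(Locale.ITALIAN));
    }
}
```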