dstl / baleen

Entity Extraction Text Processor
Apache License 2.0

Add option to convert numbers preceded by + to String object #27

Closed ghost closed 8 years ago

ghost commented 8 years ago

The StringToObject class attempts to convert a string to a Java Object of the correct class. It has a configuration parameter that enables or disables special handling of numbers preceded by a 0. When the option is enabled (the default), such numbers are assumed to be telephone numbers and are kept as String objects rather than being converted to an Integer.
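As a rough illustration of the leading-zero special case, here is a minimal sketch. It is not the actual StringToObject code: the class name `LeadingZeroSketch`, the method `convert`, and the `keepLeadingZeroAsString` flag are hypothetical names chosen for this example, and treating a lone "0" as an ordinary number is an assumption.

```java
public class LeadingZeroSketch {

    // Hypothetical helper, not the Baleen API.
    // keepLeadingZeroAsString stands in for the existing configuration option.
    static Object convert(String s, boolean keepLeadingZeroAsString) {
        if (!s.matches("\\d+")) {
            return s; // not a plain digit sequence, leave it as a String
        }
        // Assumption: a lone "0" is an ordinary number, so require more than one digit.
        if (keepLeadingZeroAsString && s.length() > 1 && s.startsWith("0")) {
            return s; // likely a telephone number, keep the leading zero
        }
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return s; // too large for an int, keep the original text
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("0123", true).getClass().getSimpleName());  // String
        System.out.println(convert("0123", false).getClass().getSimpleName()); // Integer
        System.out.println(convert("123", true).getClass().getSimpleName());   // Integer
    }
}
```

With the flag disabled, "0123" parses to the Integer 123 and the leading zero is lost, which is why the option defaults to enabled.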

This pull request extends the special case to numbers preceded by a + character, since this combination can also indicate that the digit sequence is a phone number. A new configuration parameter has been added to enable this special case. When enabled, a numeric value preceded by a + character (e.g. +120255566) is kept as a String object rather than being converted to an Integer.

By default, a number with a preceding + is still converted to a positive numeric object (e.g. Integer), which retains the behaviour of the existing released code. The special-case processing only occurs if the configuration key (precedingPlusIsntNumber) is supplied and set to true.
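The opt-in behaviour for the + prefix can be sketched as follows. This is an illustrative example under stated assumptions, not the code in this pull request: the class `PlusPrefixSketch`, the `convert` method, and passing the configuration as a `Map<String, Object>` are all hypothetical. Only the key name precedingPlusIsntNumber comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;

public class PlusPrefixSketch {

    // Key name taken from the PR description.
    static final String PLUS_KEY = "precedingPlusIsntNumber";

    // Hypothetical helper; how the real class receives its configuration is an assumption.
    static Object convert(String s, Map<String, Object> config) {
        if (!s.matches("\\+\\d+")) {
            return s; // this sketch only handles "+digits" input
        }
        // The special case applies only when the key is present and set to true.
        if (Boolean.TRUE.equals(config.get(PLUS_KEY))) {
            return s; // treat as a phone number, keep the '+'
        }
        try {
            return Integer.valueOf(s); // Integer.valueOf accepts a leading '+'
        } catch (NumberFormatException e) {
            return s; // too large for an int, keep the original text
        }
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = new HashMap<>();
        Map<String, Object> phoneMode = new HashMap<>();
        phoneMode.put(PLUS_KEY, true);

        System.out.println(convert("+120255566", defaults));  // 120255566 (Integer)
        System.out.println(convert("+120255566", phoneMode)); // +120255566 (String)
    }
}
```

Because a missing key is treated the same as false, callers that never set the option see no change in behaviour.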