This branch contains some minor regex updates from the CCD-DE project. It has two main parts:
CleanPunctuation.java now accepts a dash ("-") at either the beginning or the end of an entity. This is useful, for example, for keeping negative values such as -$30.00 intact.
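To illustrate the intended behaviour, here is a minimal sketch of punctuation trimming that leaves a leading or trailing dash in place. It is not the code in CleanPunctuation.java: the class name `CleanPunctuationSketch`, the `clean` method, and the set of stripped characters are all assumptions made for this example.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the Baleen annotator itself.
final class CleanPunctuationSketch {
    // Assumed set of characters to trim: whitespace and common punctuation.
    // The dash and the currency sign are deliberately left out of the set.
    private static final String STRIP = "[\\s.,;:!?\"'()\\[\\]{}]";
    private static final Pattern LEADING = Pattern.compile("^" + STRIP + "+");
    private static final Pattern TRAILING = Pattern.compile(STRIP + "+$");

    // Removes strippable characters from both ends of the entity text.
    static String clean(String entity) {
        String withoutLeading = LEADING.matcher(entity).replaceAll("");
        return TRAILING.matcher(withoutLeading).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("(-$30.00),")); // prints -$30.00
        System.out.println(clean("\"1990-\""));  // prints 1990-
    }
}
```

Because the dash is not in the stripped set, trimming stops as soon as it reaches one, so the sign on a negative amount survives.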
TimeRegex.java now only accepts times between 0:00 and 24:59:59. Both 0 and 24 are accepted as the hour for midnight. The class now recognises times separated by dots (".") as well as colons (":"). Additionally, xxxxh and xxxxhours notation is now accepted, so "0700h" will be recognised as a time entity. These changes honour DSTL's changes to the same class for Baleen 2.1.
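The rules above can be sketched as a single pattern. This is an illustration only and not the expression used in TimeRegex.java: the class name `TimeRegexSketch` and the `isTime` method are hypothetical, and the sketch checks a whole string rather than finding times inside running text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not the pattern used by the Baleen annotator.
final class TimeRegexSketch {
    private static final String HOUR = "(?:[01]?\\d|2[0-4])";       // 0-24, optional leading zero
    private static final String HOUR_2_DIGIT = "(?:[01]\\d|2[0-4])"; // 00-24
    private static final String MIN_SEC = "[0-5]\\d";                // 00-59

    private static final Pattern TIME = Pattern.compile(
        // h:mm or h.mm, with optional seconds
        HOUR + "[:.]" + MIN_SEC + "(?:[:.]" + MIN_SEC + ")?"
        // or four digits followed by "h" or "hours", e.g. 0700h
        + "|" + HOUR_2_DIGIT + MIN_SEC + "(?:hours|h)");

    // True if the whole string is a time under the rules sketched above.
    static boolean isTime(String text) {
        return TIME.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTime("24:59:59")); // prints true
        System.out.println(isTime("07.30"));    // prints true
        System.out.println(isTime("0700h"));    // prints true
        System.out.println(isTime("25:00"));    // prints false
    }
}
```

Two limitations of this sketch: it allows mixed separators such as "12:30.45", and dot-separated input such as "7.30" is treated as a time even though it could be a decimal number in running text.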