Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
https://www.unstructured.io/
Apache License 2.0
9.24k stars, 766 forks

Ml 384/whitespaces in cct #3747

Closed: mariannaparzych closed this 1 month ago

mariannaparzych commented 1 month ago

This ticket ensures that the CCT metric is not sensitive to differences in whitespace (including newlines). All whitespace in each string is replaced with a single space " " in both the ground truth (GT) and the prediction (PRED) before the metric is computed.
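For illustration, here is a minimal sketch of the kind of normalization described above. It is not the code from this PR: the function name `normalize_whitespace` is hypothetical, and the sketch assumes that a run of consecutive whitespace characters collapses to one space rather than one space per character.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space.

    Hypothetical helper for illustration only; assumes runs are collapsed.
    """
    return re.sub(r"\s+", " ", text)


# GT and PRED differ only in whitespace, so they compare equal after normalization.
gt = "Title\n\nFirst  paragraph.\tEnd"
pred = "Title First paragraph. End"
assert normalize_whitespace(gt) == normalize_whitespace(pred)
```

Under these assumptions, leading or trailing whitespace is reduced to a single space rather than stripped; whether the metric should also strip it is a separate design choice.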

The additional changes in CHANGELOG are due to auto-formatting.