The calculation in metrics.regex_rows() is not consistent with the documentation

https://github.com/georgianpartners/foreshadow/blob/c2c213e0009cfdcf0aa9df75f0a6cf4c983d7090/foreshadow/metrics.py#L184

At the linked line, each row should contribute a value of 0 (no match) or 1 (match) before the sum. Instead, each row contributes the length of its match, so the final score can exceed 1. Here is the code to reproduce the issue:

import pandas as pd
from foreshadow.concrete import DollarFinancialCleaner

x = pd.DataFrame({'price': ['$3', '$5.0', '$5,000.00']})
financial_cleaner = DollarFinancialCleaner()
metric = financial_cleaner.metric_score(x)
print(metric)

The expected value is 1, but the result is 4.2 instead.
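To make the difference concrete without depending on foreshadow, here is a minimal sketch that compares the two ways of scoring. It assumes the score is the per-row sum divided by the number of rows. `DOLLAR_PATTERN`, `match_length_score`, and `match_indicator_score` are hypothetical names invented for this illustration, and the pattern is not the regex foreshadow uses, so the inflated value here differs from the 4.2 reported above.

```python
import re

# Hypothetical pattern for illustration only; NOT the regex used by foreshadow.
DOLLAR_PATTERN = re.compile(r"^\$\d[\d,]*(\.\d+)?$")


def match_length_score(values, pattern):
    """Sum the matched length of each row, then divide by the row count.

    This mimics the behaviour described in this issue: a row contributes
    the number of matched characters rather than 0 or 1.
    """
    total = 0
    for value in values:
        match = pattern.search(value)
        total += len(match.group(0)) if match else 0
    return total / len(values)


def match_indicator_score(values, pattern):
    """Sum a 0/1 indicator per row, then divide by the row count.

    This is the behaviour the documentation implies: the result is the
    fraction of rows that match, so it always lies in [0, 1].
    """
    total = sum(1 if pattern.search(value) else 0 for value in values)
    return total / len(values)


prices = ["$3", "$5.0", "$5,000.00"]
print(match_length_score(prices, DOLLAR_PATTERN))     # 5.0 (lengths 2 + 4 + 9, over 3 rows)
print(match_indicator_score(prices, DOLLAR_PATTERN))  # 1.0 (all 3 rows match)
```

With the indicator version, a column where every row matches scores exactly 1, which is the value expected from `metric_score` in the reproduction above.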
