Open jhoetter opened 1 year ago
I was building an extractor recently, and this brick can definitely be extended.
E.g., see this code:
import re
import json

def extract_currency_values(text):
    # Regex patterns for currency symbols and ISO codes
    currency_symbols = {
        "usd": r"\$[ ]*([\d,]+\.?\d*)",
        "eur": r"€[ ]*([\d,]+\.?\d*)",
        "cad": r"C\$[ ]*([\d,]+\.?\d*)",
    }
    currency_codes = {
        "usd": r"USD[ ]*([\d,]+\.?\d*)",
        "eur": r"EUR[ ]*([\d,]+\.?\d*)",
        "cad": r"CAD[ ]*([\d,]+\.?\d*)",
    }
    results = []
    # Check for matches using currency symbols
    for currency, pattern in currency_symbols.items():
        matches = re.findall(pattern, text, re.IGNORECASE)
        for match in matches:
            results.append({
                "currency": currency.upper(),
                "value": match.replace(",", "")
            })
    # Check for matches using currency codes
    for currency, pattern in currency_codes.items():
        matches = re.findall(pattern, text, re.IGNORECASE)
        for match in matches:
            results.append({
                "currency": currency.upper(),
                "value": match.replace(",", "")
            })
    return json.dumps(results, indent=4)

# testing the function
print(extract_currency_values("USD 100,000, CAD100000, EUR100.000, 45€"))
In this case, I also grab the values given with the ISO codes USD, CAD, and EUR. @LeonardPuettmannKern
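Two details of the snippet above may be worth a follow-up: the prefix-only patterns do not match a trailing symbol such as "45€", and the "\$" pattern also matches the "$" inside "C$", so a Canadian-dollar amount would be reported a second time as USD. A rough sketch of one way to handle both, using a single combined pattern, is below. The names SYMBOLS, CODES, NUMBER, MARKER and PATTERN are made up for illustration and are not part of the brick, and the function returns a list rather than a JSON string.

```python
import re

# Hypothetical lookup tables (not part of the brick); extend as needed.
SYMBOLS = {"C$": "CAD", "$": "USD", "€": "EUR"}
CODES = ("USD", "EUR", "CAD")

# Digits with optional comma groups and an optional decimal part.
NUMBER = r"\d+(?:,\d+)*(?:\.\d+)?"

# Longest markers first, so that "C$" is tried before the bare "$".
MARKER = "|".join(
    re.escape(m)
    for m in sorted(list(SYMBOLS) + list(CODES), key=len, reverse=True)
)

# Either marker-then-number ("USD 100") or number-then-marker ("45€").
PATTERN = re.compile(
    rf"(?P<pre>{MARKER})[ ]*(?P<num1>{NUMBER})"
    rf"|(?P<num2>{NUMBER})[ ]*(?P<post>{MARKER})",
    re.IGNORECASE,
)


def extract_currency_values(text):
    """Return a list of {"currency", "value"} dicts found in text."""
    results = []
    for m in PATTERN.finditer(text):
        marker = (m.group("pre") or m.group("post")).upper()
        number = m.group("num1") or m.group("num2")
        results.append({
            # Symbols map to their ISO code; ISO codes map to themselves.
            "currency": SYMBOLS.get(marker, marker),
            "value": number.replace(",", ""),
        })
    return results


print(extract_currency_values("USD 100,000, CAD100000, EUR100.000, 45€, C$ 20"))
```

Because every match consumes its span, each amount is reported once. Like the original, this sketch leaves "100.000" as written and does not try to guess whether the dot is a decimal point or a thousands separator.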
Description
I want to find prices in texts, e.g. "This notebook costs 2$". This module will output "2$". We make use of the spaCy entity labelling to find entities that are labelled as MONEY.
Implementation
Additional information
spaCy can be installed with pip install -U spacy.
The module also needs a spaCy English model (en_core_web_lg). This can be installed by running python -m spacy download en_core_web_lg on the terminal.
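With spaCy and the model installed, the MONEY-entity approach from the description might look roughly like the sketch below. This is not the brick's actual implementation: the function names extract_money and find_prices are assumptions for illustration, and which spans the model labels as MONEY depends on the model version.

```python
def extract_money(doc):
    """Return (start_char, end_char, text) for each entity labelled MONEY.

    `doc` is expected to be a spaCy Doc; only its `ents` attribute is used.
    """
    return [
        (ent.start_char, ent.end_char, ent.text)
        for ent in doc.ents
        if ent.label_ == "MONEY"
    ]


def find_prices(text, model_name="en_core_web_lg"):
    """Hypothetical convenience wrapper: load the model and extract prices."""
    # Imported lazily so extract_money stays usable without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    return extract_money(nlp(text))
```

Calling find_prices("This notebook costs 2$") runs the pipeline on the example sentence from the description. Loading the model on every call is slow, so in practice the loaded pipeline would likely be created once and reused.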