MIND-Lab / OCTIS

OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at EACL2021 demo track)
MIT License

Add email filter #79

Closed stepgazaille closed 1 year ago

stepgazaille commented 1 year ago

Solution taken from here, which provides a test bed here.

Example of results:

[IN]
import re
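# Raw string, so "\S" and "\s" reach the regex engine without invalid-escape warnings.
# Matches any whitespace-delimited token containing "@", plus one trailing whitespace character.
EMAIL_PATTERN = r"\S*@\S*\s?"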

new_d = """My email is user@mail.com
2134@mail.org is my email
<123@123.cn>
"""

print(new_d)
new_d = re.sub(EMAIL_PATTERN, "", new_d)
print(new_d)

[OUT]
My email is user@mail.com
2134@mail.org is my email
<123@123.cn>

My email is is my email
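
Note that in the output above the three input lines collapse into one: the trailing `\s?` in the pattern also consumes the newline that follows an address. The sketch below wraps the same pattern in a small helper and compares it with a variant that only consumes a trailing space or tab. The names `remove_emails` and `EMAIL_PATTERN_KEEP_NEWLINES` are hypothetical, introduced here for illustration only; they are not part of OCTIS or of this PR.

```python
import re

# Pattern from the example above, compiled once as a raw string.
EMAIL_PATTERN = re.compile(r"\S*@\S*\s?")

# Hypothetical variant (an assumption, not from the PR): consume at most one
# trailing space or tab, never a newline, so line boundaries survive.
EMAIL_PATTERN_KEEP_NEWLINES = re.compile(r"\S*@\S*[ \t]?")


def remove_emails(text, pattern=EMAIL_PATTERN):
    """Return `text` with every match of `pattern` removed."""
    return pattern.sub("", text)


sample = "My email is user@mail.com\n2134@mail.org is my email\n<123@123.cn>\n"

# Original pattern: newlines after addresses are removed, lines get merged.
print(repr(remove_emails(sample)))
# prints 'My email is is my email\n'

# Variant: addresses are removed but each line break is kept.
print(repr(remove_emails(sample, EMAIL_PATTERN_KEEP_NEWLINES)))
# prints 'My email is \nis my email\n\n'
```

Both patterns remove any whitespace-delimited token that contains `@`, including surrounding punctuation such as `<` and `>`, so they will also strip things like `@handle` mentions. Whether that is acceptable depends on the corpus being preprocessed.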