Mostly resolves #5. Still to do: add better regular expressions for nonstandard email formats, and parse every email in a posting instead of only the first.
Also adds a detailed README to help people run ht-etl on Windows 10.
This uses the crawler database from ht-archive; I'm not sure whether that is the same database ht-etl is intended to use.
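
For the remaining work on #5 (pulling every email out of a posting rather than only the first), here is a rough sketch of one possible approach. It is not the code in this PR: `EMAIL_RE` and `extract_emails` are hypothetical names, Python is assumed, and the pattern is a deliberately simple approximation that will still miss obfuscated addresses such as "name at example dot com".

```python
import re

# Hypothetical pattern: a simple approximation of an email address,
# not a full RFC 5322 matcher and not the regex used in this PR.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return every distinct email-like string in text, in order of first appearance."""
    seen = set()
    emails = []
    # finditer walks all matches, so a posting with several emails yields all of them
    for match in EMAIL_RE.finditer(text):
        email = match.group(0).lower()  # normalize case so duplicates collapse
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails
```

Using `finditer` instead of `search` is what moves this from "first email only" to "all emails"; lowercasing before de-duplication is a design choice that may or may not suit the database schema.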