anidata / ht-etl

Anidata 1.0: ETL and algorithm code.

Mostly solves issue: make Luigi task to parse emails #6

Closed: lahoffm closed this 7 years ago

lahoffm commented 7 years ago

Mostly solves #5. Two things are still missing: better regular expressions for nonstandard emails, and parsing every email in a posting instead of just the first one.
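
For reference, collecting every address in a posting rather than only the first could look roughly like the sketch below. It uses only Python's standard `re` module. The pattern `EMAIL_RE` and the function name `extract_emails` are hypothetical, chosen for illustration; they are not the code in this PR, and the pattern is deliberately simple.

```python
import re

# Hypothetical pattern (an assumption, not the regex used in this PR):
# a local part, "@", then a domain with at least one dot-separated label.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(posting_text):
    """Return every distinct email-like string in a posting, in order of appearance."""
    seen = set()
    emails = []
    # finditer walks all matches, so later addresses are not dropped.
    for match in EMAIL_RE.finditer(posting_text):
        email = match.group(0).lower()  # normalize case before de-duplicating
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails


if __name__ == "__main__":
    print(extract_emails("Contact a@b.com or C@D.org, again a@b.com."))
```

This does not cover obfuscated forms such as "name at domain dot com"; those would need additional patterns.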

Also added a detailed readme to help people run ht-etl on Windows 10.

This uses the crawler database from ht-archive. I'm not sure whether that is exactly the database ht-etl is intended to work with.

dlrobertson commented 7 years ago

:+1: thanks!