anidata / ht-etl

Anidata 1.0: ETL and algorithm code.

Create Luigi Task to extract phone numbers #4

Closed bmenn closed 7 years ago

bmenn commented 7 years ago

From the requirements of #1, we need to be able to extract phone numbers in any format from raw HTML. I would also suggest creating a PhoneNumber table with an auto-increment ID, so that a phone number can be tracked across multiple sites.
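A rough sketch of what the extraction and the PhoneNumber lookup might look like, outside of any Luigi wiring. Everything here is an assumption for illustration: the regex only covers a few North American style formats, `extract_phone_numbers` and `get_or_create_phone_id` are hypothetical helper names, and SQLite stands in for whatever database the project actually uses.

```python
import re
import sqlite3

# Assumption: North American style numbers only, such as 555-123-4567,
# (555) 123-4567, 555.123.4567 or +1 555 123 4567. Real data will need more.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)
# Crude tag stripper; numbers that live only in attributes are dropped.
TAG_RE = re.compile(r"<[^>]+>")


def extract_phone_numbers(raw_html):
    """Return normalized 10-digit numbers in order of first appearance."""
    text = TAG_RE.sub(" ", raw_html)
    found = []
    for match in PHONE_RE.finditer(text):
        number = "".join(match.groups())  # area code + exchange + line
        if number not in found:
            found.append(number)
    return found


def get_or_create_phone_id(conn, number):
    """Return the auto-increment ID for number, inserting it if it is new."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS PhoneNumber ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "number TEXT UNIQUE NOT NULL)"
    )
    # The UNIQUE constraint makes repeat inserts a no-op.
    conn.execute(
        "INSERT OR IGNORE INTO PhoneNumber (number) VALUES (?)", (number,)
    )
    row = conn.execute(
        "SELECT id FROM PhoneNumber WHERE number = ?", (number,)
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    html = "<p>Call (555) 123-4567 or 555.987.6543</p>"
    conn = sqlite3.connect(":memory:")
    for number in extract_phone_numbers(html):
        print(number, get_or_create_phone_id(conn, number))
```

Because the same normalized number always maps to the same ID, records scraped from different sites can reference one PhoneNumber row.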

Examples of phone number formats to handle (not exhaustive):

bmenn commented 7 years ago

@baronvonbadguy

Do you have any updates or need help on this issue?

omnunum commented 7 years ago

Oh mate, I totally forgot I had this assigned. I had some friends roll through this weekend, so my schedule was a bit more hectic than I anticipated. I'll look into this tonight and see if I can get a PR up.

omnunum commented 7 years ago

After meeting with @danlrobertson, it appears we will need a common "pull records from raw table" task that can feed both this task and #5.

bmenn commented 7 years ago

@baronvonbadguy @danlrobertson

Anything I can help with here?

omnunum commented 7 years ago

I can make a ticket/branch for that task, and then base the branch for this ticket off that.

gte620v commented 7 years ago

Feel free to give it a shot. I am backlogged with other work.

dlrobertson commented 7 years ago

Resolved by #12