UC-MACSS / persp-analysis_A18

Perspectives on Computational Analysis (MACS 30000), Autumn 2018

Filter out invalid phone numbers #22

Closed liu431 closed 5 years ago

liu431 commented 5 years ago

I noticed that some websites let you validate phone numbers, which could be useful for filtering out invalid and out-of-service numbers. I used one called "numverify". When you submit a number, it returns a JSON response containing the carrier, whether the number is valid, the location, etc. To validate 200 numbers, I wrote a simple function to fetch the JSON responses and add a label to each number. Check this out if you have not called yet. Cheers!
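For readers who want to try the same idea, here is a minimal sketch of such a labeling function in Python. It is not the code linked above. The endpoint URL and the `access_key`/`number` query parameters are assumptions based on the numverify documentation as it stood at the time, so check the current docs before relying on them. The helper names (`build_url`, `fetch_record`, `label_record`, `label_numbers`) are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against the current numverify documentation.
API_URL = "http://apilayer.net/api/validate"


def build_url(number, access_key):
    """Build the request URL for one phone number."""
    query = urlencode({"access_key": access_key, "number": number})
    return f"{API_URL}?{query}"


def fetch_record(number, access_key):
    """Request one number and return the parsed JSON response as a dict."""
    with urlopen(build_url(number, access_key)) as response:
        return json.loads(response.read().decode("utf-8"))


def label_record(record):
    """Turn a parsed response into a simple label, using only the 'valid' field."""
    return "valid" if record.get("valid") is True else "invalid"


def label_numbers(numbers, access_key):
    """Map each number to its label; makes one network request per number."""
    return {n: label_record(fetch_record(n, access_key)) for n in numbers}
```

Calling `label_numbers(["14155550100"], "YOUR_KEY")` would make one request per number, so a list of 200 numbers means 200 calls against whatever quota the key has.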

smiklin commented 5 years ago

Thanks for this, @liu431! I wrote this up in R; find it here if you're interested.

liu431 commented 5 years ago

@smiklin Great! How did this work for your numbers? When I conditioned only on "valid"==true, only 15 were filtered out. I then looked at the JSON responses again and added the conditions that carrier, location, and line_type are not empty. With those conditions, 140 of them are invalid.
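The two filtering rules described in this comment can be sketched as follows. This is an illustration, not the commenter's code: the function names are hypothetical, the sample records are made up, and the field names (`valid`, `carrier`, `location`, `line_type`) are taken from the comment itself.

```python
def is_valid_loose(record):
    """Loose rule: keep the number if the service reports valid == true."""
    return record.get("valid") is True


def is_valid_strict(record, required=("carrier", "location", "line_type")):
    """Strict rule: also require that every listed field is non-empty."""
    if not is_valid_loose(record):
        return False
    return all(record.get(field) not in (None, "") for field in required)


# Made-up records, for illustration only.
records = [
    {"valid": True, "carrier": "Example Wireless", "location": "Chicago", "line_type": "mobile"},
    {"valid": True, "carrier": "", "location": "Chicago", "line_type": "landline"},
    {"valid": False, "carrier": "", "location": "", "line_type": None},
]

loose_kept = [r for r in records if is_valid_loose(r)]
strict_kept = [r for r in records if is_valid_strict(r)]
```

The strict rule treats an empty field as evidence of an invalid number, which is a judgment call: an empty `carrier` could also just be missing data.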

smiklin commented 5 years ago

Only a few: I got three that were "valid" == FALSE, and those are the three that have an empty 'line_type'. If carrier/location are empty, though, does that mean the number is invalid, or is it simply unknown/missing data?

liu431 commented 5 years ago

From reading the JSON Formatting part of the documentation, I think the ones with missing information are likely to be fake (without identity authentication). I tried dialing some and none was successful.

smiklin commented 5 years ago

Oh, that is useful! I might have to get a new key and re-run it. I decided not to call over the weekend so I'll see tomorrow!

Edit: I just ran it again, making sure to pull the carrier and the location info. Here is a representative snapshot: [image] It looks like the carrier is empty for all landlines, so you might've ended up with a list of mobile phones only? I'll see tomorrow.
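One way to check the landline observation on your own results is to tally the records with an empty carrier by `line_type`. The sketch below is in Python rather than the R used above. It assumes the responses have already been parsed into dicts, the function name is hypothetical, and the sample data is made up.

```python
from collections import Counter


def empty_carrier_by_line_type(records):
    """Count records whose carrier is empty, grouped by line_type."""
    counts = Counter()
    for record in records:
        if not record.get("carrier"):
            # Records with no line_type at all are grouped under "unknown".
            counts[record.get("line_type") or "unknown"] += 1
    return counts


# Made-up records, for illustration only.
sample = [
    {"carrier": "", "line_type": "landline"},
    {"carrier": "", "line_type": "landline"},
    {"carrier": "Example Wireless", "line_type": "mobile"},
    {"carrier": "", "line_type": None},
]
counts = empty_carrier_by_line_type(sample)
```

If nearly all of the empty-carrier records fall under `landline`, then requiring a non-empty carrier would drop landlines wholesale rather than drop invalid numbers.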

smiklin commented 5 years ago

Hah, turns out the majority of numbers are not in service, and that is not something registered by numverify... Still, it was good practice writing the code :-)