lgaetz closed this 12 years ago
I am beginning to think that it is get_url_contents which is failing here. Because the Google results come in frames, get_url_contents could be grabbing the contents of the wrong frame.
For Truelocal.com.au I have a preliminary regex that seems to be working, which I note here so I don't lose it:
$pattern = "/www\.truelocal\.com\.au\/business\/(.{3,40})\/.*$num1\+$num2\+$num3/";
with the phone number split into segments of 2,4,4. Currently this module splits landlines into 2,4,4, but mobile numbers are split 4,3,3, so mobile numbers will fail. It is not great, but it actually works, which is more than I can say for most of what I accomplished this afternoon.
For yellowpages.com.au, this regex is mostly working:
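For reference, here is a rough sketch of how the segments could be chosen per number type so that mobiles no longer fail. The function name `split_au_number` is made up for illustration and is not the module's actual code; it also assumes that mobile numbers are the ones starting with 04 and that the number is given in 10-digit national format.

```php
<?php
// Hypothetical helper, not the module's actual code: split a 10-digit
// Australian national number into the three segments used in the regex.
function split_au_number(string $number): array
{
    // Keep digits only, so "(02) 9999 8888" and "0299998888" behave the same.
    $digits = preg_replace('/\D/', '', $number);
    if (strlen($digits) !== 10) {
        return []; // not a 10-digit national number; caller should skip the lookup
    }
    // Assumption: mobiles start with 04 and are written 4,3,3;
    // everything else is treated as a landline written 2,4,4.
    $lengths = (substr($digits, 0, 2) === '04') ? [4, 3, 3] : [2, 4, 4];
    $parts = [];
    $offset = 0;
    foreach ($lengths as $len) {
        $parts[] = substr($digits, $offset, $len);
        $offset += $len;
    }
    return $parts;
}

[$num1, $num2, $num3] = split_au_number('0412 345 678');
// Same Truelocal pattern as above, now built from the mobile-style segments.
$pattern = "/www\.truelocal\.com\.au\/business\/(.{3,40})\/.*$num1\+$num2\+$num3/";
```

The pattern itself is unchanged; only the way `$num1`, `$num2` and `$num3` are produced differs between landlines and mobiles.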
$pattern ="/<a href=\"\/url\?q=http:\/\/www\.yellowpages\.com\.au\/.+\">(.{2,30})[ ]-[ ].+Phone number .<b>".$num1.".[ ]".$num2."[ ]".$num3."/";
Better working regex for Truelocal: $pattern = "/www.truelocal.com.au\/business\/.+?\">(.+?),[ ].+?\/.*$num1+$num2+$num3/";
I have always been weak at structuring (even reading) regexes, and I got a lot of clarification from this site: www.regular-expressions.info
This is the best I could come up with for yellowpages.com.au:
"/<a href=\"\/url\?q=http:\/\/www\.yellowpages\.com\.au\/.+?\">(.*?) - .{1,200}.{0,3}$num1.{0,3}$num2.{0,3}$num3<\/b>\./"
Still not great. Google results include the searched phone number with a bunch of different names with only very subtle differences between them. I have not been able to get a decent regex that can distinguish the good name from all the others.
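To make it quicker to try a candidate pattern against saved page contents, something like the following could be used. This is only a sketch: `yellowpages_name` is a hypothetical wrapper around the pattern above, and the sample markup is invented to have roughly the shape the pattern expects, not copied from real Google output.

```php
<?php
// Hypothetical helper: run the yellowpages pattern above against a page and
// return the captured name, or null when nothing matches.
function yellowpages_name(string $html, string $num1, string $num2, string $num3): ?string
{
    $pattern = "/<a href=\"\/url\?q=http:\/\/www\.yellowpages\.com\.au\/.+?\">(.*?) - .{1,200}.{0,3}$num1.{0,3}$num2.{0,3}$num3<\/b>\./";
    if (preg_match($pattern, $html, $matches) === 1) {
        // Drop any markup left inside the captured name.
        return trim(strip_tags($matches[1]));
    }
    return null;
}

// Invented sample markup, only shaped like a search result; not real output.
$sample = '<a href="/url?q=http://www.yellowpages.com.au/nsw/sydney/example-1.html">'
    . 'Example Plumbing - Sydney</a> Phone number <b>(02) 9999 8888</b>. ';

$name = yellowpages_name($sample, '02', '9999', '8888');
```

Returning null on a miss keeps the "no match" case distinct from an empty name, which helps when comparing several near-identical results by hand.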
The above commit is a nearly perfect combination of URL and regex for yellowpages. Still outstanding on this ticket are Superpages and Google Maps.
The Superpages.com.au fix was committed with #0e65acb and is ready for testing.
I can't get any useful reverse number results from maps.google.com.au. Unless someone can figure out how to structure a URL, this source looks like it should be deprecated.
Truelocal has the possibility of returning a false CNAM. If Google is unable to find an exact result, it substitutes results that are close, which could still match the regex as it is currently defined. We need to check the URL contents for:
No results found for <google search string>
and if present, abandon lookup.
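A minimal sketch of that check follows. The function name `google_has_exact_results` and the `$caller_id` variable are placeholders for illustration, not names from the module, and the sample page is invented; the only thing taken from above is the "No results found for" text.

```php
<?php
// Hypothetical guard, not the module's actual code.
function google_has_exact_results(string $html): bool
{
    // Strip markup first so tags inside the banner do not hide the phrase,
    // then do a case-insensitive search for the text quoted above.
    return stripos(strip_tags($html), 'No results found for') === false;
}

// Invented sample page content, for illustration only.
$page = '<div>No results found for <b>"02 9999 8888"</b></div>';

$caller_id = 'unset';
if (!google_has_exact_results($page)) {
    $caller_id = ''; // abandon the lookup rather than risk a false CNAM
}
```

Running the guard before the Truelocal regex means the substituted "close" results are never matched at all.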
Nothing back from OP on this issue, so I am assuming no news is good news.
All of these lookup sources (maps.google.com.au, superpages, truelocal, yellowpages) use Google to search with, since none of them has a reverse number search. Google still seems to yield results, but none of the sources are returning CNAM. My guess is that the output format of Google has changed and the regex needs attention.