Alankar0416 / Justdial-Scrapper

A 100% working Justdial scraper. Just enter the URL and it will extract the business info from it.
Have you scraped phone numbers from Justdial? #1

Open dvijparekh opened 6 years ago

Alankar0416 commented 6 years ago

I was able to earlier, but it seems they have started sending SVG images instead of numbers.

Alankar0416 commented 6 years ago

@dvijparekh1995 However, we can take the class name and map it to a digit from there. But this will break when they change it again.

vishnu1991 commented 6 years ago

JD uses a fixed series of class names to show the phone number. If we can extract the class name of each span, we can get the mobile numbers easily.

The series is as follows (digit - span class="icon-XX"):

1 - icon-yz
2 - icon-wx
3 - icon-vu
4 - icon-ts
5 - icon-rq
6 - icon-po
7 - icon-nm
8 - icon-lk
9 - icon-ji
0 - icon-acb
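A minimal sketch of how the series listed above could be applied in Python. The function name decode_phone is hypothetical, and the mapping is only valid for as long as JD keeps these class names.

```python
# Mapping copied from the series in this comment; JD may change it at any time.
ICON_TO_DIGIT = {
    "icon-yz": "1", "icon-wx": "2", "icon-vu": "3", "icon-ts": "4",
    "icon-rq": "5", "icon-po": "6", "icon-nm": "7", "icon-lk": "8",
    "icon-ji": "9", "icon-acb": "0",
}


def decode_phone(class_names):
    """Turn an ordered list of icon class names into a digit string."""
    digits = []
    for name in class_names:
        if name not in ICON_TO_DIGIT:
            # Fail loudly so a stale mapping is noticed instead of producing wrong numbers.
            raise KeyError(f"unknown icon class {name!r}; the mapping may be stale")
        digits.append(ICON_TO_DIGIT[name])
    return "".join(digits)


print(decode_phone(["icon-ji", "icon-lk", "icon-yz", "icon-acb"]))  # prints 9810
```

Raising on an unknown class name, rather than skipping it, makes a mapping change show up as an error instead of as a silently shortened phone number.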

Alankar0416 commented 6 years ago

Yes, I had that in mind. But the issue is they can change the class name whenever they want and this will break then. Better to think of something concrete. The most foolproof solution is to use digit recognition on the image.

vishnu1991 commented 6 years ago

Yes, I think the same, as they will surely change it.

krishnamalireddy commented 6 years ago

I'm not getting the phone numbers. Can you tell me how to get them?

Alankar0416 commented 6 years ago

@krishnamalireddy JD is now using SVGs in place of actual numbers. That's why parsing fails. There are a couple of ways to get around this:

- Map each SVG's unique code to a digit. This will fail if they change the mapping again.
- Use digit recognition on the SVG.

Unfortunately I am not getting time to develop this. Will pick it up whenever I have some bandwidth.

hrwndr commented 5 years ago

@Alankar0416 Could you please demonstrate how we can get the numbers from the SVGs in code?

AdityaMalireddy commented 5 years ago

A simple solution is to use .find_all instead of .string for the phone number.

You will get the seemingly random SVG codes, which you then convert to digits.
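A rough sketch of collecting those codes from the markup. With BeautifulSoup this would be a find_all call on the phone element; the version below swaps in the standard library's html.parser so it runs without extra packages. The sample markup and the class names contact-info and mobilesv are assumptions for illustration only, not confirmed Justdial markup.

```python
from html.parser import HTMLParser


class IconClassCollector(HTMLParser):
    """Collect every class starting with 'icon-' found on <span> tags, in order."""

    def __init__(self):
        super().__init__()
        self.icon_classes = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # The class attribute may be missing or empty, so fall back to "".
        for cls in (dict(attrs).get("class") or "").split():
            if cls.startswith("icon-"):
                self.icon_classes.append(cls)


def collect_icon_classes(html):
    parser = IconClassCollector()
    parser.feed(html)
    return parser.icon_classes


# Hypothetical markup: one span per digit, each carrying an icon-XX class.
sample = (
    '<p class="contact-info">'
    '<span class="mobilesv icon-ji"></span>'
    '<span class="mobilesv icon-lk"></span>'
    "</p>"
)
print(collect_icon_classes(sample))  # prints ['icon-ji', 'icon-lk']
```

The resulting list of class names is what then has to be converted to digits with whatever mapping is current.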

Alankar0416 commented 5 years ago

The issue is that we can keep a map of SVG codes to numbers, but JD can change it at any time.

AdityaMalireddy commented 5 years ago

Yes, they can change it at any time. If they change it, we have to decode it again. By the way, they haven't changed it for a long time.

ketanshah79 commented 5 years ago

Thanks @Alankar0416 for sharing the code.

Here is an array mapping I've used as a second pass on the csv file. I used the .find_all for phone number.

Attached is my php code. clean_csv.php.txt

Alankar0416 commented 5 years ago

Great work, @ketanshah79. I haven't tried this code yet. Are you able to successfully map phone numbers with this additional script? If yes, I can add it to the original script to make things easier for everyone.

ketanshah79 commented 5 years ago

Yes I did get it to work. Try running it on any CSV generated by your python script

Thank you

Dhiren-Biren commented 5 years ago

Only 10 records are being retrieved.

mps1305 commented 5 years ago

@Alankar0416 could you please post the code along with @ketanshah79's changes? I need to get Justdial data for a college project. If either of you could do it, it would be really helpful.

Thanks!

dvijparekh commented 5 years ago

@mps1305 Check my forked repo. I have made the changes accordingly and it is working; just change the URL to whichever one you want.

mps1305 commented 5 years ago

Hey @dvijparekh, it was working until some time ago, then I started getting this error. Any help would be highly appreciated! "[WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond"

dvijparekh commented 5 years ago

It seems Justdial is blocking the scraper. Working on it.
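For the timeout itself, a small sketch of fetching with an explicit timeout and a few retries using urllib. This does not get around any blocking on Justdial's side; it only makes transient connection failures less fatal. The function name fetch_with_retries and its default values are assumptions, and the opener parameter exists only so the function can be exercised without a network.

```python
import time
import urllib.request


def fetch_with_retries(url, attempts=3, timeout=10, delay=2.0,
                       opener=urllib.request.urlopen):
    """Fetch url, retrying on connection errors; assumes attempts >= 1."""
    request = urllib.request.Request(url)
    last_error = None
    for attempt in range(attempts):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except OSError as error:
            # urllib.error.URLError and socket timeouts are both OSError subclasses.
            last_error = error
            if attempt < attempts - 1:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * (attempt + 1))
    raise last_error
```

If every attempt fails, the last error is re-raised, so a persistent block still surfaces as an exception rather than as empty data.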

SuhailSaify commented 5 years ago

Hey, I have written a script that scrapes phone numbers from any Justdial business page. It uses the info in the CSS stylesheet to build a mapping from the strings assigned to each digit. The mapping is rebuilt every time a page is loaded, so it works for every business.

Please try this: https://github.com/SuhailSaify/Justdial-Scrapper

PS: It also scrapes other info along with phone numbers. (Working as of July 2019.)

krishnamalireddy commented 4 years ago

I am getting a urllib open timeout error. Is this code still working for anyone?

abhi-ux commented 4 years ago

Can anyone post the latest code here?

abhi-ux commented 4 years ago

The phone number is not correct

builditpossible-gs commented 4 years ago

I am about to solve this issue, can anyone help me with this error - https://stackoverflow.com/questions/60875316/typeerror-string-indices-must-be-integers-when-getting-class-fro-span-tag-using

dvijparekh commented 4 years ago

Please share the Justdial URL you are trying to scrape.

builditpossible-gs commented 4 years ago

Solved it brother. Thank you.

builditpossible-gs commented 4 years ago

There is another error, though: AttributeError: 'NoneType' object has no attribute 'text', on the line return body.find('span', {'class':'mrehover'}).text.strip() in get_address.

dvijparekh commented 4 years ago

It means it cannot find a span tag with class mrehover, so body.find returns None, which has no text attribute. Try the code below and let me know what you get from it:

    tesVar = body.find('span', {'class':'mrehover'})
    print(tesVar)
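One possible way to make get_address tolerate a missing span is sketched below. It assumes body is a parsed BeautifulSoup element as in the original script; returning None for a missing address is a choice made for this sketch, not something the original code does.

```python
def get_address(body):
    """Return the stripped address text, or None when the span is absent."""
    node = body.find('span', {'class': 'mrehover'})
    if node is None:
        # The page has no matching span, e.g. because the markup changed.
        return None
    return node.text.strip()
```

The caller can then skip or log listings where the address comes back as None instead of crashing partway through a scrape.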

alokm014 commented 3 years ago

Hey, use this method https://youtu.be/EkbF5JwuHqU