DicksonC96 / PropertyGuru-Scraper

A Python scraper to scrape information on property sale or rent in Malaysia from PropertyGuru.com. Strictly for educational purposes only.
MIT License
23 stars, 6 forks

PropertyGuru scraping for another country #4

Open tljoven opened 1 year ago

tljoven commented 1 year ago

Hey @DicksonC96, super nice work! I'm kinda new to this and trying to scrape data for Singapore, and would like some guidance. I assume the code structure would work exactly the same on the Singapore site?

DicksonC96 commented 1 year ago

Thanks @tljoven. I suppose so, if they are using the same HTML structure for the SG site. You might need to change the API/URL parameters for your region, though. Hope it helps!
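To illustrate the kind of region change meant here, below is a minimal sketch and not the repository's actual code. The helper `build_search_url`, the `property-for-sale` path, and the page-number layout are assumptions made for illustration; check them against the live site before relying on them.

```python
from urllib.parse import urlencode

# Assumed base URLs per region; verify against the live sites before use.
REGION_BASE_URLS = {
    "my": "https://www.propertyguru.com.my",
    "sg": "https://www.propertyguru.com.sg",
}


def build_search_url(region, listing_path="property-for-sale", page=1, **params):
    """Build a listing URL for a region (hypothetical helper, assumed URL layout)."""
    base = REGION_BASE_URLS[region]  # raises KeyError for an unknown region
    url = f"{base}/{listing_path}/{page}"
    # Sort the query parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url
```

With this, switching from Malaysia to Singapore is a matter of calling `build_search_url("sg", ...)` instead of `build_search_url("my", ...)`, provided the HTML structure really is the same on both sites.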

benjamin-mak commented 1 year ago

Hi @DicksonC96, how did you manage to bypass the Cloudflare captcha? I tried using your functions but it doesn't work. @tljoven did you manage to successfully scrape data from the Singapore website?

tljoven commented 1 year ago

@benjamin-mak yep

benjamin-mak commented 1 year ago

@tljoven what did you do to bypass cloudflare?

tljoven commented 1 year ago

I didn't have to O.o @benjamin-mak

DicksonC96 commented 1 year ago

> Hi @DicksonC96, how did you manage to bypass the Cloudflare captcha? I tried using your functions but it doesn't work. @tljoven did you manage to successfully scrape data from the Singapore website?

I can't either. cloudscraper couldn't bypass the Level 2 Cloudflare captcha. I suspect they upgraded their plan since 1st January 2023.

@tljoven mind sharing which module you're using? Still cloudscraper?

tljoven commented 1 year ago

Wait, why is there a captcha? Do you log in to scrape?

Arvedek commented 1 year ago

Does that mean SG data is not available?

Arvedek commented 1 year ago

Hi, may I check whether propertyguru.sg can still be scraped?

benjamin-mak commented 1 year ago

@Arvedek I could not get cloudscraper to work, so I used a third-party web scraper to assist me. However, it does not fully work either: I managed to get only about 70-80% of the data.

benjamin-mak commented 1 year ago

If you want, I can share with you the code and you can try it for yourself.

Arvedek commented 1 year ago

> If you want, I can share with you the code and you can try it for yourself.

That would be nice! Thanks

benjamin-mak commented 1 year ago

Go to my profile and follow me on LinkedIn, I'll share with you there

Arvedek commented 1 year ago

> Go to my profile and follow me on LinkedIn, I'll share with you there

Done

SungLuRent commented 9 months ago

I'm also scraping for SG. cloudscraper did work, but I frequently got 403 responses after sending a certain number of requests. Is this the only problem with SG PropertyGuru? If it isn't, may I know how you managed to deal with the others?

Edit: I figured out that the 403 responses mean cloudscraper didn't work. The 403 page later executes self-invoking JavaScript and brings the user to the anti-bot page. There seem to be almost no good offline approaches (that is, ones not using an online scraping API) to bypass the Cloudflare anti-bot check now; I tried undetected_chromedriver. My alternative is to pass the anti-bot check manually and then copy the Cloudflare token. Using the same IP and the same User-Agent, that should give about 30 to 60 minutes to scrape semi-automatically.
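A rough sketch of that semi-automatic approach, using only the standard library. It assumes the copied token is the `cf_clearance` cookie and that requests go out from the same IP and with the same User-Agent as the browser that passed the check; the code cannot enforce the IP part. The names `make_request`, `fetch`, and `scrape_until_blocked` are hypothetical helpers, not part of any library.

```python
import time
import urllib.error
import urllib.request


def make_request(url, cf_clearance, user_agent):
    """Build a request that reuses a manually obtained clearance cookie."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,  # must match the browser that passed the check
            "Cookie": f"cf_clearance={cf_clearance}",  # assumed cookie name
        },
    )


def fetch(request, timeout=30):
    """Return (status, body); HTTP errors such as 403 are returned, not raised."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def scrape_until_blocked(urls, cf_clearance, user_agent, fetch_fn=fetch, delay=1.0):
    """Fetch URLs in order and stop at the first 403.

    Returns (pages, remaining): the bodies fetched so far and the URLs still
    to do, so the run can be resumed after refreshing the token by hand.
    """
    pages = {}
    for index, url in enumerate(urls):
        status, body = fetch_fn(make_request(url, cf_clearance, user_agent))
        if status == 403:
            # Treat 403 as "token expired or rejected" and hand back the rest.
            return pages, list(urls[index:])
        pages[url] = body
        time.sleep(delay)  # pause between requests
    return pages, []
```

When `remaining` comes back non-empty, pass the anti-bot page in the browser again, copy the new token, and call `scrape_until_blocked(remaining, new_token, user_agent)`.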