Open tljoven opened 1 year ago
Thanks @tljoven. I suppose it should work if they are using the same HTML structure for the SG site, though you might need to change the API/URL parameters for your region. Hope it helps!
Hi @DicksonC96, how did you manage to bypass the Cloudflare captcha? I tried using your functions but it doesn't work. @tljoven did you manage to successfully scrape data from the Singapore website?
@benjamin-mak yep
@tljoven what did you do to bypass Cloudflare?
I didn't have to O.o @benjamin-mak
I can't either. cloudscraper couldn't bypass the Level 2 Cloudflare captcha. I suspect they have upgraded their plan since 1 January 2023.
@tljoven mind sharing which module you are using? Still cloudscraper?
Wait, why is there a captcha? Do you log in to scrape?
Does that mean SG data is not available?
Hi, may I check if propertyguru.sg can still be scraped?
@Arvedek I could not get cloudscraper to work, so I used a third-party web scraper to assist me. However, it does not fully work either; I managed to get only about 70-80% of the data.
If you want, I can share with you the code and you can try it for yourself.
That would be nice! Thanks
Go to my profile and follow me on LinkedIn, I'll share it with you there.
Done.
I'm also scraping the SG site. cloudscraper did work, but I frequently got 403 responses after sending a certain number of requests. Is this the only problem with SG PropertyGuru? If it's not the only issue in the SG version, may I know how you managed to deal with it?
Edit: I figured out that the 403 responses mean cloudscraper didn't work. The 403 page then executes a self-invoking JavaScript function and takes the user to the anti-bot page. There seem to be almost no good offline approaches (that is, ones not using an online scraping API) to bypass the Cloudflare anti-bot check now (I tried undetected_chromedriver). My alternative is to pass the anti-bot check manually and then copy the Cloudflare token. Using the same IP and the same User-Agent, that should give about 30 to 60 minutes of semi-automatic scraping.
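For anyone who wants to try the manual-token workaround described above, here is a rough sketch using only the Python standard library. It is not code from this repo. The cookie name `cf_clearance`, the helper names, and the 30-minute budget (the low end of the estimate above) are my assumptions, and the URL, User-Agent, and token are placeholders you would copy from your own browser session.

```python
import time
import urllib.error
import urllib.request

# Assumptions (not from this thread): the cookie name and the 30-minute budget.
COOKIE_NAME = "cf_clearance"
TOKEN_LIFETIME_SECONDS = 30 * 60


def build_request(url, user_agent, token):
    """Build a GET request carrying the browser's User-Agent and token."""
    headers = {
        "User-Agent": user_agent,  # must match the browser that passed the check
        "Cookie": f"{COOKIE_NAME}={token}",
    }
    return urllib.request.Request(url, headers=headers)


def token_expired(obtained_at, now=None):
    """Return True once the conservative token lifetime has elapsed."""
    now = time.time() if now is None else now
    return now - obtained_at >= TOKEN_LIFETIME_SECONDS


def fetch(url, user_agent, token, obtained_at):
    """Fetch one page, or return None when a fresh manual token is needed."""
    if token_expired(obtained_at):
        return None
    request = build_request(url, user_agent, token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 403:  # reported above as the sign the anti-bot page is back
            return None
        raise
```

When `fetch` returns `None`, pass the check in the browser again, copy the new token, and reset `obtained_at` to `time.time()`. Requests have to come from the same IP as the browser, so this only works when run on the same machine or network.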
Hey @DicksonC96, super nice work! I'm kinda new to this and trying to scrape data for Singapore, so I would like some guidance. I assume the code structure would work exactly the same on the Singapore site?