RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

LeBonCoin failed with error 403 #1820

Open buloto24 opened 3 years ago

buloto24 commented 3 years ago

Error message: `The requested resource cannot be found! Please make sure your input parameters are correct!
cUrl error: (0)
PHP error:`
Query string: `action=display&bridge=LeBonCoin&keywords=nintendo+sitch&region=13&department=&cities=&category=&pricemin=&pricemax=&estate=&roomsmin=&roomsmax=&squaremin=&squaremax=&mileagemin=&mileagemax=&yearmin=&yearmax=&cubiccapacitymin=&cubiccapacitymax=&fuel=&owner=&format=Html`
Version: `git.master.2714c3d`

JackNUMBER commented 3 years ago

Quick investigation: the `region` parameter has changed to `locations`. I tried updating the request (and also tried a minimal request), but the refusal seems to come from somewhere else.

floviolleau commented 3 years ago

Hi,

I've been getting a 403 for 2 days, although it worked great before.

[Mon Mar 01 11:58:32.768278 2021] [proxy_fcgi:error] [pid 495:tid <redacted>] [client <redacted>] AH01071: Got error 'PHP message: Exception: Invalid parameters value(s): keywords_, estat_e, y_earmax in /<redacted>/rss-bridge/lib/error.php:24\nStack trace:\n#0 /<redacted>/rss-bridge/lib/error.php(33): returnError('Invalid paramet...', 400)\n#1 /<redacted>/rss-bridge/lib/BridgeAbstract.php(229): returnClientError('Invalid paramet...')\n#2 /<redacted>/rss-bridge/actions/DisplayAction.php(133): BridgeAbstract->setDatas(Array)\n#3 /<redacted>/rss-bridge/index.php(38): DisplayAction->execute()\n#4 {main}'

Thanks

timat35 commented 3 years ago

@floviolleau

I suddenly got a 403 as well. Changing the user agent (around line 362) worked for me, but after 10 calls it stopped working.

em92 commented 3 years ago

Ping @JackNUMBER as maintainer of this bridge.

JackNUMBER commented 3 years ago

@em92 I haven't managed to fix the 403 so far. @timat35 can you provide a PR?

hista commented 3 years ago

Is there a fix for this great bridge?

As for me I also get a 403 error with a different message:

Error message: `Unexpected response from upstream.
cUrl error:  (0)
PHP error: Creating default object from empty value`
Query string: `action=display&bridge=LeBonCoin&keywords=thinkpad&region=2&department=&cities=&category=&pricemin=&pricemax=1400&estate=&roomsmin=&roomsmax=&squaremin=&squaremax=&mileagemin=&mileagemax=&yearmin=&yearmax=&cubiccapacitymin=&cubiccapacitymax=&fuel=&owner=&format=Atom`
Version: `dev.2020-11-10`

timat35 commented 3 years ago

@hista Have you tried changing the user agent in the bridge? It is currently working for me.

hista commented 3 years ago

Hi @timat35, I thought the fix didn't last long, because of what you wrote earlier:

> Changing the user agent (around line 362) worked for me, but after 10 calls it stopped working.

Does it mean it lasts longer than 10 calls now?

My current user agent in the bridge is `User-Agent: LBC;Android;10;SAMSUNG;phone;0aaaaaaaaaaaaaaa;wifi;8.24.3.8;152437;0`. How should I change it?

timat35 commented 3 years ago

Well, I just deleted some of the `a` characters, like this: `LBC;Android;10;SAMSUNG;phone;0aaaaaa;wifi;8.24.3.8;152437;0`

Datadome is monitoring the user agent, so we need to use a different one (this is only my theory; I'm not sure why it works here).

hista commented 3 years ago

Thanks @timat35, it works for now. I hope it will last. This LeBonCoin bridge is awesome when it works :-)

hista commented 3 years ago

Hi guys, is the LeBonCoin bridge still working perfectly for you? Since yesterday I sometimes get the same 403 error I mentioned earlier (https://github.com/RSS-Bridge/rss-bridge/issues/1820#issuecomment-819601471), and my alerts now arrive very late, long after the CACHE_TIMEOUT interval has elapsed.

JackNUMBER commented 3 years ago

The 403s mainly come from LBC's bot protection. My server's IP has been blocked and I have been getting 403s for months. When I change the User-Agent, I get 2-3 requests answered with 200 before the 403s come back.

lapineige commented 3 years ago

Do we have a way to fix that? Such as using the Googlebot user agent? Or proxying through Tor?

pointpaul commented 3 years ago

Hello, I have a way to fix the 403. It's not free, though. You can find my contact info on my profile.

lapineige commented 3 years ago

Oh nice, we've got scammers now :thinking: (I was sure you would be a bot :smile:)

pointpaul commented 3 years ago

Scraping all Leboncoin daily but scammer yes, lol

JackNUMBER commented 3 years ago

I hope deploying a Docker image in the cloud will solve this for me. It's more complex than uploading files to a server, but it can be automated too.

timat35 commented 3 years ago

@pointpaul vade retro to https://www.growthhacking.fr/ ... The idea of this repo is to SHARE the code, not sell...

pointpaul commented 3 years ago

GL sharing datadome bypass for free then!

em92 commented 3 years ago

> My server's IP has been blocked and I have been getting 403s for months. When I change the User-Agent, I get 2-3 requests answered with 200 before the 403s come back.

@JackNUMBER, have you tried making a list of user agents and picking one at random for each request? Here is a list of old user agents (created 6 years ago): https://gist.github.com/pzb/b4b6f57144aea7827ae4
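As a rough illustration of that suggestion, here is a minimal Python sketch. The pool below is hypothetical: the first entry is a Firefox string of the kind a desktop browser sends, and the other two are made-up variants added only for illustration. `pick_headers` is an invented helper name, and whether rotating user agents actually helps against the 403 is not established here.

```python
import random

# Hypothetical pool for illustration; in practice it could be loaded
# from a longer list such as the gist linked above.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0",
]


def pick_headers(rng=random):
    """Return request headers with a User-Agent drawn at random from the pool."""
    return {"User-Agent": rng.choice(USER_AGENTS)}


# A new dict is built on every call, so each request can carry a different value.
headers = pick_headers()
```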

JackNUMBER commented 3 years ago

> @JackNUMBER, have you tried making a list of user agents and picking one at random for each request? Here is a list of old user agents (created 6 years ago): https://gist.github.com/pzb/b4b6f57144aea7827ae4

@em92 Just tried it again and I still get a 403 every time. EDIT: same on another server.

hista commented 3 years ago

@JackNUMBER Did you try to bypass the bot detection with https://github.com/MoterHaker/bypass-captcha-examples/blob/main/geo.captcha-delivery.com.js ?

timat35 commented 3 years ago

@hista I'm quite sure there is a way to bypass datadome without exploiting poor people.

JackNUMBER commented 3 years ago

> @JackNUMBER Did you try to bypass the bot detection with https://github.com/MoterHaker/bypass-captcha-examples/blob/main/geo.captcha-delivery.com.js ?

Thank you @hista, but RSS-Bridge is a PHP project.

Skealz commented 2 years ago

Hey, I am trying to understand how datadome blocks requests.

Here is what I have so far. When browsing leboncoin in a browser, several requests are made. Among them, I've looked more closely at the requests made to dd.leboncoin.fr and api.leboncoin.fr.

The request to dd.leboncoin.fr looks like this:

POST /js/ HTTP/2
Host: dd.leboncoin.fr
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.leboncoin.fr/
Content-Type: application/x-www-form-urlencoded
Content-Length: 3972
Origin: https://www.leboncoin.fr
Dnt: 1
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Sec-Gpc: 1

jsData=<bigpayload>&events=<some events made by the user like mouse move, key up, each event has a timestamp...>&eventCounters=<count of the number of events by type of event>

So it looks like this request sends datadome information about how the user interacts with the website (mouse movements, etc.). The response body for that request is:

{"status":200,"cookie":"datadome=mWVbdmoClFIyL2o2GK-ezox3P47-smtjN19A4ricR5tuHe~PhrnNjgilN_4y2dqd1bB-TYCkvoSyaH3U4ksZ8s_.uPSdVKyzef2xhjzbqNvVMR7bd6OnJgNqp_ZDGBx; Max-Age=31536000; Domain=.leboncoin.fr; Path=/; Secure; SameSite=Lax"}

We see that datadome gave us a cookie. I wonder if it uses that cookie to follow us on the website and to keep track of real users and block the others.

If I remove the payload (jsData & co), the response is status 400, without a cookie.

The other request is the one that queries the API:
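To make the shape of that response concrete, here is a small Python sketch that pulls the `datadome` cookie out of a body like the one above. The function name `extract_datadome_cookie` and the placeholder cookie value are mine, and I am assuming a failed call simply carries no `cookie` field. It only reads the response and does nothing to obtain one.

```python
import json
from http.cookies import SimpleCookie


def extract_datadome_cookie(body):
    """Return (value, max_age) of the datadome cookie in a JSON body, or None."""
    data = json.loads(body)
    raw = data.get("cookie")
    if data.get("status") != 200 or not raw:
        return None  # assumption: no usable cookie unless status is 200
    jar = SimpleCookie()
    jar.load(raw)  # parses "name=value; Max-Age=...; Domain=...; ..."
    morsel = jar.get("datadome")
    if morsel is None:
        return None
    return morsel.value, int(morsel["max-age"] or 0)


# Placeholder value, not a real cookie.
sample = json.dumps({
    "status": 200,
    "cookie": "datadome=PLACEHOLDER_VALUE; Max-Age=31536000; "
              "Domain=.leboncoin.fr; Path=/; Secure; SameSite=Lax",
})
result = extract_datadome_cookie(sample)
```

With the attributes shown in the response, `Max-Age=31536000` corresponds to 365 days, so the cookie is meant to be long-lived.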

POST /finder/search HTTP/2
Host: api.leboncoin.fr
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.leboncoin.fr/
Api_key: ba0c2dad52b3ec
Content-Type: application/json
Origin: https://www.leboncoin.fr
Content-Length: 198
Dnt: 1
Cookie: utag_main=_st:1648151365624$v_id:017fbd5e72ba005272a3c812e4e400044001900900bd0$_sn:1$_ss:1$_pn:1%3Bexp-session$ses_id:1648149557946%3Bexp-session; __Secure-InstanceId=e6f84352-df87-4cc3-8cd5-0f93410a3373;include_in_experiment=false
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-site
Sec-Gpc: 1

{"owner_type":"all","limit":35,"limit_alu":3,"sort_by":"relevance","sort_order":"desc","filters":{"enums":{"ad_type":["offer"]},"keywords":{"text":"ya"}},"listing_source":"direct-search","offset":0}

Here the search filters are sent as the payload. From here on, I'm not sure what's happening. I don't know how the API decides whether I'm blocked, and I can't really experiment because the request always works for me for now. My guess is that setting the cookie `datadome=blabla` may help avoid being blocked. Or maybe you have to send good enough user interaction data in the request to dd.leboncoin.fr to make your IP "valid" for a certain amount of time.

As datadome seems to be based on AI, its behavior may evolve, and there are potentially multiple ways to bypass it.

If someone wants to give it a try, I've made a little Python script to try these requests: https://github.com/Skealz/reqlbc/blob/main/req_lbc.py It would be great if people who are blocked could try it and report whether it works.
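For reference, the JSON body of the search request above can be rebuilt like this in Python. `build_search_payload` is a hypothetical helper name. The field names and default values are copied from the captured request, and I have not checked which of them the API actually requires.

```python
import json


def build_search_payload(keywords, limit=35, offset=0):
    """Build a search body mirroring the captured /finder/search request."""
    return {
        "owner_type": "all",
        "limit": limit,
        "limit_alu": 3,
        "sort_by": "relevance",
        "sort_order": "desc",
        "filters": {
            "enums": {"ad_type": ["offer"]},
            "keywords": {"text": keywords},
        },
        "listing_source": "direct-search",
        "offset": offset,
    }


# Serialized form, as it would be sent with Content-Type: application/json.
body = json.dumps(build_search_payload("thinkpad"))
```

Paging would presumably be done by increasing `offset` in steps of `limit`, but that is a guess from the field names.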

Skealz commented 2 years ago

I changed the endpoint in my PHP project to point to https://api.leboncoin.fr/finder/search instead of https://api.leboncoin.fr/api/adfinder/v1/search. I still get 403.

I checked the bodies of these 403 responses. They contain: {"url":"https://geo.captcha-delivery.com/captcha/?initialCid=AHrlqAAAAAMAfeednTqgQfoAWKt4JA==&cid=Zm8ddoCRZ_odYhn8CsQpzejqgYQAgtoZJtN4rMvsBVABBuJmiXJG~hrqH~BZiiV1kQ1ZIpB7fUwia6fSwREUG3KY0677oKtMTV~nmd-MOfwHEKhbc~U9HWMbXUUIzW5&referer=https%3A%2F%2Fapi.leboncoin.fr%2Ffinder%2Fsearch&hash=05B30BD9055986BD2EE8F5A199D973&t=fe&s=7501"}, which means the 403 comes from the bot-blocking mechanism.

This is surprising to me because, from the same IP, I am able to issue requests (from Python) to the same API endpoint without being blocked. Maybe datadome is able to identify something very specific about the request (timing, formatting... I don't know) to block it.

I will try to add a request to dd.leboncoin.fr to the PHP code, before the request to the API, to see if anything changes.
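A check like this could at least let a client tell a datadome block apart from any other 403. It is a Python sketch, `is_datadome_challenge` is a made-up name, and it assumes the blocked response always has the JSON shape shown above, which I have only seen in this one case.

```python
import json
from urllib.parse import urlparse

CAPTCHA_HOST = "geo.captcha-delivery.com"


def is_datadome_challenge(status_code, body):
    """Return True if a 403 body points at the datadome captcha host."""
    if status_code != 403:
        return False
    try:
        data = json.loads(body)
    except ValueError:  # body is not JSON, so it is some other kind of 403
        return False
    if not isinstance(data, dict):
        return False
    url = data.get("url")
    if not isinstance(url, str):
        return False
    return urlparse(url).hostname == CAPTCHA_HOST
```

With that in place, the bridge could report "blocked by bot protection" instead of a generic "Unexpected response from upstream".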

timat35 commented 2 years ago

@Skealz Thanks!

I did get a 403 error (with the Python code) after 2-3 days, though. But that's a good start. I'll try sending a random payload to datadome. I'll also try to make it work with PHP when I have time (who knows when). Thanks again.

hista commented 2 years ago

Hi @timat35, have you found some time since your last message? :-)

mariolog commented 1 year ago

It works when using a VPN. I chose the Netherlands on Browsec VPN for Firefox. No problems with the app on my phone.