Closed JackNUMBER closed 6 years ago
Datadome's "protection" is actually pretty trivial to bypass, as it just checks the following (at least it works with curl's CLI):
- the `Accept` header
- the `Accept-Language` header
- the `Accept-Encoding` header, which can be set to `identity` if we don't want to bother with decompression

For testing purposes this can be added via the `header` parameter from the bridge, similar to how FacebookBridge was implemented: https://github.com/RSS-Bridge/rss-bridge/blob/d07deb093044614d68361f018054f36ec7839b6e/bridges/FacebookBridge.php#L87-L90
The user agent is already specified on each request: https://github.com/RSS-Bridge/rss-bridge/blob/d07deb093044614d68361f018054f36ec7839b6e/lib/contents.php#L10
Maybe something like this works? (Haven't tested it)
$header = array(
'Accept: text/html',
'Accept-Language: ' . getEnv('HTTP_ACCEPT_LANGUAGE'),
'Accept-Encoding: identity'
);
Same problem on Cheky, a standalone PHP script to create RSS feeds and mail alerts on LeBonCoin: https://forum.cheky.net/erreur-403-t605-p1.html
Confirmed that modifying the headers works.
Should be fixed in 9fc1e97
Follow up: It seems that IPs are still banned after a short amount of time. However, I have a solution! I have tested it, and it even works with a brand new Tor IP doing 300 requests per second for 20 minutes, so it should be OK.
There are two main ways to proceed: the first one would slow down the bridge (one more request is necessary) and might not be enough, and the second one would require a major rewrite.
The first method consists of fetching a valid Datadome cookie before firing our actual request. This can be done by accessing their API, using this request:
curl https://api.leboncoin.fr
This will not output anything useful, but a valid Datadome cookie will be issued.
Please note, however, that based on some IP information you might still be blocked, which leads me to the second solution, which almost always works.
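For anyone who wants to try the cookie approach from a script, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and I'm assuming the cookie is set on the response to the API URL quoted above, even if that response is an error status. It has not been tested against the live site.

```python
import http.cookiejar
import urllib.error
import urllib.request

API_URL = "https://api.leboncoin.fr"  # URL quoted in the comment above


def build_cookie_opener():
    """Return an opener that stores received cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_cookie(opener, page_url, timeout=10):
    """Hit the API once to collect cookies, then fetch the actual page."""
    try:
        opener.open(API_URL, timeout=timeout).close()
    except urllib.error.HTTPError:
        # The cookie processor runs before the error is raised, so any
        # Set-Cookie header on an error response is still kept in the jar.
        pass
    with opener.open(page_url, timeout=timeout) as response:
        return response.read()
```

Reusing the same opener for later requests keeps the cookie, so the extra request is only paid once per session.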
As we have seen, leboncoin has an (unofficial, at least) API. However, should you try to request it, you'll get a `401 Unauthorized`. The API indeed requires a key. Using specific voodoo rituals, we can find the necessary header (`api_key: ba0c2dad52b3ec`). This value is extremely unlikely to change.
Thanks to the previously mentioned voodoo ritual, we can also get the entry point for searches, which is at https://api.leboncoin.fr/finder/search. (Please be aware that ALL the API queries need to use the HTTP POST method; otherwise you will get a 404 back.)
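As a rough illustration of the above, this Python sketch builds (but does not send) a POST request carrying the `api_key` header. The `build_search_request` name is hypothetical, and the `Content-Type` header is my assumption; the thread only confirms the key header, the URL and the POST method.

```python
import json
import urllib.request

SEARCH_URL = "https://api.leboncoin.fr/finder/search"  # from this thread
API_KEY = "ba0c2dad52b3ec"  # value as quoted in this thread


def build_search_request(payload):
    """Build a POST request with the JSON payload and the api_key header."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={
            "api_key": API_KEY,
            # Assumption: the API expects a JSON content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`; the other headers discussed earlier in the thread may also be needed.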
The data you need to post is a JSON object. For search, here are the possible parameters:
Parameter name | Value | Explanation |
---|---|---|
limit | int, seems to have a maximal value (~50 ?) | number of items in the output |
limit_alu | int | No idea, my tarot cards are mute on this one |
owner_type | private or pro | Whether the person selling is a private seller or a professional |
pivot | string | ? Probably how to sort the search |
sort_by | price, distance, time | How to order the results |
sort_order | desc, asc | In what way do we order the results |
filters | array containing filter parameters, see other table | Search filters |
Filters:
Parameter name | Value | Explanation |
---|---|---|
ad_type | JSON array that contains one or more values of type "offer" or "demand" (nested under enums in the example request) | The type of ad |
location | {"departments": ["department_id1", ...], "regions": ["region_id1", "region_id2", ...], "city_zipcodes": [{"zipcode": "zipcode_1"}, ...]}. Only one of departments, regions, and city_zipcodes is necessary, or it can stay empty | The location of the offer |
keywords | {"text": "keyword"} or {"text": "keyword", "type": "subject"} in order to search in the title only | The search keyword |
ranges | Unknown | Probably some sort of range for prices and other |
category | {"id" : "cat_id"} | Category in which to search |
To give you a better idea, this is what a request looks like:
{"limit":35,"limit_alu":3,"filters":{"category":{"id":"33"},"enums":{"ad_type":["demand"]},"location":{"regions":["5"],"departments":["21"]},"keywords":{"text":"Cat"},"ranges":{}}}
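If it helps, the request body above can be assembled programmatically. This is only a sketch: `build_search_payload` is a hypothetical helper, and the defaults for `limit` and `limit_alu` are simply copied from the example.

```python
import json


def build_search_payload(category_id, ad_type, keyword,
                         regions=(), departments=(), limit=35, limit_alu=3):
    """Assemble a search body shaped like the example request."""
    location = {}
    if regions:
        location["regions"] = list(regions)
    if departments:
        location["departments"] = list(departments)
    return {
        "limit": limit,
        "limit_alu": limit_alu,
        "filters": {
            "category": {"id": str(category_id)},
            "enums": {"ad_type": [ad_type]},
            "location": location,
            "keywords": {"text": keyword},
            "ranges": {},
        },
    }


payload = build_search_payload(33, "demand", "Cat",
                               regions=["5"], departments=["21"])
print(json.dumps(payload, separators=(",", ":")))
```

The call at the bottom reproduces the same fields as the example request.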
@teromene Really nice! Where did you find the API? I'm planning to add some fields and need to know if they are available.
Filters: `price`, `departments`, `real_estate_type`, `square`, `rooms`, `mileage`, `regdate`, `brand`, `model`, `cubic_capacity`
Every option that is a range goes in the `ranges` field of the `filters` object, like this for the price, for example:
{"limit":35,"limit_alu":3,"filters":{"category":{"id":"9"},"enums":{"ad_type":["offer"]},"location":{"regions":["23"]},"keywords":{},"ranges":{"price":{"min":100000,"max":125000}}}}
This is applicable to `square`, `rooms`, `mileage`, `regdate`, ...
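A small Python sketch of the range rule, in case it is useful. `with_range` is a hypothetical helper, and whether the API accepts a range with only `min` or only `max` is an assumption I have not verified.

```python
import copy


def with_range(payload, name, minimum=None, maximum=None):
    """Return a copy of the payload with a range added under filters.ranges."""
    bounds = {}
    if minimum is not None:
        bounds["min"] = minimum
    if maximum is not None:
        bounds["max"] = maximum
    updated = copy.deepcopy(payload)  # leave the caller's dict untouched
    updated.setdefault("filters", {}).setdefault("ranges", {})[name] = bounds
    return updated


base = {"filters": {"ranges": {}}}
priced = with_range(base, "price", minimum=100000, maximum=125000)
```

Here `priced["filters"]["ranges"]` holds the same `price` object as in the request above.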
All the options that take a simple value go into the `enums` field of the `filters` object, like this for the brand, for example:
{"limit":35,"limit_alu":3,"filters":{"category":{"id":"2"},"enums":{"brand":["Bmw"], "ad_type":["offer"]},"location":{"regions":["23"]},"keywords":{},"ranges":{}}}
This is applicable to `real_estate_type`, `model`, ...
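The same idea for simple-value options, as a Python sketch. `with_enum` is a hypothetical helper; the only thing taken from the thread is that values go into `filters.enums` as a JSON array.

```python
import copy


def with_enum(payload, name, values):
    """Return a copy of the payload with a value list added under filters.enums."""
    updated = copy.deepcopy(payload)  # leave the caller's dict untouched
    updated.setdefault("filters", {}).setdefault("enums", {})[name] = list(values)
    return updated


base = {"filters": {"enums": {"ad_type": ["offer"]}}}
branded = with_enum(base, "brand", ["Bmw"])
```

After this call, `branded["filters"]["enums"]` contains both `ad_type` and `brand`, as in the brand request above.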
I've wrapped all the API calls in C#. Contact me if you are interested!
@teromene Are you aware of any change on LBC's side? I tried some requests, but I'm stuck with a `403 Forbidden` response.
I just checked if the API key was still the right one, and it is.
EDIT: It's working fine now. Probably headers-related.
Hello @teromene, is your solution still working? I'm getting a 403. Did you put the api_key as a query parameter or a header? Thanks
@monsieurnebo Is it still working? Can you show me the header parameters that you used? Thanks 👍
@DjTrilogic Sent you an email :)
Same request here :)
Yes, I believe that it is still working. However, I had to change the bridge to send a fake user agent; otherwise I do get a 403.
Which fake user agent do you advise?
Hello, this morning I get the error "L'adresse indiquée a généré une erreur 403." ("The given address generated a 403 error."):
Any ideas how to solve this issue? Thanks!
whoops, sorry I mistakenly thought this was Cheky's github page
Hi, does it still work? I'm trying to get data from LeBonCoin but I'm getting a 403 response. I work with Python. Any suggestions?
@ImenAyari Hi, yes, it still works. I tested with the latest state of master, 366d2d66b3fa126cfad7f2ac104e722d5f69d9ed.
Leboncoin is now blocking IPs that repeatedly request their pages (1 request every 12 hours in my case). 😔 They use Datadome's services.