RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

LeBonCoin blocks us #789

Closed JackNUMBER closed 6 years ago

JackNUMBER commented 6 years ago

Leboncoin is now blocking IPs that repeatedly request their pages (1 request every 12 hours in my case). 😔 They use Datadome's services.

teromene commented 6 years ago

Datadome's "protection" is actually pretty trivial to breach, as it just checks (at least it works with curl's CLI):

logmanoriginal commented 6 years ago

For testing purposes this can be added via the header parameter from the bridge, similar to how FacebookBridge was implemented: https://github.com/RSS-Bridge/rss-bridge/blob/d07deb093044614d68361f018054f36ec7839b6e/bridges/FacebookBridge.php#L87-L90

The user agent is already specified on each request: https://github.com/RSS-Bridge/rss-bridge/blob/d07deb093044614d68361f018054f36ec7839b6e/lib/contents.php#L10

Maybe something like this works? (Haven't tested it)

$header = array(
    'Accept: text/html',
    'Accept-Language: ' . getEnv('HTTP_ACCEPT_LANGUAGE'),
    'Accept-Encoding: identity'
);

Draky50110 commented 6 years ago

Same problem on Cheky, a standalone PHP script that creates RSS feeds and mail alerts for LeBonCoin: https://forum.cheky.net/erreur-403-t605-p1.html

teromene commented 6 years ago

Confirmed that modifying the headers works.

teromene commented 6 years ago

Should be fixed in 9fc1e97

teromene commented 6 years ago

Follow-up: it seems that IPs are still banned after a short amount of time. However, I have a solution! I have tested it, and it even works with a brand-new Tor IP doing 300 requests per second for 20 minutes, so it should be OK.

There are two main ways to proceed: the first would slow down the bridge (one more request is needed) and might not be enough; the second would require a major rewrite.

The first one

This method consists of fetching a valid Datadome cookie before firing our actual request. This can be done by accessing their API with this request:

curl https://api.leboncoin.fr

This will not output anything useful, but a valid Datadome cookie will be issued. Please note, however, that based on some IP information you might still be blocked, which leads me to the second solution, which almost always works.
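A minimal Python sketch of the cookie-reuse step described above (the thread itself only shows the curl call). It assumes the cookie is literally named `datadome` and arrives in a standard `Set-Cookie` header; the helper names and the sample header value are made up for illustration, and no request is actually sent.

```python
from http.cookies import SimpleCookie


def extract_cookie(set_cookie_header, name="datadome"):
    """Return the value of cookie `name` from a Set-Cookie header, or None."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


def cookie_request_header(value, name="datadome"):
    """Build the Cookie header to attach to the follow-up request."""
    return {"Cookie": f"{name}={value}"}


# Hypothetical header, shaped like what a first request to
# https://api.leboncoin.fr might return (the value is invented).
sample = "datadome=abc123; Max-Age=31536000; Domain=.leboncoin.fr; Path=/"
value = extract_cookie(sample)
headers = cookie_request_header(value)
```

The resulting `headers` dict would then be merged into the headers of the real page request.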

The second one, using the API

As we have seen, leboncoin has an API (unofficial, at least). However, should you try to request it, you'll get a 401 Unauthorized: the API requires a key. Using specific voodoo rituals, we can find the necessary header (api_key: ba0c2dad52b3ec). This value is extremely unlikely to change. Thanks to the previously mentioned voodoo ritual, we can also get the entry point for searches, which is https://api.leboncoin.fr/finder/search. (Please be aware that ALL API queries need to use the HTTP POST method; otherwise you will get a 404 back.) The data you need to post is a JSON object. For search, here are the possible parameters:

| Parameter name | Value | Explanation |
| --- | --- | --- |
| limit | int, seems to have a maximal value (~50 ?) | Number of items in the output |
| limit_alu | int | No idea, my tarot cards are mute on this one |
| owner_type | private or pro | Whether the person selling is a private seller or a professional |
| pivot | string ? | Probably how to sort the search |
| sort_by | price, distance, time | How to order the results |
| sort_order | desc, asc | In what way do we order the results |
| filters | array containing filter parameters, see other table | Search filters |

Filters:

| Parameter name | Value | Explanation |
| --- | --- | --- |
| ad_type | JSON array that contains one or more values of type "offer" or "demand" | The type of ad |
| location | {"departments": ["department_id1", ...], "regions": ["regionid1", "regionid2", ...], "city_zipcodes": [{"zipcode": "zipcode_1"}, ...]}. Only one of departments, regions, and city_zipcodes is necessary, or it can stay empty | The location of the offer |
| keywords | {"text": "keyword"}, or {"text": "keyword", "type": "subject"} in order to search in the title only | The search keyword |
| ranges | Unknown | Probably some sort of range for prices and other values |
| category | {"id": "cat_id"} | Category in which to search |

To give you a better idea, this is what a request looks like:

{"limit":35,"limit_alu":3,"filters":{"category":{"id":"33"},"enums":{"ad_type":["demand"]},"location":{"regions":["5"],"departments":["21"]},"keywords":{"text":"Cat"},"ranges":{}}}
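For anyone not working in PHP, the request described above could be assembled roughly like this in Python. This is a sketch, not the bridge's code: `build_search_request` is a hypothetical helper, the `Content-Type` header is an assumption (the thread only mentions `api_key`), and the request is built but never sent.

```python
import json
import urllib.request

SEARCH_URL = "https://api.leboncoin.fr/finder/search"
API_KEY = "ba0c2dad52b3ec"  # value quoted in this thread


def build_search_request(keyword, category_id, regions, ad_type="offer", limit=35):
    """Build (but do not send) a POST request mirroring the example payload."""
    payload = {
        "limit": limit,
        "limit_alu": 3,  # meaning unknown; copied from the example request
        "filters": {
            "category": {"id": category_id},
            "enums": {"ad_type": [ad_type]},
            "location": {"regions": list(regions)},
            "keywords": {"text": keyword},
            "ranges": {},
        },
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption, not something the thread confirms.
        headers={"api_key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("Cat", "33", ["5"], ad_type="demand")
# urllib.request.urlopen(request) would send it; deliberately not done here.
```

Later comments in the thread suggest a browser-like user agent may also be required to avoid a 403; it could be added to the `headers` dict.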

JackNUMBER commented 6 years ago

@teromene Really nice! Where did you find the API? I'm planning to add some fields and need to know if they are available.

#783 is now outdated and will be updated to use the API method.

Filters: price, departments, real_estate_type, square, rooms, mileage, regdate, brand, model, cubic_capacity

teromene commented 6 years ago

Every option that is a range goes in the ranges field of the filters object. For the price, for example:

{"limit":35,"limit_alu":3,"filters":{"category":{"id":"9"},"enums":{"ad_type":["offer"]},"location":{"regions":["23"]},"keywords":{},"ranges":{"price":{"min":100000,"max":125000}}}}

This also applies to square, rooms, mileage, regdate, etc.

All the options that take a simple value go into the enums field of the filters object. For the brand, for example:

{"limit":35,"limit_alu":3,"filters":{"category":{"id":"2"},"enums":{"brand":["Bmw"], "ad_type":["offer"]},"location":{"regions":["23"]},"keywords":{},"ranges":{}}}

This also applies to real_estate_type, model, etc.
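The ranges/enums split described above can be sketched as a small Python helper. `build_filters` and its argument names are hypothetical; the sketch always sends both `min` and `max`, since the thread does not say whether the API accepts an open-ended range.

```python
def build_filters(category_id, regions, ranges=None, enums=None, ad_type="offer"):
    """Route range options to "ranges" and simple-value options to "enums"."""
    merged_enums = {"ad_type": [ad_type]}
    for name, values in (enums or {}).items():
        merged_enums[name] = list(values)

    range_filters = {}
    for name, (low, high) in (ranges or {}).items():
        # Both bounds are always sent; open-ended ranges are untested.
        range_filters[name] = {"min": low, "max": high}

    return {
        "category": {"id": category_id},
        "enums": merged_enums,
        "location": {"regions": list(regions)},
        "keywords": {},
        "ranges": range_filters,
    }


# Rebuilds the two example payloads from this comment.
price_search = {
    "limit": 35,
    "limit_alu": 3,
    "filters": build_filters("9", ["23"], ranges={"price": (100000, 125000)}),
}
brand_search = {
    "limit": 35,
    "limit_alu": 3,
    "filters": build_filters("2", ["23"], enums={"brand": ["Bmw"]}),
}
```

Passing either dict through `json.dumps` gives the body to POST to the search entry point.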

DjTrilogic commented 6 years ago

I've wrapped all the API calls in C#. Contact me if you are interested!

monsieurnebo commented 5 years ago

@teromene Are you aware of any change on LBC's side? I tried some requests, but I'm stuck with a 403 Forbidden response.

I just checked if the API key was still the right one, and it is.

EDIT: It's working fine now. Probably headers-related.

Meabo commented 4 years ago

Hello @teromene, is your solution still working? I'm getting a 403. Did you put the api_key in a query parameter or in a header? Thanks

Meabo commented 4 years ago

@monsieurnebo Is it still working? Can you show me the header parameters that you used? Thanks 👍

Meabo commented 4 years ago

@DjTrilogic Sent you an email :)

hista commented 4 years ago

Same request here :)

teromene commented 4 years ago

Yes, I believe that it is still working. However, I had to change the bridge to send a fake user agent; without it I do get a 403.

hista commented 4 years ago

Which fake user agent do you recommend?

waterdrop01 commented 4 years ago

Hello, this morning I get the error "L'adresse indiquée a généré une erreur 403." ("The specified address generated a 403 error.")

Any ideas how to solve this issue? Thanks!

waterdrop01 commented 4 years ago

Whoops, sorry, I mistakenly thought this was Cheky's GitHub page.

ImenAyari commented 4 years ago

Hi, does it still work? I'm trying to get data from LeBonCoin but I'm getting a 403 response. I work with Python. Any suggestions?

JackNUMBER commented 4 years ago

@ImenAyari Hi, yes, it still works. I tested with the latest state of master, 366d2d66b3fa126cfad7f2ac104e722d5f69d9ed