SimonGulix / colruyt_scraper

Not really a scraper, but uses the hidden API of www.colruyt.be in order to scrape their products.
MIT License
10 stars, 1 fork

Origin of API #3

Open · densurfer opened this issue 2 years ago

densurfer commented 2 years ago

Hey,

Thanks for the project! I wanted to access the API of Colruyt, but I couldn't find anything. Thanks to you, I got something! :D I was wondering where you got the URL and this information from. I want to build a broader API based on this, so I'd like to know how to use the query params and which other URLs are available. Can you help me out?

mims92 commented 5 months ago

Hey, were you able to build something out of that API? I want to search based on the EAN.

BelgianNoise commented 2 months ago

With the discontinuation of the Colruyt mobile app in favor of the Xtra app, they have taken down the old hidden API endpoints (both the products and the promotions endpoints).

The current endpoints (also used by colruyt.be itself; check the Network tab in your browser's devtools) only require a simple auth token that never seems to expire: the X-CG-APIKey header. It can be hardcoded into your code or retrieved with a (headless) browser.
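A minimal sketch of such a request, assuming (as described above) that the X-CG-APIKey header alone is enough. The endpoint and parameter names are copied from the captured request further down in this thread; the helper name `buildSearchRequest`, the `CG_API_KEY` environment variable and the default `placeId`/`size` values are illustrative assumptions, and `ts` and `userMostBought` are left out on the assumption that they are optional.

```javascript
// Sketch only: endpoint and parameter names come from a request captured on colruyt.be.
// buildSearchRequest and its defaults (placeId 684, size 25) are hypothetical.
const SEARCH_URL =
  "https://apip.colruyt.be/gateway/ictmgmt.emarkecom.cgproductsearchsvc.v2/v1/nl/products";

function buildSearchRequest({ apiKey, searchTerm, placeId = 684, page = 1, size = 25 }) {
  if (!apiKey) throw new Error("An X-CG-APIKey value is required");
  const params = {
    clientCode: "CLP",
    isAvailable: true,
    page,
    placeId,
    searchTerm,
    size,
  };
  // encodeURIComponent turns spaces into %20, matching the captured request.
  const query = Object.entries(params)
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(String(v))}`)
    .join("&");
  return {
    url: `${SEARCH_URL}?${query}`,
    // Assumption: only the API key header is needed (no Bearer token or cookie).
    headers: {
      accept: "application/json",
      "x-cg-apikey": apiKey,
    },
  };
}

// Usage (Node 18+ has a global fetch); the key is read from an environment variable:
// const { url, headers } = buildSearchRequest({ apiKey: process.env.CG_API_KEY, searchTerm: "cola zero" });
// const body = await (await fetch(url, { headers })).json();
// console.log(body.products);
```

Keeping URL building separate from the network call makes it easy to inspect or log the exact request before sending it.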

> Hey, were you able to build something out of that API? I want to search based on the EAN.

Are you still looking for this? I might be able to provide you with some data. I don't see an EAN in any response body; could it be under a different name?

mims92 commented 2 months ago

Hi, thanks for the update. I will have a look and see if I can find something relevant for my use case: scanning barcodes.

BelgianNoise commented 2 months ago

> Hi, thanks for the update. I will have a look and see if I can find something relevant for my use case: scanning barcodes.

I checked with my very tasty EVERYDAY cola zero 50cl :). It is listed under GTIN in the response of this request:

Request:

```
fetch("https://apip.colruyt.be/gateway/ictmgmt.emarkecom.cgproductsearchsvc.v2/v1/nl/products?clientCode=CLP&isAvailable=true&page=1&placeId=684&searchTerm=cola%20zero&size=1&ts=1713775266421&userMostBought=true", {
  "headers": {
    "accept": "application/json, text/plain, */*",
    "accept-language": "en",
    "authorization": "Bearer ...",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "sec-ch-ua": "\"Not A(Brand\";v=\"99\", \"Opera GX\";v=\"107\", \"Chromium\";v=\"121\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"macOS\"",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-site",
    "x-cg-apikey": "a8ylmv13-b285-4788-9e14-0f79b7ed2411",
    "cookie": "...",
    "Referer": "https://www.colruyt.be/",
    "Referrer-Policy": "strict-origin-when-cross-origin"
  },
  "body": null,
  "method": "GET"
});
```
Response body (response.products):

```
[
  {
    "productId": "119979",
    "technicalArticleNumber": "3078987",
    "commercialArticleNumber": "24361",
    "name": "cola zero",
    "brand": "EVERYDAY",
    "seoBrand": "Everyday",
    "content": "50cl",
    "thumbNail": "https://static.colruytgroup.com/images/200x200/std.lang.all/88/18/asset-1508818.jpg",
    "fullImage": "https://static.colruytgroup.com/images/500x500/std.lang.all/88/18/asset-1508818.jpg",
    "price": {
      "basicPrice": 0.247,
      "recommendedQuantity": "9.0",
      "measurementUnitPrice": 0.496,
      "measurementUnit": "L",
      "isRedPrice": false,
      "pricePerUOM": 0.496,
      "activationDate": "03-04-2024",
      "recordSource": "Offline",
      "isPromoActive": "N",
      "priceChangeCode": "L"
    },
    "isAvailable": true,
    "isPriceAvailable": true,
    "inPromo": false,
    "topCategoryName": "Dranken",
    "topCategoryId": "354",
    "walkRouteSequenceNumber": 10096,
    "businessDomain": "RETAIL_BE",
    "IsPrivateLabel": true,
    "IsBiffe": false,
    "WeightconversionFactor": "0",
    "IsWeightArticle": false,
    "nutriscoreLabel": "C",
    "IsBio": false,
    "CountryOfOrigin": "BELGIË",
    "IsExclusivelySoldInLuxembourg": false,
    "OrderUnit": "P",
    "ShortName": "EVD COLA ZERO 50CL",
    "InSeason": true,
    "IsNew": false,
    "GTIN": [
      "25400141272424",
      "15400141272427",
      "05400141380019",
      "05400141080117",
      "05400141086980",
      "05400141272420",
      "05400141121735",
      "05400141101812",
      "05400141108644"
    ],
    "RecentQuanityOfStockUnits": "9,1",
    "LongName": "EVERYDAY cola zero 50cl",
    "AlcoholVolume": "0",
    "StartSeasonDate": "30/04/2019",
    "FicCode": "I"
  }
]
```
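For the barcode-scanning use case, note that the GTIN values above are 14 digits long and zero-padded, while the barcode printed on the can is an EAN-13. A small sketch, assuming the API consistently stores this zero-padded GTIN-14 form: validate the scanned code with the standard GS1 mod-10 check digit, then left-pad it to 14 digits. `gs1CheckDigit` and `toGtin14` are hypothetical helper names.

```javascript
// Standard GS1 mod-10 check digit: weights alternate 3, 1, 3, ...
// starting from the rightmost data digit (the one next to the check digit).
function gs1CheckDigit(digitsWithoutCheck) {
  let sum = 0;
  for (let i = 0; i < digitsWithoutCheck.length; i++) {
    const digit = Number(digitsWithoutCheck[digitsWithoutCheck.length - 1 - i]);
    sum += digit * (i % 2 === 0 ? 3 : 1);
  }
  return (10 - (sum % 10)) % 10;
}

// Normalizes an EAN-8, UPC-A (12), EAN-13 or GTIN-14 scan to the zero-padded
// 14-digit form seen in the response (assumption). Returns null for invalid input.
function toGtin14(barcode) {
  const code = String(barcode).trim();
  if (!/^\d+$/.test(code) || ![8, 12, 13, 14].includes(code.length)) {
    return null; // not a GTIN-8/12/13/14
  }
  const body = code.slice(0, -1);
  const check = Number(code[code.length - 1]);
  if (gs1CheckDigit(body) !== check) return null; // likely a misread
  // Leading zeros add nothing to the weighted sum, so padding keeps the check digit valid.
  return code.padStart(14, "0");
}

// The EAN-13 of the can becomes one of the values in the GTIN array above.
console.log(toGtin14("5400141272420")); // → 05400141272420
```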

This is one of the few fields I don't track in my scraper. For now, the only way I see to filter by EAN is to scrape everything and query your own dataset. If you find a way to query their API directly by EAN/GTIN, I am very interested as well.
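A minimal sketch of that approach, assuming the scraped products are already available as an array of objects shaped like the response body above: index each product under every entry of its GTIN array, so a scanned barcode becomes a local map lookup. `buildGtinIndex` and `lookupByBarcode` are hypothetical helpers, not part of any Colruyt API.

```javascript
// Index every scraped product by each of its GTINs (field name taken from the response).
function buildGtinIndex(products) {
  const index = new Map();
  for (const product of products) {
    for (const gtin of product.GTIN ?? []) {
      // Store the zero-padded 14-digit form as the key.
      index.set(String(gtin).padStart(14, "0"), product);
    }
  }
  return index;
}

// Look up a scanned barcode; an EAN-13 is padded so it matches the GTIN-14 keys.
function lookupByBarcode(index, scanned) {
  const code = String(scanned).trim();
  if (!/^\d{8,14}$/.test(code)) return undefined;
  return index.get(code.padStart(14, "0"));
}

// Usage with a trimmed-down product from the response above:
const products = [
  {
    productId: "119979",
    LongName: "EVERYDAY cola zero 50cl",
    GTIN: ["25400141272424", "05400141272420"],
  },
];
const index = buildGtinIndex(products);
console.log(lookupByBarcode(index, "5400141272420")?.LongName); // → EVERYDAY cola zero 50cl
```

A `Map` keeps each lookup cheap regardless of how many products were scraped; the same keying scheme would carry over to a database table with a GTIN column.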

mims92 commented 2 months ago

Hey, thank you!

I am still unable to fetch an item by EAN; it seems only searchTerm is available. They have also put a strict "bot" check and a request threshold in place.

BelgianNoise commented 2 months ago

> They have also put a strict "bot" check and a request threshold in place.

Annoying indeed; that's why this project uses SSL proxies. I know the page you are seeing: it can be bypassed by providing a valid X-CG-APIKey (and by leaving some time between requests if you don't want to use proxies).
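Leaving time between requests can be as simple as awaiting a fixed delay inside a sequential loop. The sketch below assumes a hypothetical `fetchPage(page, size)` callback (for example built on `fetch` with the X-CG-APIKey header) that resolves to the products on one page, and it assumes the last page is the first one returning fewer than `size` products. Neither the delay value nor that stopping rule is documented by Colruyt.

```javascript
// Resolve after ms milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch pages one at a time, pausing between requests to keep the request rate low.
// fetchPage is a hypothetical callback: (page, size) => Promise<Array<product>>.
async function fetchAllPages(fetchPage, { size = 25, delayMs = 2000, maxPages = 500 } = {}) {
  const all = [];
  for (let page = 1; page <= maxPages; page++) {
    const products = await fetchPage(page, size);
    all.push(...products);
    // Assumption: a short (or empty) page means there are no more results.
    if (products.length < size) break;
    await sleep(delayMs); // pause before the next request
  }
  return all;
}
```

Requests run strictly one after another, so the total request rate never exceeds one per `delayMs` plus response time; `maxPages` is a safety cap in case the stopping rule does not hold.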