Giglium / vinted_scraper

A very simple Python package that scrapes the Vinted website to retrieve information about its items.
MIT License

Error 403 #54

Open Fedee0 opened 3 weeks ago

Fedee0 commented 3 weeks ago

Describe the bug

Hi, I'm using your API to fetch articles from Vinted, but when I run my code on a VPS I get this error: RuntimeError: Cannot fetch session cookie from https://www.vinted.fr, because of status code: 403 different from 200. I'm using residential proxies; what should I change to avoid this error?

Example (I removed all proxy details):

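from vinted_scraper import VintedScraper

username = ''
password = ''
proxy = ""  # proxy URL removed
vinted_proxies = {
    "http": proxy,
    "https": proxy,
}
item_id = match.group(1)  # `match` is defined in code not shown here

item_info = VintedScraper("https://www.vinted.fr/", proxies=vinted_proxies).item(item_id)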
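
The removed proxy string presumably combines username and password into a proxy URL. A minimal sketch of building the proxies dict, assuming the common http://user:pass@host:port form; build_proxies is a hypothetical helper, and the host and port are placeholders for whatever your provider gives you. Credentials are percent-encoded so characters like "@" or ":" in a password don't corrupt the URL.

```python
from __future__ import annotations

from urllib.parse import quote


def build_proxies(username: str, password: str, host: str, port: int) -> dict[str, str]:
    # Hypothetical helper: percent-encode credentials so "@" or ":" can't break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    proxy = f"http://{user}:{pwd}@{host}:{port}"
    # Same proxy for both schemes, mirroring the vinted_proxies dict above.
    return {"http": proxy, "https": proxy}
```

Usage: vinted_proxies = build_proxies(username, password, "proxy.example.com", 8000), with your provider's real host and port.
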
Giglium commented 3 weeks ago

Hi, Vinted may be blocking traffic from certain VPNs, which could be causing the issue.

To verify if this is the case, you can run the following curl command:

curl -v -c - -L "https://www.vinted.fr/"
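
The same check can be run from Python, which also makes it easy to compare the VPS's own IP against the proxy. A minimal sketch using only the standard library's urllib (not whatever HTTP client vinted_scraper uses internally); check and diagnose are hypothetical names, and the _vinted_fr_session cookie name is the one mentioned later in this thread.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from http.cookiejar import CookieJar

SESSION_COOKIE = "_vinted_fr_session"  # cookie the scraper needs, per this thread


def diagnose(status: int, cookie_names: set[str]) -> str:
    # Summarize what the response means for the scraper.
    if status != 200:
        return f"blocked (HTTP {status})"
    if SESSION_COOKIE not in cookie_names:
        return "HTTP 200 but no session cookie"
    return "ok"


def check(url: str = "https://www.vinted.fr/", proxy: str | None = None) -> str:
    jar = CookieJar()
    handlers: list = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy:
        # Send both schemes through the same proxy, like vinted_proxies above.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    # Note: urllib sends its own default User-Agent, not curl's.
    try:
        with opener.open(url, timeout=15) as resp:  # redirects are followed, like curl -L
            status = resp.status
    except urllib.error.HTTPError as err:  # urllib raises on 4xx/5xx instead of returning
        status = err.code
    return diagnose(status, {cookie.name for cookie in jar})
```

Running check() and then check(proxy=...) with your proxy URL shows whether the block applies to the VPS's IP, the proxy's IP, or both.
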
Fedee0 commented 3 weeks ago

For now, I've written my own function to get a session cookie through the proxies. This is the output of the curl command:

* Host www.vinted.fr:443 was resolved.
* IPv6: 2606:4700::6812:79e8, 2606:4700::6812:78e8
* IPv4: 104.18.120.232, 104.18.121.232
*   Trying 104.18.120.232:443...
* Connected to www.vinted.fr (104.18.120.232) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=vinted.fr
*  start date: Aug 11 05:04:31 2024 GMT
*  expire date: Nov  9 05:04:30 2024 GMT
*  subjectAltName: host "www.vinted.fr" matched cert's "*.vinted.fr"
*  issuer: C=US; O=Google Trust Services; CN=WE1
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA256
*   Certificate level 1: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 2: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using ecdsa-with-SHA384
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://www.vinted.fr/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: www.vinted.fr]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.5.0]
* [HTTP/2] [1] [accept: */*]
> GET / HTTP/2
> Host: www.vinted.fr
> User-Agent: curl/8.5.0
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
< HTTP/2 403 
< date: Fri, 23 Aug 2024 12:46:51 GMT
< content-type: text/plain; charset=UTF-8
< content-length: 16
< x-frame-options: SAMEORIGIN
< referrer-policy: same-origin
< cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< expires: Thu, 01 Jan 1970 00:00:01 GMT
* Added cookie __cf_bm="LOaSsyPowxOHlPi0WjLJbMGJLtV6yzBakAfvozJULy0-1724417211-1.0.1.1-Fiyk6B7Wfvx3iZvavGuJFGz6E6zpBthgBH0e.O_R2Jy.7ESoa71PuQLPyMX7ctMQUduR_qD7VAwoUbQce7NJYa.5zbUYwTOzovti_OYIZq4" for domain vinted.fr, path /, expire 1724419011
< set-cookie: __cf_bm=LOaSsyPowxOHlPi0WjLJbMGJLtV6yzBakAfvozJULy0-1724417211-1.0.1.1-Fiyk6B7Wfvx3iZvavGuJFGz6E6zpBthgBH0e.O_R2Jy.7ESoa71PuQLPyMX7ctMQUduR_qD7VAwoUbQce7NJYa.5zbUYwTOzovti_OYIZq4; path=/; expires=Fri, 23-Aug-24 13:16:51 GMT; domain=.vinted.fr; HttpOnly; Secure; SameSite=None
< server: cloudflare
< cf-ray: 8b7b3535adc53608-FRA
< 
* Connection #0 to host www.vinted.fr left intact
error code: 1005
# Netscape HTTP Cookie File
# https://curl.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.

#HttpOnly_.vinted.fr    TRUE    /       TRUE    1724419011      __cf_bm LOaSsyPowxOHlPi0WjLJbMGJLtV6yzBakAfvozJULy0-1724417211-1.0.1.1-Fiyk6B7Wfvx3iZvavGuJFGz6E6zpBthgBH0e.O_R2Jy.7ESoa71PuQLPyMX7ctMQUduR_qD7VAwoUbQce7NJYa.5zbUYwTOzovti_OYIZq4
Giglium commented 2 weeks ago

As you can see from the curl output, you are receiving a 403 status code (< HTTP/2 403), and the cookie section has no "Added cookie _vinted_fr_session=" line; that session cookie is the one we need to retrieve.
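
The cookie check described above can also be done on the cookie file that curl prints with -c -. A minimal sketch of a parser for that Netscape cookie format; cookie_names is a hypothetical helper. It relies on curl's convention, visible in the output above, of prefixing HttpOnly cookies with "#HttpOnly_" so they are not mistaken for comments.

```python
from __future__ import annotations

SESSION_COOKIE = "_vinted_fr_session"


def cookie_names(netscape_text: str) -> set[str]:
    names: set[str] = set()
    for raw in netscape_text.splitlines():
        line = raw.strip()
        if line.startswith("#HttpOnly_"):
            # curl marks HttpOnly cookies with this prefix; they are not comments.
            line = line[len("#HttpOnly_"):]
        elif not line or line.startswith("#"):
            continue  # blank line or real comment
        fields = line.split()  # tabs in the real file; split() also tolerates spaces
        if len(fields) >= 6:
            # Fields: domain, include-subdomains, path, secure, expiry, name, value
            names.add(fields[5])
    return names
```

On the output above this finds only __cf_bm, so SESSION_COOKIE in cookie_names(text) is False.
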

Could you work around this by fetching the cookie without the proxy and then using it with the proxy?

It could be a neat workaround!
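
A minimal sketch of that workaround, using only the Python standard library rather than vinted_scraper's own API; fetch_session_cookie, build_request, get_via_proxy and cookie_from_jar are hypothetical names, and the URL passed to get_via_proxy is left to the caller. Step 1 requests the homepage directly and keeps the _vinted_fr_session cookie; step 2 sends later requests through the proxy with that cookie attached by hand.

```python
from __future__ import annotations

import urllib.request
from http.cookiejar import CookieJar

SESSION_COOKIE = "_vinted_fr_session"  # cookie name mentioned in this thread


def cookie_from_jar(jar: CookieJar, name: str) -> str | None:
    # Return the value of the first cookie called `name`, or None.
    for cookie in jar:
        if cookie.name == name:
            return cookie.value
    return None


def fetch_session_cookie(base_url: str = "https://www.vinted.fr/") -> str | None:
    # Step 1: request the homepage directly (no proxy) and keep its cookies.
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    with opener.open(base_url, timeout=15):
        pass  # raises urllib.error.HTTPError if this IP also gets a 403
    return cookie_from_jar(jar, SESSION_COOKIE)


def build_request(url: str, cookie_value: str) -> urllib.request.Request:
    # Attach the previously obtained session cookie by hand.
    return urllib.request.Request(url, headers={"Cookie": f"{SESSION_COOKIE}={cookie_value}"})


def get_via_proxy(url: str, proxy: str, cookie_value: str) -> bytes:
    # Step 2: send the actual request through the proxy, reusing the cookie.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(build_request(url, cookie_value), timeout=15) as resp:
        return resp.read()
```

Whether Vinted accepts a session cookie obtained from a different IP than the one making later requests is not verified in this thread; Cloudflare's __cf_bm cookie in particular may be tied to the client that received it.
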