Gertje823 / Vinted-Scraper

This is a tool that scrapes/downloads images and data from Vinted & Depop using the API and stores the data in a SQLite database.
GNU General Public License v3.0

Connection Timeout Error - Feature Suggestion #16

Closed harveycastree closed 2 years ago

harveycastree commented 2 years ago

Hi, first of all, great program! However, I seem to be getting a connection timeout when pulling data from Depop:

C:\Users\harve\Downloads\Vinted-Scraper-main\Vinted-Scraper-main>python scraper.py -d
Creation of the directory failed or the folder already exists
coose
https://webapi.depop.com/api/v1/shop/coose/
https://media-photos.depop.com/b0/18244865/871121032_eb54cf4473da478fab8e5572df9f1be3/U0.jpg
Creation of the directory failed or the folder already exists
Fetching all produts...
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzEzODE1NTYzfDIwMjItMTEtMTNUMDg6MzA6MjguNzgzNzE0WnwyMDA
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzEwMTAzNTQ0fDIwMjItMTEtMTNUMDg6MTg6MDcuODE2MTRafDQwMA
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzIzNzAyOTEzfDIwMjItMTEtMTJUMjI6NDY6NDkuMTcyMjY4Wnw2MDA
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzA2NDUyMTg4fDIwMjItMTEtMTJUMjI6MzI6MTEuNjUzMzYxWnw4MDA
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzE2Njc3ODU4fDIwMjItMTEtMDZUMTI6MTQ6MjYuMDE5NjE2WnwxMDAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MzE2MjYxMDYwfDIwMjItMDktMThUMTc6NTE6MzcuMDAzMzAxWnwxMjAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=Mjg3NjQ1MzMyfDIwMjItMDctMjZUMTY6MTE6MTYuOTEyMTg4WnwxNDAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjkzODczNjQyfDIwMjItMDUtMDVUMDg6MzQ6MDAuNTg4NTcxWnwxNjAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjcxMzczMDgwfDIwMjItMDMtMDFUMjI6NTI6NTcuNzE1MzQxWnwxODAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjczMzA4ODc4fDIwMjEtMTItMTdUMTU6NDI6NDguMjEzNTkyWnwyMDAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjUwMDM2NTU2fDIwMjEtMDktMDJUMjM6MTg6MzMuNDI5MDc5WnwyMjAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjIwNjMxMjYxfDIwMjEtMDQtMTJUMDk6MDM6MTUuMTYxMDM4WnwyNDAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MjA3MjA3MTI4fDIwMjAtMTItMjlUMjM6Mjk6MjkuMzY5NDkzWnwyNjAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MTYxOTAxNzQ4fDIwMjAtMTAtMThUMjI6MzI6MDcuNTc2ODk4WnwyODAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MTgwMTg4NDExfDIwMjAtMDktMDdUMTM6NDY6NTUuNzU0ODI0WnwzMDAw
https://webapi.depop.com/api/v1/shop/18244865/products/?limit=200&offset_id=MTU2OTIxNTQwfDIwMjAtMDYtMThUMTc6NDg6NDEuMTQ0MzQ5WnwzMjAw
Got all products. Start Downloading...
3289
Creation of the directory downloads/coose/ failed or the folder already exists
Traceback (most recent call last):
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x0000024C99675950>, 'Connection to webapi.depop.com timed out. (connect timeout=None)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='webapi.depop.com', port=443): Max retries exceeded with url: /api/v2/product/coose_g-black-vintage-adidas-joggers-black (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000024C99675950>, 'Connection to webapi.depop.com timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\Downloads\Vinted-Scraper-main\Vinted-Scraper-main\scraper.py", line 422, in <module>
    download_depop_data(userids)
  File "C:\Users\harve\Downloads\Vinted-Scraper-main\Vinted-Scraper-main\scraper.py", line 371, in download_depop_data
    product_data = requests.get(url).json()
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 553, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='webapi.depop.com', port=443): Max retries exceeded with url: /api/v2/product/coose_g-black-vintage-adidas-joggers-black (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x0000024C99675950>, 'Connection to webapi.depop.com timed out. (connect timeout=None)'))

If you could help with this issue that would be great! It seems to be okay when pulling information from smaller stores but not so much with larger ones.

Also, would it be possible to pull only the item data and not the images, hopefully speeding up the download? And could the tool potentially pull upload/sold dates too?

I have limited experience with Python, so any help would be much appreciated. Thank you!

harveycastree commented 2 years ago

The tool also doesn't seem to be pulling sizes for 'PURCHASED' or 'MARKED AS SOLD' items. I am not sure if this is something that can be fixed as well? Thank you

Gertje823 commented 2 years ago

First of all, thank you for opening this issue. I am unable to reproduce the timeouts you are getting. Is it possible that you had already sent a lot of requests to Depop and are therefore being rate limited?
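If rate limiting or a flaky connection is the cause, one common mitigation is to retry a failed request after a growing pause. The following is a minimal sketch, not the scraper's actual code: `retry_with_backoff` and its parameters are hypothetical names introduced here for illustration.

```python
import time


def retry_with_backoff(func, attempts=4, base_delay=1.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call func(); if it raises one of retry_on, wait and try again.

    Hypothetical helper, not part of scraper.py. The default
    retry_on=(OSError,) covers the built-in TimeoutError and, because
    requests' RequestException derives from IOError/OSError, exceptions
    such as requests.exceptions.ConnectTimeout as well.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
```

Assuming `requests` is installed, the failing call from the traceback could then be wrapped as `retry_with_backoff(lambda: requests.get(url, timeout=(5, 30)).json())`. The `timeout` tuple (connect, read) is worth setting explicitly, since the traceback shows `connect timeout=None`; the values 5 and 30 are arbitrary examples.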

Great idea to add the option to scrape the data only without downloading the files. I will implement it soon 😄

As for the sizes not being pulled for sold items: I think Depop removes the size from the product when it is marked as sold. I looked at a couple of sold products, but none of them had the size available. The size is also not displayed on the webpage. If there is a sold product with a size that is not getting parsed by the tool, could you please send me the URL?

harveycastree commented 2 years ago

Hi, thanks for getting back to me so soon!

That's fine, I seem to have it working again now after downloading a fresh Python file. I didn't think I had changed anything, but it seems I had haha.

And yes, you are right, my mistake. I wasn't aware that Depop removes the sizes after an item is sold.

So I guess this has turned into an enhancement request now... 😄

It would be great to be able to pull some of the more readily available data from the API, such as "address", "discountedPriceAmount" and "dateUpdated", and potentially to use "group" and "productType" instead of "categoryId" for categorising the items.

This should enable some good trend analysis of items with location and calendar data.
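As a rough illustration of this request, the sketch below copies those five fields out of a product dictionary and stores them in SQLite. It makes several assumptions: the fields are treated as top-level keys of the product JSON (the real API response may nest them), and the table name `depop_items`, the helper names, and the sample values in the usage note are all made up for this example.

```python
import sqlite3

# Field names quoted in the request above; their position inside the
# real API response is an assumption (treated as top-level keys here).
FIELDS = ("address", "discountedPriceAmount", "dateUpdated",
          "group", "productType")


def extract_fields(product):
    """Return the requested fields, using None for any that are missing."""
    return {name: product.get(name) for name in FIELDS}


def save_product(conn, product_id, product):
    """Store one product in a hypothetical depop_items table."""
    row = extract_fields(product)
    # "group" is an SQL keyword, so the column name has to be quoted.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS depop_items ('
        'id TEXT PRIMARY KEY, address TEXT, discountedPriceAmount TEXT, '
        'dateUpdated TEXT, "group" TEXT, productType TEXT)'
    )
    conn.execute(
        "INSERT OR REPLACE INTO depop_items VALUES (?, ?, ?, ?, ?, ?)",
        (product_id, *row.values()),
    )
    conn.commit()
```

For instance, `save_product(sqlite3.connect("data.db"), "item-1", {"address": "London", "group": "bottoms"})` would store the two known values and leave the other columns as NULL. Using `dict.get` keeps the insert from failing when a listing lacks a field.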

I have had a go at implementing some of these myself but, as mentioned before, I am pretty limited when it comes to Python, and I keep making mistakes that stop me from getting even seemingly simple changes to work.

Many thanks!

Gertje823 commented 2 years ago

You can now disable file download using -n.
I also added the address, discountedPriceAmount and dateUpdated fields to the database, and changed categoryId to group and subcatagory to productType.
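For readers curious how a switch like -n can be wired up, here is a minimal sketch using argparse. It is not the code in scraper.py: the `skip_files` attribute name, the help texts, and the `process` function are assumptions for illustration, and -d is only assumed to select Depop based on the command shown earlier in this thread.

```python
import argparse


def build_parser():
    """Build a small command line with -d and -n switches (sketch only)."""
    parser = argparse.ArgumentParser(description="Scraper CLI sketch")
    # -d appears in the command earlier in the thread; its meaning
    # (scrape Depop) is assumed here.
    parser.add_argument("-d", action="store_true",
                        help="scrape Depop (assumed meaning)")
    # -n is the flag named above; dest="skip_files" is a made-up name.
    parser.add_argument("-n", dest="skip_files", action="store_true",
                        help="store item data only, skip file downloads")
    return parser


def process(items, args, save, download):
    """Always save item data; download files only when -n was not given."""
    for item in items:
        save(item)
        if not args.skip_files:
            download(item)
```

With this layout, `build_parser().parse_args(["-d", "-n"])` yields `skip_files=True`, so `process` records the data and never calls `download`.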