Gertje823 / Vinted-Scraper

This is a tool to scrape/download images and data from Vinted & Depop using the API and store the data in a SQLite database.
GNU General Public License v3.0

Timeout Error #18

Open harveycastree opened 2 years ago

harveycastree commented 2 years ago

Hi,

I still get a timeout error when downloading data from Depop; it happens after about 200 items have been downloaded. However, I believe it has something to do with my network, as I only seem to get the error on my Wi-Fi. When I use a 4G hotspot from my phone, the program doesn't time out.

However, to avoid using up all my 4G data, I was wondering if you could try to solve the issue, either by waiting longer for a response to avoid a timeout or, if the program does time out, by letting it restart where it left off rather than having to download a store from the start.

Thank you.

Gertje823 commented 2 years ago

Just downloaded an account with 8296 products without problems. I can't reproduce the timeouts you are getting.

I added an option to start from a specific item. For example: python3 scraper.py -d -n -b "coose-navy-lee-sweatshirt-amazing-lee"
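As a rough illustration of what "start from a specific item" can mean, here is a minimal sketch that skips forward in a list of product slugs until a given slug is reached. It is not the code in scraper.py; the function name `items_from` and the sample slugs are hypothetical.

```python
from itertools import dropwhile


def items_from(slugs, start_slug=None):
    """Return an iterator over slugs, beginning at start_slug (inclusive).

    With start_slug=None every slug is returned. If start_slug never
    appears in slugs, nothing is returned.
    """
    if start_slug is None:
        return iter(slugs)
    # Drop items until the first one equal to start_slug, then keep the rest.
    return dropwhile(lambda slug: slug != start_slug, slugs)


# Hypothetical shop listing, in download order.
shop = ["item-a", "item-b", "item-c", "item-d"]
print(list(items_from(shop, "item-c")))
```

Because the match is inclusive, the item named on the command line would be downloaded again, which is harmless if it was the one that failed.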

harveycastree commented 2 years ago

Hi, thanks for looking into this for me. That is interesting; I have no idea why I'm getting the error, but after trying a few different networks it seems to be an issue with mine. I'll paste the error below anyway, in case you are interested or it turns out to be something obvious.

  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
    raise err
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 358, in connect
    self.sock = conn = self._new_conn()
                       ^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connection.py", line 179, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x000002E49EB66350>, 'Connection to webapi.depop.com timed out. (connect timeout=None)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\urllib3\util\retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='webapi.depop.com', port=443): Max retries exceeded with url: /api/v2/product/rewindattire-converse-hoodie-good-retro (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002E49EB66350>, 'Connection to webapi.depop.com timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\harve\Desktop\Vinted Scraper\Vinted-Scraper-main - 3\Vinted-Scraper-main\scraper.py", line 585, in <module>
    download_depop_data(userids)
  File "C:\Users\harve\Desktop\Vinted Scraper\Vinted-Scraper-main - 3\Vinted-Scraper-main\scraper.py", line 412, in download_depop_data
    product_data = requests.get(url)
                   ^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\harve\AppData\Local\Programs\Python\Python311\Lib\site-packages\requests\adapters.py", line 553, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='webapi.depop.com', port=443): Max retries exceeded with url: /api/v2/product/rewindattire-converse-hoodie-good-retro (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000002E49EB66350>, 'Connection to webapi.depop.com timed out. (connect timeout=None)'))

For now though the new feature should help. Thank you.
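The traceback shows `requests.get(url)` being called with no explicit timeout (`connect timeout=None`) and a single failed connection attempt ending the whole run. One possible mitigation is to retry a failed request a few times with a growing delay. The sketch below uses only the standard library so it can run anywhere; `fetch_with_retry` and its parameters are hypothetical names, not part of scraper.py or requests. With requests, one would presumably pass something like `fetch=lambda u: requests.get(u, timeout=(10, 30))` and `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`.

```python
import time


def fetch_with_retry(fetch, url, attempts=5, base_delay=1.0,
                     retry_on=(TimeoutError, ConnectionError),
                     sleep=time.sleep):
    """Call fetch(url), retrying on the given exceptions with exponential backoff.

    The delay doubles after each failure: base_delay, 2*base_delay, ...
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the original exception
            sleep(base_delay * 2 ** attempt)


# Stand-in for a network call that times out twice, then succeeds.
calls = {"count": 0}

def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TimeoutError("simulated connect timeout")
    return f"data for {url}"

delays = []  # record the waits instead of actually sleeping
result = fetch_with_retry(flaky_fetch, "product/example", sleep=delays.append)
print(result, delays)
```

Whether retries would help on the affected network is an open question, since the cause of the timeouts was never identified in this thread.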

harveycastree commented 1 year ago

Hi,

Thought I would update to help close this issue.

I spent a lot of time looking into the timeout issue with requests and think the main culprit is simply the network I'm on.

Unfortunately I don't know why the error occurs, and none of the other solutions I've found seem to work.

On other networks, everything works perfectly fine.

If possible, could you add an argument to auto-restart where it left off if the program stops, rather than this being a manual process?
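One way such an automatic restart could work is to record the last successfully processed item in a small checkpoint file and read it back on the next run. The sketch below is an assumption about how this might be done, not an existing scraper.py feature; `run_with_checkpoint`, the checkpoint file name, and the sample slugs are all hypothetical.

```python
import tempfile
from pathlib import Path


def run_with_checkpoint(slugs, process, checkpoint):
    """Process slugs in order, resuming after the slug stored in checkpoint.

    slugs must be a list. After each successful process(slug) call the slug
    is written to the checkpoint file, so a crash leaves the last completed
    item on disk.
    """
    checkpoint = Path(checkpoint)
    last_done = ""
    if checkpoint.exists():
        last_done = checkpoint.read_text(encoding="utf-8").strip()
    # Resume after the last completed slug; start from the top if unknown.
    start = slugs.index(last_done) + 1 if last_done in slugs else 0
    for slug in slugs[start:]:
        process(slug)
        checkpoint.write_text(slug, encoding="utf-8")


# Demo: the first run fails on "item-c", the second run resumes there.
shop_items = ["item-a", "item-b", "item-c", "item-d"]
downloaded = []
failed_once = []

def download(slug):
    if slug == "item-c" and not failed_once:
        failed_once.append(slug)
        raise TimeoutError("simulated timeout")
    downloaded.append(slug)

with tempfile.TemporaryDirectory() as tmp:
    checkpoint_file = Path(tmp) / "last_item.txt"
    try:
        run_with_checkpoint(shop_items, download, checkpoint_file)
    except TimeoutError:
        pass  # a real script would log the error and exit
    run_with_checkpoint(shop_items, download, checkpoint_file)

print(downloaded)
```

Storing only the last slug keeps the file tiny, but it assumes the shop's item order is stable between runs; if the stored slug is no longer listed, this sketch falls back to starting over.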