Open veenstrajelmer opened 6 months ago
I think I'm getting the same error here, still on 0.4.0 though. Here's my traceback:
Traceback (most recent call last):
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 203, in _new_conn
sock = connection.create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
raise err
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 492, in _make_request
raise new_e
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 468, in _make_request
self._validate_conn(conn)
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1097, in _validate_conn
conn.connect()
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 611, in connect
self.sock = sock = self._new_conn()
^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connection.py", line 212, in _new_conn
raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/connectionpool.py", line 845, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='waterwebservices.rijkswaterstaat.nl', port=443): Max retries exceeded with url: /ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "~/rws/rwsload.py", line 454, in <module>
dsn=sentry_dsn,
^^^^^^
File "~/rws/rwsload.py", line 436, in main
insertion_status = ReportsInsertionService.process_report(session=session, reports_data=result)
^^^^^^^^^^^^^^^^^^^^^^
File "~/rws/rwsload.py", line 115, in fetch_data
except JSONDecodeError:
^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/ddlpy/ddlpy.py", line 357, in measurements
measurement = _measurements_slice(
^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/ddlpy/ddlpy.py", line 301, in _measurements_slice
resp = requests.post(endpoint["url"], json=request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspace/virtualenvs/weatherdata/lib/python3.11/site-packages/requests/adapters.py", line 507, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='waterwebservices.rijkswaterstaat.nl', port=443): Max retries exceeded with url: /ONLINEWAARNEMINGENSERVICES_DBO/OphalenWaarnemingen (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7ffa8455ab10>, 'Connection to waterwebservices.rijkswaterstaat.nl timed out. (connect timeout=None)'))
@Weidav that could be the case. Fixing this issue would prevent your process from being interrupted by a single timeout. There could of course also be an outage of the Rijkswaterstaat server, in which case the process will fail either way. However, this problem is difficult to fix and debug, since we have no way to simulate a single timeout on the server side. It is also a nice-to-have feature, not as essential as the recently implemented developments. If you run into this issue again, please include a minimal code example that reproduces it, if it can be reproduced at all.
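A single timeout does not necessarily have to come from the server to exercise retry handling; it can be faked on the client side. The following is only a minimal sketch using the standard library: `call_with_retries` and `flaky_post` are hypothetical names, not part of ddlpy or requests, and the builtin `TimeoutError` stands in for `requests.exceptions.ConnectTimeout`, which would be passed via `exceptions` in real use.

```python
import time
from unittest import mock


def call_with_retries(func, *args, retries=2, delay=0.0,
                      exceptions=(TimeoutError,), **kwargs):
    """Call func, retrying up to `retries` extra times on the given exceptions.

    Hypothetical helper for illustration; not part of ddlpy.
    """
    for attempt in range(retries + 1):
        try:
            return func(*args, **kwargs)
        except exceptions:
            # Re-raise once the retry budget is used up
            if attempt == retries:
                raise
            time.sleep(delay)


# Stand-in for a request function: the first call raises a timeout,
# the second call returns a (fake) response.
flaky_post = mock.Mock(
    side_effect=[TimeoutError("simulated connection timeout"), "response"]
)

result = call_with_retries(flaky_post, "https://example.invalid", retries=1)
print(result, flaky_post.call_count)
```

The same idea could be applied in a test by patching the request function with `mock.patch`, so that the first call fails and later calls succeed.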
This keeps happening on a regular basis.
I use selected_stations.csv to store the result from ddlpy.locations(), because that endpoint causes issues on a regular basis. This approach is more stable and allows me to fetch the measurements directly. I tried updating the csv file, thinking that the stations and their available parameters might have changed, but that didn't help.
Here's a little snippet from my code:
EDIT: I updated the csv again and it looks like that leads to fewer exceptions with measurements. I'll keep you updated.
selected_stations = pandas.read_csv("selected_stations.csv", index_col=0)
# measurements-timezone is always in utc+1
one_h_ago = datetime.utcnow() - timedelta(hours=2.1)
tomorrow = datetime.utcnow() + timedelta(days=1, hours=1)
# iterate over my known spots
for rws_id, spot_id in spots_dict.items():
    try:
        station = selected_stations.loc[rws_id]
    except KeyError:
        logger.info(f"spot-id: {spot_id} source_station-id: {rws_id} has no measurements")
        continue
    # when a station has only one entry, it is usually incomplete and stored as a series
    if isinstance(station, pandas.Series):
        logger.debug(f"{spot_id} measurements are incomplete and will be ignored")
        # skip this station: a Series has no iterrows()
        continue
    i = 0
    # iterate over the different measurement-types (wind, waves...) from this station
    for index, station_data in station.iterrows():
        try:
            measurements = ddlpy.measurements(
                station_data, start_date=one_h_ago, end_date=tomorrow
            )
        except JSONDecodeError:
            continue
        [...]
Update: I keep running into the same issues, even with an up-to-date locations csv file.
Could you provide example code that reproduces the issue without any of your own files or local code? That is, a minimal example requiring only ddlpy and its dependencies.
Description
Sometimes, in the middle of data retrieval, the connection is aborted from the server side. This error cannot be reproduced (and I forgot to copy the traceback), but it is very inconvenient since it interrupts the download process.
Suggestion
Add a max_retries parameter for requests to improve the robustness of ddlpy.
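For reference, requests exposes retries through the `max_retries` argument of `HTTPAdapter`, which accepts a urllib3 `Retry` object. The following is only a sketch of how a retrying session could be set up, not how ddlpy does or will do it; `make_retrying_session`, the retry counts, the backoff factor and the status codes are all assumptions chosen for illustration. Retrying POST is assumed to be safe here because the request only queries data.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(total=3, backoff_factor=1.0):
    """Build a requests.Session that retries failed connections.

    Hypothetical helper for illustration; not part of ddlpy.
    """
    retry = Retry(
        total=total,                      # overall retry budget
        connect=total,                    # retries for connection errors/timeouts
        backoff_factor=backoff_factor,    # increasing sleep between attempts
        status_forcelist=(502, 503, 504),  # assumed: also retry on gateway errors
        # POST is not retried on read/status errors by default;
        # allowed_methods requires urllib3 >= 1.26
        allowed_methods=frozenset({"GET", "POST"}),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_retrying_session()
# Usage (not executed here): pass an explicit timeout as well, since the
# traceback above shows "connect timeout=None".
# resp = session.post(url, json=payload, timeout=(10, 60))
```

With a session like this, a single connect timeout would be retried inside requests before an exception reaches the calling code; a longer server outage would still end in an exception once the retries are exhausted.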