DataONEorg / slinky

Slinky, the DataONE Graph Store
Apache License 2.0

Error using default query filter #62

Open ThomasThelen opened 2 years ago

ThomasThelen commented 2 years ago

When using the default and dataone query filters, I noticed an error in the default worker pod; the log is pasted below. The default filter is currently an empty dict. I think the default should be a filter query that spans all DataONE repositories, because that is the filter Slinky will use in production.

thomas deploy % kubectl logs worker-default-647f4597db-lvtsl
22:00:41 Worker rq:worker:5e4a337da1b74aa5b96aaa2f18bec9ee: started, version 1.10.1
22:00:41 Subscribing to channel rq:pubsub:5e4a337da1b74aa5b96aaa2f18bec9ee
22:00:41 *** Listening on default...
22:00:41 Cleaning registries for queue: default
22:00:43 default: d1lod.jobs.update_job() (3b1325d2-4e4f-4550-b25d-0dc5ff96d716)
22:03:43 Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/rq/worker.py", line 1061, in perform_job
    rv = job.perform()
  File "/usr/local/lib/python3.9/dist-packages/rq/job.py", line 821, in perform
    self._result = self._execute()
  File "/usr/local/lib/python3.9/dist-packages/rq/job.py", line 844, in _execute
    result = self.func(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/jobs.py", line 36, in update_job
    datasets = client.get_new_datasets_since(cursor, BATCH_SIZE)
  File "/usr/local/lib/python3.9/dist-packages/d1lod/client.py", line 91, in get_new_datasets_since
    return self.d1client.query(
  File "/usr/local/lib/python3.9/dist-packages/d1lod/filtered_d1_client.py", line 88, in query
    response = super().query(engine, "?" + urlencode(query_params))
  File "/usr/local/lib/python3.9/dist-packages/d1_client/baseclient_1_1.py", line 101, in query
    response = self.queryResponse(
  File "/usr/local/lib/python3.9/dist-packages/d1_client/baseclient_1_1.py", line 80, in queryResponse
    return (self.POST if do_post else self.GET)(
  File "/usr/local/lib/python3.9/dist-packages/d1_client/session.py", line 240, in GET
    return self._request("GET", rest_path_list, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/d1_client/session.py", line 369, in _request
    return self._session.request(method, url, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/requests/adapters.py", line 440, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.9/dist-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.9/http/client.py", line 1347, in getresponse
    response.begin()
  File "/usr/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
  File "/usr/local/lib/python3.9/dist-packages/rq/timeouts.py", line 63, in handle_death_penalty
    raise self._exception('Task exceeded maximum timeout value '
rq.timeouts.JobTimeoutException: Task exceeded maximum timeout value (180 seconds)
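For context, here is a minimal sketch of why an empty filter dict can be dangerous: with nothing in the filter, the query string falls back to a match-all Solr query, which asks DataONE for every dataset and can easily blow past the worker's 180-second timeout. The names `DEFAULT_FILTER` and `build_query_params` are illustrative, not the actual d1lod API.

```python
from urllib.parse import urlencode

DEFAULT_FILTER = {}  # the current default: an empty dict

def build_query_params(filter_params, cursor, batch_size):
    # With an empty filter dict, "q" falls back to the match-all
    # Solr query (*:*), i.e. every dataset in DataONE.
    params = {
        "q": filter_params.get("q", "*:*"),
        "start": cursor,
        "rows": batch_size,
    }
    return "?" + urlencode(params)

print(build_query_params(DEFAULT_FILTER, 0, 100))
# → ?q=%2A%3A%2A&start=0&rows=100
```

A guard that raises (or logs a warning) when the filter is empty would at least turn this silent full-catalog sweep into an explicit error.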
amoeba commented 2 years ago

Still need to fix this bug (the weird failure when the filter is not defined) and also change the default. We talked as a team and decided the best default would be something that wouldn't hammer DataONE if a user never changed it, unlike defaulting to all of DataONE. I suggest making the default just recent datasets (last week, last month, etc.).
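A minimal sketch of what that recent-datasets default could look like, using Solr date math so the window stays relative to query time. The `dateUploaded` field name and the exact query shape are assumptions about DataONE's Solr index, not a settled design.

```python
def recent_datasets_filter(days=30):
    """Build a query filter matching only datasets uploaded in the
    last `days` days, so an unmodified deployment doesn't sweep all
    of DataONE on every update job."""
    # Solr date math: NOW-{days}DAYS is evaluated server-side at
    # query time, so the filter never goes stale.
    return {"q": f"dateUploaded:[NOW-{days}DAYS TO NOW]"}

print(recent_datasets_filter(7))
# → {'q': 'dateUploaded:[NOW-7DAYS TO NOW]'}
```

Last week would be `recent_datasets_filter(7)`, last month `recent_datasets_filter(30)`.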