ckan / datapusher

A standalone web service that pushes data files from a CKAN site's resources into its DataStore
GNU Affero General Public License v3.0

lazy-apps=true is more important than docs seem to suggest #230

Open jbothma opened 3 years ago

jbothma commented 3 years ago

version: ubuntu package 2.9.3-py3-focal1

The docs seem to suggest that setting lazy-apps=true is only needed for high availability.

We were seeing absolutely no movement on jobs except when reloading status pages. Sometimes we were also seeing the error below.

Chunks seemed to be uploaded only around the times we reloaded the status page or submitted new jobs. That suggests HTTP requests to the pusher released some lock, but only long enough for one chunk to progress.

Setting lazy-apps=true fixed it.
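For context, one possible explanation, which is an assumption and not something confirmed in this report: without lazy-apps, uWSGI loads the application in the master process and then forks the workers, and a thread started before fork() does not exist in the forked child. If a background scheduler thread were started at load time, the workers would inherit its state but not the running thread. The sketch below only demonstrates that general fork behaviour in plain Python on a POSIX system. The function name `count_threads_in_forked_child` is made up for illustration and nothing here is DataPusher or uWSGI code.

```python
import os
import threading
import warnings


def count_threads_in_forked_child():
    """Return (thread count in parent, thread count seen by a forked child)."""
    stop = threading.Event()
    # Background thread standing in for something like a scheduler loop.
    worker = threading.Thread(target=stop.wait, daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    with warnings.catch_warnings():
        # Newer Python versions warn about fork() in a multi-threaded process.
        warnings.simplefilter("ignore", DeprecationWarning)
        pid = os.fork()

    if pid == 0:
        # Child: only the thread that called fork() survives here.
        os.close(read_fd)
        os.write(write_fd, str(threading.active_count()).encode())
        os._exit(0)

    # Parent: read the child's answer, then clean up.
    os.close(write_fd)
    child_count = int(os.read(read_fd, 16).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    parent, child = count_threads_in_forked_child()
    print(f"threads in parent: {parent}, threads in forked child: {child}")
```

With lazy-apps=true each worker loads the application itself after the fork, so anything started at load time would run inside that worker. Whether this is what actually happened here is not established.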

The error we sometimes got:

[pid: 239964|app: 0|req: 2/2] 127.0.0.1 () {36 vars in 435 bytes} [Thu Aug 19 20:16:12 2021] POST /job => generated 759 bytes in 25 msecs (HTTP/1.1 200) 2 headers in 72 bytes (2 switches on core 1)
Fetching from: http://example.org/dataset/17c5b499-d4ac-4551-a106-0a61b6045ac7/resource/3f5e0eaa-0f53-438a-83e2-a1271c66b445/download/finpos_2020q4_acrmun.csv
Error notifying listener
Traceback (most recent call last):
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 432, in push_to_datastore
    existing = datastore_resource_exists(resource_id, api_key, ckan_url)
  File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 228, in datastore_resource_exists
    raise HTTPError(
datapusher.jobs.HTTPError: <unprintable HTTPError object>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/apscheduler/scheduler.py", line 239, in _notify_listeners
    cb(event)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/ckanserviceprovider/web.py", line 189, in job_listener
    db.mark_job_as_errored(job_id, error_object)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/ckanserviceprovider/db.py", line 413, in mark_job_as_errored
    _update_job(job_id, update_dict)
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/ckanserviceprovider/db.py", line 348, in _update_job
    job_dict["error"] = json.dumps(job_dict["error"])
  File "/usr/lib/python3.8/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Response is not JSON serializable
Job "push_to_datastore (trigger: RunTriggerNow, run = True, next run at: None)" raised an exception
Traceback (most recent call last):
  File "/usr/lib/ckan/datapusher/lib/python3.8/site-packages/apscheduler/scheduler.py", line 512, in _run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 432, in push_to_datastore
    existing = datastore_resource_exists(resource_id, api_key, ckan_url)
  File "/usr/lib/ckan/datapusher/src/datapusher/datapusher/jobs.py", line 228, in datastore_resource_exists
    raise HTTPError(
datapusher.jobs.HTTPError: <unprintable HTTPError object>
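
The second traceback above ends in json.dumps raising TypeError because the error payload being stored contains a Response object. A minimal sketch of that failure mode follows. `FakeResponse`, `try_dump`, and the dictionary keys are hypothetical stand-ins chosen for illustration. They are not taken from datapusher or ckanserviceprovider.

```python
import json


class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""

    def __init__(self, status_code):
        self.status_code = status_code


def try_dump(error):
    """Return the JSON string, or the TypeError message if not serializable."""
    try:
        return json.dumps(error)
    except TypeError as exc:
        return str(exc)


# Embedding the object itself fails, as in the traceback.
bad = try_dump({"message": "bad response", "response": FakeResponse(500)})

# Storing only plain values (here the status code) serializes fine.
good = try_dump({"message": "bad response", "status_code": 500})

if __name__ == "__main__":
    print(bad)
    print(good)
```

This only illustrates why the listener could not record the job error. It does not show why the original HTTPError was raised.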