GSA / data.gov

Main repository for the data.gov service
https://data.gov

urllib parse error found in arcgis harvest source #3847

Open FuhuXia opened 2 years ago

FuhuXia commented 2 years ago

No arcgis source can be harvested on the current 2.8 FCS catalog. This has not yet been confirmed on the 2.8 cloud.gov catalog app.

Starting a harvest job on an arcgis source raises an error in the gather process, and no records are harvested.

Traceback (most recent call last):
  File "/usr/bin/ckan", line 45, in <module>
    load_entry_point('PasteScript', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 245, in command
    utils.gather_consumer()
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/utils.py", line 340, in gather_consumer
    gather_callback(consumer, method, header, body)
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/queue.py", line 374, in gather_callback
    harvest_object_ids = gather_stage(harvester, job)
  File "/usr/lib/ckan-new/src/ckanext-harvest/ckanext/harvest/queue.py", line 432, in gather_stage
    harvest_object_ids = harvester.gather_stage(job)
  File "/usr/lib/ckan-new/src/ckanext-geodatagov/ckanext/geodatagov/harvesters/arcgis.py", line 157, in gather_stage
    url = urllib.parse.urljoin(source_url, search_path)
  File "/usr/lib/ckan/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 418, in urljoin
    base, url, _coerce_result = _coerce_args(base, url)
  File "/usr/lib/ckan/lib/python2.7/site-packages/future/backports/urllib/parse.py", line 115, in _coerce_args
    raise TypeError("Cannot mix str and non-str arguments")
TypeError: Cannot mix str and non-str arguments
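
The traceback ends in `_coerce_args`, which rejects a call to `urljoin` when one argument is text and the other is not. A minimal sketch of that failure and one possible workaround follows. It runs on Python 3 with the standard-library `urllib.parse`, not the `future` backport from the traceback. The helper names `to_text` and `safe_urljoin`, the example URL, and the search path are all hypothetical, and this is not necessarily how the harvester was fixed.

```python
from urllib.parse import urljoin


def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def safe_urljoin(base, path):
    """Join two URL parts after coercing both to str (hypothetical helper)."""
    return urljoin(to_text(base), to_text(path))


# Assumed example values; the real source_url and search_path come from the harvester.
source_url = "https://example.com/arcgis/"
search_path = b"sharing/rest/search"

# Mixing str and bytes reproduces the TypeError from the traceback.
try:
    urljoin(source_url, search_path)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# Coercing both arguments to the same type avoids the error.
joined = safe_urljoin(source_url, search_path)
print(error_message)
print(joined)
```

On Python 2.7 the same mismatch can arise between a native `str` and `unicode`, which would be consistent with the error here, though the issue does not confirm which argument had the wrong type.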

How to reproduce

Start a new harvest job for any arcgis source.

Expected behavior

Successful harvesting.

Actual behavior

Error message in gather log, as shown above.

FuhuXia commented 1 year ago

This is resolved in CKAN 2.9 with Python 3.8.