ckan / ckanapi

A command line interface and Python module for accessing the CKAN Action API

Fix some errors when executing dumps #210

Open pdelboca opened 7 months ago

pdelboca commented 7 months ago

Hello!

I'm trying to dump an instance, but the package is throwing some errors. This PR fixes them as they appear.

Problems when logging errors

TODO: See #209

KeyError: 'format'

Traceback (most recent call last):
  File "/home/pdelboca/Repos/ckanapi/.venv/bin/ckanapi", line 33, in <module>
    sys.exit(load_entry_point('ckanapi', 'console_scripts', 'ckanapi')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/main.py", line 156, in main
    return dump_things(ckan, thing[0], arguments)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/cli/dump.py", line 110, in dump_things
    create_datapackage(record, datapackages_path, stderr, apikey)
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 67, in create_datapackage
    filename = resource_filename(dres)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pdelboca/Repos/ckanapi/ckanapi/datapackage.py", line 87, in resource_filename
    ext = slugify.slugify(dres['format'])
                          ~~~~^^^^^^^^^^
KeyError: 'format'
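
The traceback shows `resource_filename` indexing `dres['format']` directly, so any resource without a `format` key aborts the dump. The block below is a minimal sketch of one possible guard, not necessarily the fix this PR ends up with. `safe_extension`, `simple_slug` and the `'data'` fallback are hypothetical names chosen for illustration, and `simple_slug` is only a stdlib stand-in for `slugify.slugify` so the example runs without extra dependencies.

```python
import re


def simple_slug(text):
    # Stand-in for slugify.slugify (assumption): lowercase the text and
    # collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')


def safe_extension(dres, default='data'):
    # Hypothetical helper: use the fallback when 'format' is missing,
    # None, or empty instead of raising KeyError.
    fmt = dres.get('format') or default
    # A format made only of punctuation slugs to '', so fall back again.
    return simple_slug(fmt) or default
```

With this shape, a resource dict with no `format` key still gets a usable extension for its filename, and the dump can carry on to the next resource.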

pdelboca commented 7 months ago

@wardi have you ever used ckanapi to dump a portal? I'm trying to dump https://datos.gob.ar/, but it is extremely slow and it also gets "blocked" after 250 datasets (blocked = it doesn't write any output and makes no progress).

I'm trying to do: ckanapi dump datasets --all --datapackages=./output_directory/ -r https://datos.gob.ar

wardi commented 7 months ago

@pdelboca we use it daily to create a history of our metadata for ~30k datasets. It's possible you're being throttled on the server side. dump datasets makes a separate package_show query for every dataset; you could try search datasets instead, which paginates over package_search and so makes fewer requests.
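
To illustrate the difference in request count, here is a rough sketch of paginating over `package_search` from Python. `iter_datasets` and its `page_size` parameter are hypothetical names, not part of ckanapi. The sketch only assumes that the search callable accepts `start` and `rows` and returns a dict with `count` and `results`, as the CKAN `package_search` action does.

```python
def iter_datasets(search, page_size=100):
    """Yield every dataset by walking package_search-style pages.

    `search` is any callable accepting start= and rows= and returning a
    dict with 'count' and 'results' (assumed response shape).
    """
    start = 0
    while True:
        page = search(start=start, rows=page_size)
        results = page['results']
        if not results:
            break
        for dataset in results:
            yield dataset
        # Advance by what was actually returned, in case the server
        # caps rows below the requested page size.
        start += len(results)
        if start >= page['count']:
            break


# Possible wiring with ckanapi (not executed here):
# from ckanapi import RemoteCKAN
# ckan = RemoteCKAN('https://datos.gob.ar')
# for dataset in iter_datasets(ckan.action.package_search):
#     ...
```

At 100 datasets per page, 30k datasets would take about 300 requests instead of one `package_show` call per dataset.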

It's currently possible to resume an interrupted load, but not a dump. Maybe that's needed if you are being throttled.
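
One way a resumable dump could work, sketched purely as an assumption: read the JSON lines file written so far, collect the ids already present, and request only the rest. `dumped_ids` and `remaining_ids` are hypothetical helpers, not existing ckanapi functions, and the sketch assumes an uncompressed output file with one JSON object per line.

```python
import json


def dumped_ids(path):
    """Collect the 'id' of every complete JSON line already in `path`."""
    ids = set()
    try:
        with open(path, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    ids.add(json.loads(line)['id'])
                except (ValueError, KeyError, TypeError):
                    # Skip lines that are not a complete record, such as
                    # a final line truncated by the interruption.
                    continue
    except FileNotFoundError:
        # No previous output: nothing has been dumped yet.
        pass
    return ids


def remaining_ids(all_ids, path):
    """Return the ids from `all_ids` not yet in the dump, in order."""
    done = dumped_ids(path)
    return [i for i in all_ids if i not in done]
```

A truncated final line would still need to be removed from the file before appending new records to it.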