OCHA-DAP / hdx-python-api

Python API for interacting with the HDX Data Portal
http://data.humdata.org
MIT License

Push data to HDX #19

Closed ljpetrova closed 4 years ago

ljpetrova commented 4 years ago

I have obtained an HDX API key and am trying to programmatically push a CKAN dataset to HDX.

from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset

conf = Configuration(
        hdx_site='prod',
        user_agent='admin',
        hdx_key=hdx_api_key
        )
dataset_class_object = Dataset(initial_data=data, configuration=conf)
resources = dataset_class_object.get_resources()
dataset_class_object.check_required_fields(['dataset_source', 'maintainer', 'dataset_date', 'data_update_frequency', 'groups', 'methodology'])
dataset_class_object.create_in_hdx()

ERROR [ckan.views.api] Field dataset_source is missing in dataset!
Traceback (most recent call last):
  File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/views/api.py", line 288, in action
    result = function(context, request_data)
  File "/home/ljupka/ckan_custom/lib/default/src/ckan/ckan/logic/__init__.py", line 464, in wrapped
    result = _action(context, data_dict, **kw)
  File "/home/ljupka/ckan_custom/lib/default/src/ckanext-custom/ckanext/custom/logic/action/get.py", line 1837, in push_dataset_to_hdx
    dataset_class_object.create_in_hdx()
  File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 512, in create_in_hdx
    self.check_required_fields(allow_no_resources=allow_no_resources)
  File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/dataset.py", line 352, in check_required_fields
    self._check_required_fields('dataset', ignore_fields)
  File "/usr/lib/ckan_custom/default/local/lib/python2.7/site-packages/hdx/data/hdxobject.py", line 208, in _check_required_fields
    raise HDXError('Field %s is missing in %s!' % (field, object_type))
HDXError: Field dataset_source is missing in dataset!

mcarans commented 4 years ago

You need to supply the field dataset_source in the Dataset object. It refers to the source of the dataset and is free text. I recommend you try uploading first to our test server (choose hdx_site='test').
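As a rough illustration of the advice above, the metadata dict can be checked for missing fields before it is handed to Dataset. This is a minimal sketch in plain Python that does not call the HDX library. The helper missing_fields and the sample values are hypothetical, and the field list is simply the one quoted in this thread; the library's own configuration decides what is actually required.

```python
# Field names taken from the list quoted earlier in this thread.
# Assumption: the library's real list of required fields may differ.
REQUIRED_FIELDS = [
    'dataset_source',
    'maintainer',
    'dataset_date',
    'data_update_frequency',
    'groups',
    'methodology',
]


def missing_fields(data, required=REQUIRED_FIELDS):
    """Return the required keys that are absent or empty in the metadata dict."""
    return [field for field in required if not data.get(field)]


# Hypothetical metadata; all values are placeholders.
data = {
    'name': 'example-dataset',
    'title': 'Example dataset',
    'maintainer': 'some-user-id',
}

# dataset_source is free text describing where the data comes from.
data['dataset_source'] = 'Example Ministry of Health'

print(missing_fields(data))
# ['dataset_date', 'data_update_frequency', 'groups', 'methodology']
```

Once the list comes back empty, the same dict can be passed as initial_data to Dataset, as in the snippets in this thread.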

ljpetrova commented 4 years ago

I have resolved this with:

Configuration.delete()
Configuration.create(
        hdx_site='test',
        user_agent='admin',
        hdx_key=hdx_api_key
        )
dataset_class_object = Dataset(initial_data=data)
dataset_class_object.check_required_fields(ignore_fields=['dataset_source', 'maintainer', 'dataset_date', 'data_update_frequency', 'groups', 'methodology'], allow_no_resources=True)
dataset_class_object.create_in_hdx(ignore_check=False)