OCHA-DAP / hdx-python-api

Python API for interacting with the HDX Data Portal
http://data.humdata.org
MIT License

Is it possible to push datasets using `hdx_site = 'test'`? #20

Closed: german1608 closed this issue 4 years ago

german1608 commented 4 years ago

I'm integrating HDX into an application that publishes datasets regularly, but I'd like to avoid pushing datasets to the prod site when other developers work on their local machines.

Is this possible? I tried the following script:

```python
from hdx.utilities.easy_logging import setup_logging
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
from hdx.data.resource import Resource

import datetime

setup_logging()

Configuration.create(hdx_site='stage', user_agent='A_Quick_Example', hdx_key='hdx key from my prod site')

dataset = Dataset({
    'name': 'demo dataset',
    'private': True,
    'title': 'demo dataset title',
    'notes': 'demo dataset notes',
    'license_id': 'ODC-ODbL',
    'methodology': 'Registry',
    'data_update_frequency': 'Every day',
    'dataset_date': datetime.datetime.now().strftime('%Y-%m-%d'),
    'dataset_source': 'Angostura',
})

dataset.set_maintainer('196196be-6037-4488-8b71-d786adf4c081')  # A user who updated the ucdp-data-for-australia dataset on https://stage.data-humdata-org.ahconu.org/
dataset.set_organization('hdx')  # Found after digging through the organizations on https://stage.data-humdata-org.ahconu.org/
dataset.add_country_location('VEN')
dataset.add_tag('americas')

resource = Resource({
    'name': 'test',
    'description': 'description',
    'format': 'CSV'
})
resource.set_file_to_upload('sample.csv')
dataset.add_update_resource(resource)

dataset.create_in_hdx()
```

but that gives me the following error:

```
Traceback (most recent call last):
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 316, in _write_to_hdx
    return self.configuration.call_remoteckan(self.actions()[action], data, files=files)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/hdx_configuration.py", line 307, in call_remoteckan
    return self.remoteckan().call_action(*args, **kwargs)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/remoteckan.py", line 87, in call_action
    return reverse_apicontroller_action(url, status, response)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/ckanapi/common.py", line 128, in reverse_apicontroller_action
    raise NotAuthorized(err)
ckanapi.errors.NotAuthorized: {'message': 'Access denied: <function package_create at 0x7f5b257bbb90> requires an authenticated user', '__type': 'Authorization Error'}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/dataset.py", line 565, in create_in_hdx
    self._save_to_hdx('create', 'name', force_active=True)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 343, in _save_to_hdx
    result = self._write_to_hdx(action, self.data, id_field_name, file_to_upload)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/data/hdxobject.py", line 322, in _write_to_hdx
    raisefrom(HDXError, 'Failed when trying to %s%s! (POST)' % (action, idstr), e)
  File "/home/german/Documents/c4v/airflow-jobs/env/lib/python3.6/site-packages/hdx/utilities/__init__.py", line 28, in raisefrom
    six.raise_from(exc_type(message), exc)
  File "<string>", line 3, in raise_from
hdx.data.hdxobject.HDXError: Failed when trying to create demo dataset! (POST)
```

mcarans commented 4 years ago

Apologies for not getting back to you on this earlier; I only just saw the issue. Publishing any dataset on HDX requires a user, which is supplied by way of the HDX key.

To avoid writing to prod, you have two options. You can default to stage, as your code already does, and allow an override for the real prod runs, so your developers only ever write to stage. Alternatively, if you don't want them to write datasets at all, my advice would be to make "no publishing to HDX" the default (that's the version other developers would use) and, for prod runs, have configuration that overrides the default to allow publishing.
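One way to wire up these defaults is sketched below. It is only an illustration under assumptions: the environment variable names `HDX_SITE` and `HDX_PUBLISH` are hypothetical, and `choose_site` is not part of hdx-python-api. Developers who set nothing get the stage site with publishing switched off; a prod runner has to opt in explicitly.

```python
from typing import Mapping, Tuple

# Hypothetical variable names; adapt them to your own deployment configuration.
SITE_VAR = "HDX_SITE"        # e.g. "stage" or "prod"
PUBLISH_VAR = "HDX_PUBLISH"  # set to "true" only on the real production runner

TRUTHY = {"1", "true", "yes", "on"}


def choose_site(env: Mapping[str, str]) -> Tuple[str, bool]:
    """Pick the HDX site name and whether publishing is enabled.

    Safe defaults: with nothing set, the site is "stage" and publishing
    is off, so prod runs must opt in explicitly on both counts.
    """
    # Blank or missing values fall back to the stage site.
    site = env.get(SITE_VAR, "").strip().lower() or "stage"
    # Publishing is enabled only by an explicit truthy value.
    publish = env.get(PUBLISH_VAR, "").strip().lower() in TRUTHY
    return site, publish
```

In the script above, `site, publish = choose_site(os.environ)` could then feed `Configuration.create(hdx_site=site, ...)`, with `dataset.create_in_hdx()` called only when `publish` is true.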

For example, if no HDX key is supplied in configuration, you could set hdx_read_only=True (assuming your developers need to read but not write to HDX) and disable the part of your code that publishes datasets.
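A minimal sketch of that read-only fallback, assuming the key arrives as an optional string (for instance from a hypothetical `HDX_KEY` environment variable). The helpers `configuration_kwargs` and `can_publish` are illustrative, not library functions; they only build the keyword arguments (`hdx_site`, `user_agent`, `hdx_key`, `hdx_read_only`) that would be passed on to `Configuration.create`, assuming the installed hdx-python-api version accepts them.

```python
from typing import Any, Dict, Mapping, Optional


def configuration_kwargs(site: str, user_agent: str,
                         hdx_key: Optional[str]) -> Dict[str, Any]:
    """Build keyword arguments intended for Configuration.create().

    With a key the configuration can write; without one (None or empty)
    it is marked read-only so that developers can still read from HDX.
    """
    kwargs: Dict[str, Any] = {"hdx_site": site, "user_agent": user_agent}
    if hdx_key:
        kwargs["hdx_key"] = hdx_key
    else:
        kwargs["hdx_read_only"] = True
    return kwargs


def can_publish(kwargs: Mapping[str, Any]) -> bool:
    """Only attempt publishing when the configuration is not read-only."""
    return not kwargs.get("hdx_read_only", False)
```

For example, `kwargs = configuration_kwargs('stage', 'A_Quick_Example', os.environ.get('HDX_KEY'))`, then `Configuration.create(**kwargs)`, and wrap the publishing step in `if can_publish(kwargs):`.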

german1608 commented 4 years ago

Thanks for your answers. Is there a way to create users in stage?

mcarans commented 4 years ago

Hi @german1608, I tried contacting you directly by email on this.

german1608 commented 4 years ago

@mcarans Thanks for your help. I solved the issue using hdx_site='feature', as you suggested via email. I'll close this issue, then.